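"Digital+" and Statistical Data Engineering Lecture Series (No. 86): Preview of the December 18 lecture at our school by Professor Yanyuan Ma of Pennsylvania State University
(Published: 2024-12-16)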

Title: Doubly Flexible Estimation under Label Shift

Speaker: Yanyuan Ma

Time: Wednesday, December 18, 2024, 15:15-16:00

Venue: Conference Room 644, Comprehensive Building

About the speaker:

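Yanyuan Ma is a Professor in the Department of Statistics at Pennsylvania State University, with a bachelor's degree from Peking University and a Ph.D. from the Massachusetts Institute of Technology. Professor Ma's main research interests include dimension reduction, measurement error models, latent variable models, mixed samples, nonparametric and semiparametric methods, and survival analysis. Professor Ma has published more than 160 papers, over 40 of which appeared in top international journals in statistics and econometrics such as JRSSB, AoS, JASA, Biometrika, and JoE, and has served as an associate editor of the leading international statistics journals JRSSB, JASA, and Biometrics.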

Abstract:

In studies ranging from clinical medicine to policy research, complete data are usually available from a population P, but the quantity of interest is often sought for a related but different population Q for which only partial data are available. In this paper, we consider the setting in which both the outcome Y and the covariate X are available from P whereas only X is available from Q, under the so-called label shift assumption, i.e., the conditional distribution of X given Y remains the same across the two populations. To estimate the parameter of interest in population Q by leveraging the information from population P, the following three ingredients are essential: (a) the common conditional distribution of X given Y, (b) the regression model of Y given X in population P, and (c) the density ratio of the outcome Y between the two populations. We propose an estimation procedure that needs only a standard nonparametric regression technique to approximate the conditional expectations with respect to (a), and needs no estimate or model for (b) or (c); i.e., it is doubly flexible with respect to possible misspecification of both (b) and (c). This is conceptually different from the well-known doubly robust estimation in that double robustness allows at most one model to be misspecified, whereas our proposal allows both (b) and (c) to be misspecified. This is of particular interest in our setting because estimating (c) is difficult, if not impossible, owing to the absence of Y-data in population Q. Furthermore, even though the estimation of (b) is sometimes off-the-shelf, it can face the curse of dimensionality or computational challenges. We develop the large-sample theory for the proposed estimator, and examine its finite-sample performance through simulation studies as well as an application to the MIMIC-III database.
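
As background to the abstract, the label shift assumption can be written out as the short derivation below. This is only an illustrative sketch of the standard identities, not the estimator proposed in the talk. The symbols g, p_Y, q_Y, p_X, q_X, rho, and h are notation assumed here for illustration and do not come from the announcement: g(x | y) stands for ingredient (a), E_P[ . | X] for ingredient (b), and rho for ingredient (c).

```latex
% Notation assumed for illustration only (not taken from the announcement).
% Label shift: the conditional density of X given Y is shared by P and Q.
\[
  p_P(x \mid y) = p_Q(x \mid y) = g(x \mid y),
  \qquad
  \rho(y) = \frac{q_Y(y)}{p_Y(y)} .
\]
% Any Q-expectation can be rewritten as a weighted P-expectation.
\[
  E_Q\{h(X,Y)\}
  = \iint h(x,y)\, g(x \mid y)\, q_Y(y)\, dx\, dy
  = \iint h(x,y)\, \rho(y)\, g(x \mid y)\, p_Y(y)\, dx\, dy
  = E_P\{\rho(Y)\, h(X,Y)\} .
\]
% The density ratio is tied to the observable X-marginal of Q only through
% an integral equation that involves the conditional law of Y given X in P.
\[
  q_X(x) = \int g(x \mid y)\, q_Y(y)\, dy
         = p_X(x)\, E_P\{\rho(Y) \mid X = x\} .
\]
```

The last identity suggests why (c) is hard to estimate: with no Y-data from Q, rho is linked to the observed data only indirectly, through the X-marginal of Q.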

 



