Title: Distribution-free prediction bands for clustered data with missing responses

Speaker: Tang Yanlin (唐炎林)




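Tang Yanlin is a professor and doctoral supervisor in the School of Statistics at East China Normal University, where Tang chairs the Department of Statistics, and has been selected for the National High-Level Young Talents Program (Organization Department). Tang received a PhD from the Department of Statistics at Fudan University in January 2012, joined Tongji University in May of the same year, and moved to East China Normal University in January 2019. Tang's main research interests are quantile regression, high-dimensional statistical inference, and statistical modeling of incomplete data. Tang has led several projects funded by the National Natural Science Foundation of China and the Shanghai Natural Science Foundation, serves on the editorial boards of the SCI journals Statistica Sinica and Journal of the Korean Statistical Society, and has published more than 30 papers in journals including Biometrika, JRSSB, and Biometrics.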


Existing methods for clustered data with missing responses often rely on strong model assumptions and are therefore prone to model misspecification. We construct prediction bands for the whole trajectories of new subjects using conformal inference, yielding covariate-dependent prediction bands with finite-sample coverage guarantees, without making any assumptions about the model specification or the within-cluster dependency structure. We first reduce the clustered data to independent cross-sectional data by subsampling, then propose three weighted conformal methods to produce prediction regions. To make use of the correlation information in the clustered data, we repeat the subsampling and conformal inference steps and combine the resulting dependent p-values into an integrated prediction region. Among the three proposed methods, the weighted CD-split method yields the smallest prediction region by converging to the highest density set, and provides asymptotic conditional coverage guarantees for each given subject. Simulations show that our methods have excellent finite-sample behavior under a range of complex error distributions compared with existing alternatives. We demonstrate the practical use of the methods on the motivating serum cholesterol and CD4+ cell data sets.
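
The subsample, conformalize, and combine pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation, and everything in it is an assumption made for illustration: the function names, the absolute-residual conformity score, a `predict` function fitted beforehand on separate data, an optional `weight_fn` standing in for the missingness-correcting weights, and the rule for merging dependent p-values (twice their average, a standard choice that stays valid under arbitrary dependence). The three weighted methods in the talk, including weighted CD-split, are not reproduced here.

```python
import numpy as np

def subsample_one_per_cluster(cluster_ids, rng):
    """Pick one row index uniformly at random from each cluster."""
    cluster_ids = np.asarray(cluster_ids)
    return np.array([rng.choice(np.flatnonzero(cluster_ids == c))
                     for c in np.unique(cluster_ids)])

def weighted_conformal_pvalue(cal_scores, cal_weights, test_score, test_weight=1.0):
    """Weighted share of calibration scores at least as large as the test score."""
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_weights = np.asarray(cal_weights, dtype=float)
    num = cal_weights[cal_scores >= test_score].sum() + test_weight
    return num / (cal_weights.sum() + test_weight)

def combined_pvalue(x, y, cluster_ids, predict, x_new, y_candidate,
                    n_repeats=20, seed=0, weight_fn=None):
    """Repeat subsampling + conformal inference, then merge the dependent p-values.

    Assumptions (hypothetical, for illustration only):
      - rows with missing responses have already been dropped from x, y, cluster_ids;
      - predict maps an array of covariates to an array of fitted values;
      - weight_fn, if given, returns one nonnegative weight per covariate value.
    """
    rng = np.random.default_rng(seed)
    x_new_arr = np.array([x_new], dtype=float)
    test_score = abs(y_candidate - predict(x_new_arr)[0])
    test_weight = 1.0 if weight_fn is None else float(weight_fn(x_new_arr)[0])
    pvals = []
    for _ in range(n_repeats):
        idx = subsample_one_per_cluster(cluster_ids, rng)  # independent cross-section
        scores = np.abs(y[idx] - predict(x[idx]))          # absolute-residual scores
        weights = np.ones(len(idx)) if weight_fn is None else weight_fn(x[idx])
        pvals.append(weighted_conformal_pvalue(scores, weights, test_score, test_weight))
    # Twice the mean of p-values is a valid p-value under arbitrary dependence.
    return min(1.0, 2.0 * float(np.mean(pvals)))
```

Under these assumptions, a level-alpha prediction region at `x_new` would collect every candidate response whose combined p-value exceeds alpha, and scanning a grid of candidates at each time point would trace out a band.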

