Lecture title: Individualized Multi-directional Variable Selection
Speaker: Professor Annie Qu, University of California, Irvine
Lecture time: Monday, August 10, 2020, 10:00 a.m. to 12:00 noon
How to join: Meeting ID: 292 419 730
Live stream: https://meeting.tencent.com/l/UfOqiINQycWe
Speaker bio:
Chancellor’s Professor, Department of Statistics, University of California, Irvine. Ph.D. in Statistics, the Pennsylvania State University.
Qu’s research focuses on solving fundamental issues regarding unstructured large-scale data, developing cutting-edge statistical methods and theory in machine learning and algorithms on text sentiment analysis, automatic tagging and summarization, recommender systems, tensor imaging data and network data analyses for complex heterogeneous data, and achieving the extraction of essential information from large-volume, high-dimensional data. Her research has had impact in many different fields, such as biomedical studies, genomic research, public health research, and social and political sciences.
Before joining UC Irvine, Dr. Qu was Data Science Founder Professor of Statistics and the Director of the Illinois Statistics Office at the University of Illinois at Urbana-Champaign. She was named a Brad and Karen Smith Professorial Scholar by the College of LAS at UIUC, received the NSF CAREER award for 2004-2009, and is a Fellow of the Institute of Mathematical Statistics and a Fellow of the American Statistical Association.
Abstract:
In this paper we propose a heterogeneous modeling framework which achieves individual-wise feature selection and heterogeneous covariates’ effects subgrouping simultaneously.
In contrast to conventional model selection approaches, the new approach constructs a separation penalty with multi-directional shrinkages, which facilitates individualized modeling to distinguish strong signals from noisy ones and selects different relevant variables for different individuals.
Meanwhile, the proposed model identifies subgroups among which individuals share similar covariates’ effects, and thus improves individualized estimation efficiency and feature selection accuracy.
Moreover, the proposed model also incorporates within-individual correlation for longitudinal data to gain extra efficiency. We provide a general theoretical foundation under a double-divergence modeling framework where the number of individuals and the number of individual-wise measurements can both diverge, which enables inference on both an individual level and a population level.
In particular, we establish a strong oracle property for the individualized estimator to ensure its optimal large sample property under various conditions.
An efficient ADMM algorithm is developed for computational scalability. Simulation studies and applications to post-trauma mental disorder analysis with genetic variation and to an HIV longitudinal treatment study are presented to compare the new approach with existing methods. This is joint work with Xiwei Tang and Fei Xue.