Statistics Seminar

Date/Time: Wednesday, 12 Feb 2014 from 4:10 pm to 5:00 pm
Location: Snedecor 3105
Cost: Free
URL: www.stat.iastate.edu
Contact: Jeanette La Grange
Phone: 515-294-3440
Channel: College of Liberal Arts and Sciences
Categories: Lectures
"Big Data: The End of Sampling As We Know It?", Lily Wang, Department of Statistics, University of Georgia, Athens

Each day in our lives we are breathing the air of digital data. Nowadays, "Big Data" is seemingly generated at all times by everything around us. It arrives with the well-known four V's: alarming Velocity, ever-expanding Volume, multi-source Variety, and inhomogeneous Veracity. Traditionally, scientists have used sampling to draw inferences from data obtained from large populations. However, "Big Data" introduces the possibility that one can obtain exact results for an entire population. Will this eliminate the need for sampling? Is sampling an artifact of past best practices? In this talk we will address the above questions, and compare results obtained from analyses applied to populations and samples thereof.

"Big Data" come to us with great promise, as they can enhance and improve sample estimates by providing a huge number of auxiliary variables correlated with our primary variables of interest. We have entered an era where data collection is cheap, but extracting useful information from such data is not. In this talk, we introduce a general strategy for variable selection from large data sets under various sampling designs. A survey-weighted penalized estimating equation approach is proposed to simultaneously select significant variables and estimate model coefficients. The proposed estimators are design-consistent and perform as well as the oracle procedure, that is, the procedure that knows the correct sub-model in advance. A fast and efficient variable selection algorithm is developed to identify significant variables for complex longitudinal surveys. Examples will be presented to illustrate the usefulness of the proposed methodology under various model settings and sampling designs.
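
To make the idea of survey-weighted penalized estimation concrete, the following is a minimal sketch and not the speaker's method: it swaps in a plain survey-weighted lasso (weighted least squares with an L1 penalty, solved by coordinate descent) for the penalized estimating equation approach described in the abstract. The function names, the simulated data, the non-informative weights, and the penalty value are all assumptions made for illustration.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def weighted_lasso(X, y, w, lam, n_iter=100):
    """Hypothetical helper: coordinate descent for
    (1 / (2 * sum(w))) * sum_i w_i * (y_i - x_i . beta)^2 + lam * sum_j |beta_j|,
    where w_i are survey weights (e.g. inverse inclusion probabilities)."""
    n, p = len(X), len(X[0])
    total_w = sum(w)
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta; beta starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Weighted second moment of column j
            a = sum(w[i] * col[i] ** 2 for i in range(n)) / total_w
            if a == 0.0:
                continue
            # Weighted covariance of column j with the partial residual
            rho = sum(
                w[i] * col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)
            ) / total_w
            new = soft_threshold(rho, lam) / a
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    return beta


def simulate_survey(n=300, seed=0):
    """Hypothetical toy data: 5 covariates, only the 1st and 3rd matter.
    The weights are drawn independently of y, an assumption that keeps
    the example simple (a non-informative design)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
    y = [2.0 * x[0] - 1.5 * x[2] + rng.gauss(0, 0.3) for x in X]
    w = [rng.uniform(1.0, 5.0) for _ in range(n)]
    return X, y, w


if __name__ == "__main__":
    X, y, w = simulate_survey()
    beta = weighted_lasso(X, y, w, lam=0.2)
    # Coefficients set exactly to zero are the variables dropped by selection
    print([round(b, 3) for b in beta])
```

In this sketch, selection and estimation happen in one pass: the L1 penalty sets the coefficients of irrelevant covariates exactly to zero while the survey weights enter every moment computation. The lasso estimates of the retained coefficients are shrunk toward zero, so this simple version does not by itself have the oracle property mentioned in the abstract.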