Multiple imputation of missing categorical data using latent class models: State of art
| Content Provider | Semantic Scholar |
|---|---|
| Author | Vidotto, Davide; Kaptein, Maurits; Vermunt, Jeroen K. |
| Copyright Year | 2015 |
| Abstract | (ProQuest: ... denotes formulae omitted.) Introduction. Social and behavioral science researchers often collect data using tests or questionnaires consisting of items that are supposed to measure one or more underlying constructs. In a psychological assessment study, for example, these could be constructs such as anxiety, extraversion, or neuroticism. A very common problem is that some respondents fail to answer all questionnaire items (Huisman, 1998), resulting in incomplete datasets. However, most standard statistical techniques cannot deal with missing data. For example, computation of Cronbach's alpha requires that all variables in the scale of interest are observed. Various methods for dealing with item nonresponse have been proposed (Little & Rubin, 2002; Schafer & Graham, 2002). Listwise and pairwise deletion, which simply exclude units with unobserved answers from the analysis, are the most frequently used in psychological research (Schlomer, Bauman, & Card, 2010). These are, however, also the worst methods available (Wilkinson & Task Force on Statistical Inference, 1999): they result in loss of power and, unless the strong assumption that data are missing completely at random (MCAR) is met, they may lead to severely biased results. Because they are simple and widely included as standard options in statistical software packages, these methods are still the most common missing data handling techniques (Van Ginkel, 2007). Methodological research on missing data handling has led to two alternative approaches that overcome the problems associated with listwise or pairwise deletion: maximum likelihood for incomplete data (MLID) and multiple imputation (MI). 
Under the assumption that the missing data are missing at random (MAR), the estimates of the statistical model of interest (from here on also referred to as the substantive model) resulting from MLID or MI have the desirable properties of being unbiased, consistent, and asymptotically normal (Roth, 1994; Schafer & Graham, 2002; Allison, 2009; Baraldi & Enders, 2010). MLID involves estimating the parameters of the substantive model of interest by maximizing the incomplete-data likelihood function, that is, the likelihood function consisting of a part for the units with missing data and a part for the units with fully observed data. While in MLID the missing data handling model and the substantive model are the same, in MI (Rubin, 1987) the missing data handling model (or imputation model) and the substantive model(s) of interest can and typically will be different. Note that, unlike single value imputation, MI replaces each missing value with m > 1 imputed values in order to account for the uncertainty about the missing information. In practice, applying MI yields m complete datasets, each of which can be analyzed separately using the standard statistical method of interest, after which the m results should be combined in a specific manner. For more details on MI, we refer to Rubin (1987), Schafer (1997), and Little and Rubin (2002). For continuous variables with missing values, Schafer (1997) proposed using the multivariate normal MI model, which has been shown to be quite robust to departures from normality (Graham & Schafer, 1999). Items of psychological assessment questionnaires, however, are categorical rather than continuous variables. For such categorical data, Schafer (1997) proposed MI with log-linear models, which can capture the relevant associations in the joint distribution of a set of categorical variables and can be used to generate imputation values. 
However, log-linear models for MI can only be applied when the number of variables is relatively small, as the number of cells in the multi-way cross-table that has to be processed increases exponentially with the number of variables (Vermunt, Van Ginkel, Van der Ark, & Sijtsma, 2008). An alternative MI tool is offered by the sequential regression modeling approach, which includes multiple imputation by chained equations (MICE) (Van Buuren & Oudshoorn, 1999). … |
| Starting Page | 542 |
| Ending Page | 576 |
| Page Count | 35 |
| File Format | PDF HTM / HTML |
| Volume Number | 57 |
| Alternate Webpage(s) | http://www.psychologie-aktuell.com/fileadmin/download/ptam/4-2015_20151218/06_Vidotto.pdf |
| Alternate Webpage(s) | http://members.home.nl/jeroenvermunt/vidotto2014.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
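
The abstract notes that the m results obtained from the m completed datasets "should be combined in a specific manner" but, being truncated, does not spell the combination out. A minimal sketch of the standard pooling rules usually attributed to Rubin (1987) is given below, assuming each completed dataset yields one scalar point estimate and its squared standard error. The function name `pool_estimates` and the input numbers are hypothetical illustrations, not taken from the article.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Pool m per-dataset results with Rubin's rules (hypothetical helper).

    estimates: point estimates of one parameter, one per imputed dataset
    variances: squared standard errors of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m > 1 estimates and one variance per estimate")

    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divides by m - 1)
    # The between-imputation term is inflated by (1 + 1/m) because m is finite.
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up numbers for m = 3 imputed datasets, for illustration only.
pooled, total = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation component is what lets MI reflect the uncertainty about the missing values, which single value imputation ignores.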