Status-achievement Model Using EM Imputation of Missing Values - GSS 1989 Data

>USE "D:\mydocs\ys209\gss89m16.syd"

SYSTAT Rectangular file D:\mydocs\ys209\gss89m16.syd,
created Fri Apr 19, 2002 at 10:03:50, contains variables:

 ID           PRESTIGE     PAPRES16     AGE          ZODIAC       EDUC
 PAEDUC       MAEDUC       SEX          RACE         WORDSUM

>idvar=id
>print=long
>if educ>90 then let educ=.
>if paeduc>90 then let paeduc=.
>if maeduc>90 then let maeduc=.
>if wordsum>90 then let wordsum=.
>corr
>save gss89imp.syd/data
>pearson papres16 paeduc maeduc wordsum educ prestige/em
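
NOTE: The four IF statements above turn every code greater than 90 into SYSTAT's missing value (.) before the correlations are estimated. A minimal Python sketch of the same recoding follows. It assumes, as the commands imply, that any value above 90 is a missing-data code rather than a real score; the function name and the sample values are made up for illustration.

```python
def recode_missing(values, cutoff=90):
    """Return a copy of values with every entry above cutoff replaced by None."""
    return [None if v is not None and v > cutoff else v for v in values]


# Hypothetical EDUC values; 98 and 99 stand in for missing-data codes.
educ = [12, 16, 98, 8, 99, 20]
print(recode_missing(educ))  # [12, 16, None, 8, None, 20]
```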

NOTE: Case   is an outlier. Mahalanobis D^2=22.676698  z=3.090202
NOTE: Case   is an outlier. Mahalanobis D^2=28.732751  z=3.754679
NOTE: Case   is an outlier. Mahalanobis D^2=19.906854  z=3.236595
NOTE: Case   is an outlier. Mahalanobis D^2=21.640463  z=3.197591
NOTE: Case   is an outlier. Mahalanobis D^2=24.257678  z=3.274090
NOTE: Case   is an outlier. Mahalanobis D^2=15.982172  z=3.015025
NOTE: Case   is an outlier. Mahalanobis D^2=24.746567  z=3.551029
NOTE: Case   is an outlier. Mahalanobis D^2=21.738339  z=3.452252
NOTE: Case   is an outlier. Mahalanobis D^2=18.530947  z=3.065711
NOTE: Case   is an outlier. Mahalanobis D^2=22.579376  z=3.078607
NOTE: Case   is an outlier. Mahalanobis D^2=23.856495  z=3.228203
NOTE: Case   is an outlier. Mahalanobis D^2=20.329374  z=3.038216
NOTE: Case   is an outlier. Mahalanobis D^2=27.096911  z=3.799247
NOTE: Case   is an outlier. Mahalanobis D^2=31.988545  z=4.073729
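
NOTE: The output does not say how the z value beside each Mahalanobis D^2 is obtained. The printed numbers are consistent with the Wilson-Hilferty cube-root normal approximation to a chi-square variate, with degrees of freedom equal to the number of nonmissing variables for the case. This is an inference from the listing, not documented SYSTAT behavior; the function name and the df values used below (6 and 3) are assumptions chosen because they reproduce the printed z values.

```python
import math


def wilson_hilferty_z(d2, df):
    """Approximate normal deviate for a chi-square value d2 on df degrees of freedom."""
    shrink = 2.0 / (9.0 * df)
    return ((d2 / df) ** (1.0 / 3.0) - (1.0 - shrink)) / math.sqrt(shrink)


# First outlier note, assuming all 6 variables were observed for that case.
print(f"{wilson_hilferty_z(22.676698, 6):.3f}")  # 3.090
# Sixth outlier note, assuming only 3 variables were observed for that case.
print(f"{wilson_hilferty_z(15.982172, 3):.3f}")  # 3.015
```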

EM Algorithm   Iteration   Maximum Error   -2*log(likelihood)
               ---------   -------------   ------------------
                    1            1.077         339951.313
                    2            7.495         247586.041
                    3            8.858         195353.708
                    4            0.192         162506.910
                    5            0.035         145462.139
                    6            0.025         137537.631
                    7            0.014         133855.878
                    8            0.006         132230.165
                    9            0.003         131574.375
                   10            0.002         131328.433
                   11            0.001         131239.105
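
NOTE: The iteration log above comes from the EM algorithm, which alternates between filling in expected values for the missing entries (E-step) and re-estimating the means and covariances from the filled-in data (M-step) until the estimates stop changing. The sketch below illustrates that loop on a toy bivariate case in which x is always observed and y is sometimes missing. It is not SYSTAT's implementation: the function name, the toy data, and the stopping tolerance are all made up for illustration.

```python
def em_bivariate(x, y, tol=1e-10, max_iter=1000):
    """EM estimates for a bivariate normal when some y values are None.

    x must be fully observed. Returns (mean_x, mean_y, var_x, cov_xy, var_y)
    using maximum-likelihood (divide-by-n) moments.
    """
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    # Starting values for the y parameters: moments of the complete cases.
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    ox = sum(xi for xi, _ in obs) / len(obs)
    my = sum(yi for _, yi in obs) / len(obs)
    sxy = sum((xi - ox) * (yi - my) for xi, yi in obs) / len(obs)
    syy = sum((yi - my) ** 2 for _, yi in obs) / len(obs)
    for _ in range(max_iter):
        slope = sxy / sxx
        resid_var = syy - slope * sxy  # variance of y given x
        s_y = s_xy = s_yy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                # E-step: expected y and y^2 given x and the current estimates.
                ey = my + slope * (xi - mx)
                ey2 = ey * ey + resid_var
            else:
                ey, ey2 = yi, yi * yi
            s_y += ey
            s_xy += xi * ey
            s_yy += ey2
        # M-step: update the moments from the expected sufficient statistics.
        new_my = s_y / n
        new_sxy = s_xy / n - mx * new_my
        new_syy = s_yy / n - new_my ** 2
        change = max(abs(new_my - my), abs(new_sxy - sxy), abs(new_syy - syy))
        my, sxy, syy = new_my, new_sxy, new_syy
        if change < tol:
            break
    return mx, my, sxx, sxy, syy


# Toy data: y is missing for every second case.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, None, 4.4, None, 6.2, None]
mean_x, mean_y, var_x, cov_xy, var_y = em_bivariate(x, y)
print(f"{mean_y:.3f}")  # 4.725
```

For this simple pattern the answer can be checked by hand: the complete cases give the regression y = 1.05 + 1.05x, and evaluating it at the mean of all six x values (3.5) gives 4.725, the value EM converges to.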

 No.of  Missing value patterns

 Cases  (X=nonmissing; .=missing)
   41   XXXXX.
   35   X...XX
  365   XXX.XX
  100   ..XXXX
  628   XXXXXX
   13   XXX.X.
    2   .XXXX.
   38   X.XXXX
    5   X.XXX.
   47   XX.XXX
   61   ..X.XX
   27   XX..XX
   31   ...XXX
    4   ...XX.
   55   X..XXX
    3   .XXXXX
    2   X...X.
    8   ..XXX.
    1   ..X..X
   25   X.X.XX
   17   ....XX
    2   .....X
    8   ..X.X.
    3   XX..X.
    2   .XX.XX
    3   XXXX.X
    2   ....X.
    2   X.X.X.
    1   ......
    3   X..XX.
    3   XX.XX.

Little MCAR test statistic:      256.741  df =    99  prob = 0.000

NOTE: The reported p-value is 0.000 (that is, p < 0.0005), so we reject the null hypothesis that the values of the 6 variables are missing completely at random (MCAR).  PS: The statistic is "Little's MCAR test" (i.e., the MCAR test devised by Roderick Little), not "a little MCAR test" (as opposed to a "big" MCAR test), in case you wondered.
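
NOTE: The missing value pattern table can be used to check several other numbers in this log. The Python sketch below tallies the table, assuming the six pattern columns are in the order PAPRES16, PAEDUC, MAEDUC, WORDSUM, EDUC, PRESTIGE (the order of the PEARSON command). It recovers the total number of cases, the nonmissing count for each variable (the diagonal of the pairwise frequency table), and the degrees of freedom of Little's test, computed as the number of observed variables summed over the patterns minus the number of variables. The variable names in the code are made up for illustration.

```python
# (number of cases, pattern) pairs copied from the missing value pattern table.
# Assumed column order: PAPRES16 PAEDUC MAEDUC WORDSUM EDUC PRESTIGE.
PATTERNS = [
    (41, "XXXXX."), (35, "X...XX"), (365, "XXX.XX"), (100, "..XXXX"),
    (628, "XXXXXX"), (13, "XXX.X."), (2, ".XXXX."), (38, "X.XXXX"),
    (5, "X.XXX."), (47, "XX.XXX"), (61, "..X.XX"), (27, "XX..XX"),
    (31, "...XXX"), (4, "...XX."), (55, "X..XXX"), (3, ".XXXXX"),
    (2, "X...X."), (8, "..XXX."), (1, "..X..X"), (25, "X.X.XX"),
    (17, "....XX"), (2, ".....X"), (8, "..X.X."), (3, "XX..X."),
    (2, ".XX.XX"), (3, "XXXX.X"), (2, "....X."), (2, "X.X.X."),
    (1, "......"), (3, "X..XX."), (3, "XX.XX."),
]
N_VARS = 6

total_cases = sum(n for n, _ in PATTERNS)
# Nonmissing count per variable: add the cases of every pattern with X in that column.
nonmissing = [sum(n for n, p in PATTERNS if p[j] == "X") for j in range(N_VARS)]
# Degrees of freedom: observed variables summed over the patterns that have
# at least one observed value, minus the number of variables.
little_df = sum(p.count("X") for _, p in PATTERNS if "X" in p) - N_VARS

print(total_cases)  # 1537
print(nonmissing)   # [1295, 1137, 1305, 971, 1530, 1440]
print(little_df)    # 99
```

The total of 1537 includes the single case with all six values missing, which is presumably the case behind each "1 case(s) deleted due to missing data" message in the regressions, leaving N = 1536.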

EM estimate of means

                  PAPRES16       PAEDUC       MAEDUC      WORDSUM         EDUC      PRESTIGE
                    40.992       10.093       10.397        5.889       12.727        41.046

EM estimated correlation matrix

                  PAPRES16       PAEDUC       MAEDUC      WORDSUM         EDUC      PRESTIGE
 PAPRES16            1.000
 PAEDUC              0.453        1.000
 MAEDUC              0.334        0.701        1.000
 WORDSUM             0.155        0.281        0.307        1.000
 EDUC                0.242        0.462        0.469        0.505        1.000
 PRESTIGE            0.171        0.242        0.209        0.338        0.521         1.000

Pairwise frequency table

                  PAPRES16       PAEDUC       MAEDUC      WORDSUM         EDUC      PRESTIGE
 PAPRES16             1295
 PAEDUC               1130         1137
 MAEDUC               1120         1057         1305
 WORDSUM               823          727          828          971
 EDUC                 1292         1134         1301          968         1530
 PRESTIGE             1223         1075         1226          905         1434          1440

Matrix has been saved.

>USE "D:\mydocs\ys209\gss89imp.syd"

SYSTAT Rectangular file D:\mydocs\ys209\gss89imp.syd,
created Mon Apr 22, 2002 at 09:43:04, contains variables:

 PAPRES16     PAEDUC       MAEDUC       WORDSUM      EDUC         PRESTIGE

>stats
>stat

                      PAPRES16      PAEDUC      MAEDUC     WORDSUM        EDUC    PRESTIGE
  N of cases             1536        1536        1536        1536        1536        1536
  Minimum              12.000       0.000       0.000       0.000       0.000      12.000
  Maximum              82.000      20.000      20.000      10.000      20.000      82.000
  Mean                 40.992      10.093      10.397       5.889      12.727      41.046
  Standard Dev         12.208       4.083       3.509       1.903       3.031      14.178

>regress
>print
>rem say print to cancel out previous print=long
>model wordsum=constant+paeduc+maeduc+papres16
>estimate

1 case(s) deleted due to missing data.

Dep Var: WORDSUM   N: 1536   Multiple R: 0.391   Squared multiple R: 0.153
Adjusted squared multiple R: 0.151   Standard error of estimate: 1.753

Effect         Coefficient    Std Error     Std Coef Tolerance     t   P(2 Tail)
CONSTANT             3.586        0.179        0.000      .      20.076    0.000
PAEDUC               0.071        0.018        0.152     0.386    4.022    0.000
MAEDUC               0.136        0.019        0.250     0.437    7.028    0.000
PAPRES16             0.004        0.004        0.028     0.764    1.032    0.302

                             Analysis of Variance

Source             Sum-of-Squares   df  Mean-Square     F-ratio       P
Regression               849.701     3      283.234      92.141       0.000
Residual                4709.232  1532        3.074

----------------------------------------------------------------------------------------------------------------------------------

*** WARNING ***
Case          113 has large leverage   (Leverage =        0.022)
Case         1077 has large leverage   (Leverage =        0.019)
Case         1200 has large leverage   (Leverage =        0.018)

Durbin-Watson D Statistic     1.827
First Order Autocorrelation   0.085
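
NOTE: The Std Coef column can be reproduced from the Coefficient column and the standard deviations in the descriptive statistics table: a standardized coefficient is the raw coefficient multiplied by the standard deviation of the predictor and divided by the standard deviation of the dependent variable. The Python sketch below is only a check on the printed figures; the function name is made up, and because the inputs are rounded to three decimals the third decimal can differ from the output.

```python
def standardized_coef(b, sd_x, sd_y):
    """Standardized coefficient from a raw slope and the two standard deviations."""
    return b * sd_x / sd_y


SD_WORDSUM = 1.903  # Standard Dev of the dependent variable

# PAEDUC: coefficient 0.071, SD 4.083 (printed Std Coef 0.152)
print(f"{standardized_coef(0.071, 4.083, SD_WORDSUM):.3f}")  # 0.152
# MAEDUC: coefficient 0.136, SD 3.509 (printed Std Coef 0.250)
print(f"{standardized_coef(0.136, 3.509, SD_WORDSUM):.3f}")  # 0.251
```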

>print
>model educ=constant+wordsum+paeduc+maeduc+papres16
>estimate

1 case(s) deleted due to missing data.

Dep Var: EDUC   N: 1536   Multiple R: 0.676   Squared multiple R: 0.457
Adjusted squared multiple R: 0.456   Standard error of estimate: 2.236

Effect         Coefficient    Std Error     Std Coef Tolerance     t   P(2 Tail)
CONSTANT             5.327        0.256        0.000      .      20.804    0.000
WORDSUM              0.723        0.033        0.454     0.847   22.196    0.000
PAEDUC               0.160        0.023        0.215     0.382    7.059    0.000
MAEDUC               0.138        0.025        0.159     0.423    5.502    0.000
PAPRES16             0.002        0.005        0.010     0.763    0.449    0.654

                             Analysis of Variance

Source             Sum-of-Squares   df  Mean-Square     F-ratio       P
Regression              6450.367     4     1612.592     322.469       0.000
Residual                7656.167  1531        5.001

----------------------------------------------------------------------------------------------------------------------------------
*** WARNING ***

Case          113 has large leverage   (Leverage =        0.023)
Case         1381 is an outlier        (Studentized Residual =       -4.671)

Durbin-Watson D Statistic     1.809
First Order Autocorrelation   0.095
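
NOTE: The summary line of each regression follows from its Analysis of Variance table. The Python sketch below recomputes the squared multiple R, the adjusted squared multiple R, the F-ratio, and the standard error of estimate for the EDUC model from the printed sums of squares and degrees of freedom. It uses the standard textbook formulas; the function name is made up for illustration.

```python
import math


def anova_summary(ss_reg, df_reg, ss_res, df_res):
    """Fit statistics implied by a regression ANOVA table."""
    r2 = ss_reg / (ss_reg + ss_res)
    n_minus_1 = df_reg + df_res
    adj_r2 = 1.0 - (1.0 - r2) * n_minus_1 / df_res
    ms_res = ss_res / df_res
    f_ratio = (ss_reg / df_reg) / ms_res
    return {"r2": r2, "adj_r2": adj_r2, "f": f_ratio, "see": math.sqrt(ms_res)}


# Sums of squares and df from the EDUC model's ANOVA table.
educ_model = anova_summary(6450.367, 4, 7656.167, 1531)
for name, value in educ_model.items():
    print(f"{name}: {value:.3f}")
# r2: 0.457
# adj_r2: 0.456
# f: 322.469
# see: 2.236
```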

>model prestige=constant+educ+wordsum+paeduc+maeduc+papres16
>estimate

1 case(s) deleted due to missing data.

Dep Var: PRESTIGE   N: 1536   Multiple R: 0.551   Squared multiple R: 0.304
Adjusted squared multiple R: 0.302   Standard error of estimate: 11.847

Effect         Coefficient    Std Error     Std Coef Tolerance     t   P(2 Tail)
CONSTANT             7.291        1.536        0.000      .       4.746    0.000
EDUC                 2.196        0.135        0.470     0.543   16.223    0.000
WORDSUM              1.060        0.198        0.142     0.641    5.343    0.000
PAEDUC               0.118        0.122        0.034     0.370    0.973    0.331
MAEDUC              -0.438        0.134       -0.108     0.415   -3.272    0.001
PAPRES16             0.071        0.028        0.061     0.763    2.505    0.012

                             Analysis of Variance

Source             Sum-of-Squares   df  Mean-Square     F-ratio       P
Regression             93831.517     5    18766.303     133.714       0.000
Residual              214729.575  1530      140.346

----------------------------------------------------------------------------------------------------------------------------------
*** WARNING ***

Case          113 has large leverage   (Leverage =        0.023)
Case         1381 has large leverage   (Leverage =        0.025)

Durbin-Watson D Statistic     2.045
First Order Autocorrelation  -0.025
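
NOTE: The two diagnostics printed after each regression are closely related: in large samples the Durbin-Watson D is approximately 2(1 - r), where r is the first-order autocorrelation of the residuals. The Python sketch below applies that approximation to the three pairs printed so far; the function name is made up, and small differences from the printed D are expected because the relation is only approximate and r is rounded.

```python
def dw_from_autocorr(r1):
    """Large-sample approximation to the Durbin-Watson D from lag-1 autocorrelation."""
    return 2.0 * (1.0 - r1)


# (first-order autocorrelation, printed Durbin-Watson D) for the three models.
reported = [(0.085, 1.827), (0.095, 1.809), (-0.025, 2.045)]
for r1, d in reported:
    print(f"approx {dw_from_autocorr(r1):.2f}  printed {d}")
```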

>rem re-estimate last model without the 2 high leverage observations,
>rem just out of curiosity
>select case<>113 and case<>1381
>model prestige=constant+educ+wordsum+paeduc+maeduc+papres16
>estimate

Data for the following results were selected according to:

      case<>113 and case<>1381

1 case(s) deleted due to missing data.

Dep Var: PRESTIGE   N: 1534   Multiple R: 0.552   Squared multiple R: 0.305
Adjusted squared multiple R: 0.303   Standard error of estimate: 11.834

Effect         Coefficient    Std Error     Std Coef Tolerance     t   P(2 Tail)
CONSTANT             7.213        1.541        0.000      .       4.680    0.000
EDUC                 2.211        0.136        0.471     0.541   16.235    0.000
WORDSUM              1.042        0.199        0.140     0.638    5.243    0.000
PAEDUC               0.145        0.123        0.042     0.363    1.179    0.239
MAEDUC              -0.466        0.135       -0.115     0.408   -3.445    0.001
PAPRES16             0.071        0.028        0.061     0.764    2.505    0.012

                             Analysis of Variance

Source             Sum-of-Squares   df  Mean-Square     F-ratio       P
Regression             93938.992     5    18787.798     134.150       0.000
Residual              213997.539  1528      140.051

----------------------------------------------------------------------------------------------------------------------------------
Durbin-Watson D Statistic     2.044
First Order Autocorrelation  -0.024



Last modified 22 Apr 2002