University of North Carolina
at Chapel Hill

SOCI709 (formerly 209) -- LINEAR REGRESSION MODELS

Professor François Nielsen

Section 001 - Spring 2006


1.  TIME, PLACE & CONTACTS

2.  COURSE DESCRIPTION

1.  Goals

The two main purposes of the course are: The course presents regression analysis (simple and multiple) and related techniques. The major topics are: the assumptions of the regression model, matrix representation of the regression model, statistical inference including general linear tests, polynomial regression and interaction models, qualitative (dummy) independent variables, diagnostics and remedies for outliers and influential cases, collinearity, problems of model building and specification, heteroskedasticity, autocorrelation of errors in time series data, and problems of missing values and selection bias.

2.  Prerequisites

Students taking the class should have had some experience with the use of statistical software, some familiarity with matrix algebra, and a statistical background equivalent to that provided by Sociology 208 (e.g., at least up to statistical inference in simple regression models).

3.  READINGS & CLASS NOTES

1.  Readings

For the main textbook you can use either:
 

Kutner, Michael H., Chris J. Nachtsheim, John Neter, and William Li.  2004. Applied Linear Statistical Models with Student CD-rom.  5e.  McGraw-Hill.  ISBN 007310874X

or (subset of the first)
Kutner, Michael H., John Neter, and Christopher J. Nachtsheim.  2004. Applied Linear Regression Models with Student CD-rom.  4e.  McGraw-Hill.  ISBN 0-07-295567-8.
Applied Linear Regression Models is a subset (the first 14 chapters) of Applied Linear Statistical Models. Applied Linear Regression Models includes all the material on regression that we need for the course. Applied Linear Statistical Models is about twice as long, with additional chapters on analysis of variance (ANOVA) and the design of experiments. The chapters common to the two books are identical. I have ordered Applied Linear Statistical Models at the bookstore, as it may represent a somewhat better deal (more pages for the money) and as some of you will be using ANOVA in your research. However, if you will not be using ANOVA, prefer to travel light, or find a used copy of Applied Linear Regression Models (this is the version I used last year, because the other one was not yet out), you may want to use Applied Linear Regression Models. Note that, for maximum confusability, the two versions have different edition numbers (5e and 4e) even though the common materials are identical!
Instead of one of the new editions (ALSM 5e or ALRM 4e), you may already have a copy of an earlier edition, or find a used one to buy. The earlier edition of either text will be mostly fine for use in the course. There are again two versions of the earlier edition:
Neter, John, Michael H. Kutner, Christopher J. Nachtsheim and William Wasserman. 1996. Applied Linear Statistical Models. 4th edition. Burr Ridge, IL: Irwin.  ISBN 0-256-11736-5
or (subset of the first):
Neter, John, Michael H. Kutner, Christopher J. Nachtsheim and William Wasserman. 1996. Applied Linear Regression Models. 3d edition. Burr Ridge, IL: Irwin.  ISBN 0-256-08601-X
Again, Regression Models consists of the first 15 chapters of Statistical Models. Statistical Models continues with 17 more chapters on analysis of variance and experimental design, which we do not cover in this course. Regression Models (xv+720 pages) is much shorter (and lighter!) than Statistical Models (xv+1408 pages).

The following books with self-explanatory titles are also useful and have been ordered at student stores:

Allison, Paul.  1999.  Multiple Regression: A Primer.  Thousand Oaks, CA: Pine Forge Press.  ISBN 0761985336.  Paperback.
Hamilton, Lawrence C.  2006.  Statistics With STATA.  (Updated for Version 9.0.)  Brooks/Cole.  ISBN 0-495-10972-X.  Paperback.  (The previous Version 8 edition is very similar.) 
For computer work we will mainly use STATA. I am myself an old SYSTAT user, however, so I will use SYSTAT to show some examples in class, and you may find it useful yourself for some things that it does better than STATA. Both programs are comprehensive statistical packages available in the Odum Laboratory (Hamilton 228). STATA is also available at the Odum Institute (IRSS) computer lab in Manning Hall. The TA will help people find the software in the Odum lab. Students who are already familiar with another statistical program (such as SAS or SPSS) may use it for the assignments. For the remote hands-on sessions we will use STATA.

2.  Class Notes

Comprehensive class notes and the pictures and documents that I show in class are available on the World Wide Web at the address http://www.unc.edu/~nielsen/.  Click on the entry for SOCI709 (209).  Then click on Class Notes in the side-bar.  You can print these notes or download them as files.  The class notes will be updated as necessary during the semester.  Insofar as possible I will try to have the updated version up and available prior to the corresponding class, but this may not always be possible so I must reserve the right to revise the notes without notice at any time during the semester.
 
 
 


Class Notes
This outline reflects a future extended version of the course that is not yet completely implemented. Modules that do not exist yet (shaded in light blue in the online version) are marked X; the numbers of existing modules are consistent with those in the class schedule in Section 5.
Part 1  SIMPLE LINEAR REGRESSION
   Module 1     Simple Linear Regression
   Module 2     Statistical Inference
   Module 3     Diagnostics & Remedies
Part 2  MULTIPLE LINEAR REGRESSION & GENERAL LINEAR MODEL
   Module 4     Matrix Representation of the Regression Model
   Module 5     Multiple Regression & the General Linear Model
   Module 6     Polynomial Regression & Interactions
   Module 7     Qualitative Independent Variables
   Module 8     General Linear Tests
   Module 9n    Model Building & Specification
Part 3  COMPLICATIONS OF MULTIPLE REGRESSION: DIAGNOSTICS & REMEDIES
   Module 10    Outlying & Influential Observations
   Module 9     Partial Regression Plots
   Module 11    Collinearity & Ridge Regression
   Module 12    Heteroskedasticity
   Module 13    The Bootstrap
Part 4  SPECIAL DATA STRUCTURES
   Module 14    Autocorrelation in Time Series Data
   Module 14b   X Pooled Time Series of Cross Sections
   Module 14c   X Multi Level Models
   Module 16    X Missing Values & Selection Bias


 


STATISTICAL TABLES


REFERENCES

 

4.  REQUIREMENTS

Grades will be based on the following requirements: More details follow.

Remote Hands-On Sessions

The Thursday class consists of a remote hands-on session from a location of your choice. You can use your own computer at home or elsewhere, as long as it has a fast connection. Alternatively, you can use the computer labs in Saunders 322 (21 machines; the lab is reserved for the class Thu 9:30-10:45) and Hamilton 228 (Odum Lab; about 15 machines, but the lab is not reserved for the class). Use of a headset is strongly recommended, especially if you are using a computer lab. The computer you use has to be checked once before the sessions by going to a website, and you should have access to the program STATA. To learn more about the remote sessions, click on Remote Session in the side-bar.

Homework

Students are allowed and encouraged to collaborate with others in doing the assignments. A group of collaborating students may turn in a single copy of the assignment with the names of all participants. Each participant will receive the same grade for that assignment. Alternatively, you can turn in your own copy of the assignment even if you have worked on some or all of the problems in collaboration with others. Collaborating with others on one assignment does not obligate you to collaborate on other assignments. Click on Assignments in the side bar for the current assignment, as well as old versions.

Exams

The midterm and final are in-class exams based on multiple-choice questions. These exams are closed-book, but students can bring a "cheat sheet" no longer than 1 sheet front & back (= 2 pages) for the midterm and 2 sheets front & back (= 4 pages) for the final. The final is not cumulative, except that some questions on the final may involve knowledge that was part of the material covered for the midterm. For more information on materials covered by the exams see

Short Paper

Students are allowed and encouraged to collaborate with others in doing the paper, on the same basis as for the homework. You can use your own data set or a data set provided by the instructors. Click on Assignments in the side bar for more details on the paper.

5.  CLASS SCHEDULE & READINGS

ALSM5e = Applied Linear Statistical Models 5e (2004) OR Applied Linear Regression Models 4e (2004).
ALSM4e = Applied Linear Statistical Models 4e (1996) OR Applied Linear Regression Models 3e (1996).
Readings are indicated by section numbers.  For example, ALSM5e 2.1 means Chapter 2, Section 1.  Additional references are given below.  See also References in side bar.
Rm column marks important dates: (1) = assignment 1 due; (2) = assignment 2 due; (3) = assignment 3 due; (4) = assignment 4 due; (P) = paper due; (M) = midterm; (F) = final.
 
Class  Rm   Date        Subject (Web module)
                        Readings

1           Thu 12-Jan  First contact

2           Tue 17-Jan  Simple linear regression (M1)
                        ALSM5e 1.1-1.8; ALSM4e 1.1-1.8.
3           Thu 19-Jan  Remote session #1

4           Tue 24-Jan  Statistical inference (M2)
                        ALSM5e 2.1-2.10 (omit 2.11 on normal correlation model); ALSM4e 2.1-2.11.
5           Thu 26-Jan  Remote session #2

6           Tue 31-Jan  Diagnostics and remedies (M3)
                        ALSM5e 3.1-3.6, 3.8-3.11; ALSM4e 3.1-3.6, 3.8-3.11; Wilkinson et al. (1996).
7           Thu 2-Feb   Remote session #3

8           Tue 7-Feb   Matrix representation (M4)
                        ALSM5e 5.1-5.13; ALSM4e 5.1-5.13.
9           Thu 9-Feb   Remote session #4

10     (1)  Tue 14-Feb  Multiple regression & general linear model (M5)
                        ALSM5e 6.1-6.9, 7.5; ALSM4e 6.1-6.9, 7.5.
11          Thu 16-Feb  Remote session #5

12          Tue 21-Feb  Polynomial regression & interactions (M6)
                        ALSM5e 8.1-8.2; ALSM4e 7.7-7.9.
13          Thu 23-Feb  Remote session #6

14     (2)  Tue 28-Feb  Qualitative independent variables (M7)
                        ALSM5e 8.3-8.7; ALSM4e 11.1-11.7.
15          Thu 2-Mar   Remote session #7

16          Tue 7-Mar   Review/Catch-up
17     (M)  Thu 9-Mar   <<MIDTERM 9:30-10:45 AM>>

            Tue 14-Mar  <Spring Break - NO CLASS>
            Thu 16-Mar  <Spring Break - NO CLASS>

18          Tue 21-Mar  General linear tests (M8)
                        ALSM5e 2.8, 7.1-7.4; ALSM4e 2.8, 7.1-7.4.
19          Thu 23-Mar  Remote session #8

20          Tue 28-Mar  Model building & specification (M9n)
                        ALSM5e 9.1-9.6; ALSM4e 8.1-8.5.
21          Thu 30-Mar  Remote session #9

22     (3)  Tue 4-Apr   Outlying & influential observations; partial regression plots (M9, M10)
                        ALSM5e 10.1-10.4, 11.3-11.4 (pp. 449-453 on LOWESS; part on regression trees optional); ALSM4e 9.1-9.4, 10.3-10.4; Fox (1991); Bollen & Jackman (1985).
23          Thu 6-Apr   Remote session #10

24          Tue 11-Apr  Collinearity & ridge regression (M11)
                        ALSM5e 7.6, 10.5, 11.2; ALSM4e 7.6, 9.5, 10.2.
25          Thu 13-Apr  Remote session #11

26     (4)  Tue 18-Apr  Heteroskedasticity; the bootstrap (M12, M13)
                        ALSM5e 11.1, 11.5; ALSM4e 10.1, 10.5; Diaconis & Efron (1983).
27          Thu 20-Apr  Remote session #12

28     (P)  Tue 25-Apr  Autocorrelation in time series data (M14)
                        ALSM5e 12.1-12.4; ALSM4e 12.1-12.4.
29          Thu 27-Apr  Review/Catch-up

       (F)  Tue 2-May   <FINAL 8:00-11:00 AM>

6.  OUTLINE REFERENCES  (See also REFERENCES in side bar)



Last modified 10 Jan 2006