Information System Notes: Prediction with logistic regression

Yesterday evening we had the 3rd meeting of our class. The tradition here at the University of Arkansas is to start with linear regression and logistic regression, then go on to decision trees and other algorithms. I skipped linear regression, explained the theory of logistic regression, and moved on to dissecting and interpreting SAS EM results. The data set is the telecom churn data set, which is available on the book website, has over 3000 records, and is very clean.

The ppt slides can be found here. I started with a simple example of linear regression: Georgians' enjoyment of snow over time, which I found here. Interestingly, we had snow on Monday, and there was still some on the ground when we woke up, so this example was very relevant! My students liked it.

I then talked about data partitioning. I explained the reason for moving from:

target -> probability of target -> odds of target -> log of odds -> conducting linear regression of log of odds on the input variables.

It is not easy for an undergraduate student to digest these steps, but I wanted to expose them to the ideas. I recommend that all teachers spend some time explaining the assumptions and theory behind logistic regression.
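The chain from probability to odds to log of odds can be followed with a few lines of Python. This is only a minimal sketch for illustration: the function names (`odds`, `log_odds`, `inverse_logit`) and the probability 0.8 are my own assumptions and have nothing to do with SAS EM or the churn data.

```python
import math

def odds(p):
    """Odds of the target: probability of the event over probability of no event."""
    return p / (1.0 - p)

def log_odds(p):
    """Log of odds (the logit). This is the quantity modeled as linear in the inputs."""
    return math.log(odds(p))

def inverse_logit(z):
    """Map a log-odds value back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical probability, chosen only for illustration.
p = 0.8
o = odds(p)                 # about 4, i.e. odds of 4 to 1
z = log_odds(p)             # unbounded: can be any real number
p_back = inverse_logit(z)   # recovers the original probability

print(p, o, z, p_back)
```

The point of the chain is that a probability is confined to the interval from 0 to 1, while the log of odds can take any real value, so it is a reasonable thing to regress linearly on the input variables.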

For the SAS EM output, I focused on the coefficient estimates, the significance levels for each input and for the whole regression equation, the misclassification rate, false positives, false negatives, lift, and the lift chart. The class activity and the follow-up concept checks are available here.
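For readers who want to see how some of these measures are computed, here is a small Python sketch. The ten records, the scores, the 0.5 cutoff, and the helper names (`confusion_counts`, `misclassification_rate`, `lift_at`) are all made up for illustration. They are not the churn data and not what SAS EM does internally.

```python
def confusion_counts(actual, predicted):
    """Return (true positives, false positives, true negatives, false negatives)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1   # predicted the event, but it did not happen
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1   # missed an event that did happen
    return tp, fp, tn, fn

def misclassification_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual class."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return (fp + fn) / len(actual)

def lift_at(actual, scores, fraction):
    """Event rate among the top-scored fraction divided by the overall event rate."""
    n_top = max(1, int(round(fraction * len(actual))))
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate

# Toy data: 1 = churned, 0 = stayed. Scores are hypothetical predicted probabilities.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.05, 0.35]

# Classify as churn when the score reaches an assumed cutoff of 0.5.
predicted = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_counts(actual, predicted))
print(misclassification_rate(actual, predicted))
print(lift_at(actual, scores, 0.2))
```

A lift above 1 in the top-scored group means the model finds the target more often there than picking records at random would.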

I plan to talk about stepwise, forward, and backward variable selection methods in the next class and then move on to KNN.


Tuesday, February 14, 2012
