Linear Regression Problems

AP Statistics Exam 2002 Question 4 ====1) Commercial airlines need to know the operating cost per hour of flight for each plane in their fleet. In a study of the relationship between operating cost per hour and number of passenger seats, investigators computed the regression of operating cost per hour on the number of passenger seats. The 12 sample aircraft used in the study included planes with as few as 216 passenger seats and planes with as many as 410 passenger seats. Operating cost per hour ranged between $3,600 and $7,800. Some computer output from a regression analysis of these data is shown below.====



**a) What is the equation of the LSRL that describes the relationship between operating cost per hour and number of passenger seats in the plane? Define any variables used.**
Predicted cost (y-hat) = 1136 + 14.673 (number of passenger seats), where y-hat is the predicted operating cost per hour in dollars and x is the number of passenger seats in the plane.

**b) What is the value of the correlation coefficient between operating cost per hour and number of passenger seats in the plane? Interpret this correlation.**
r = 0.755. It is positive because the slope of the regression line is positive. There is a moderate, positive linear association between operating cost per hour and the number of passenger seats.

**c) Interpret the slope.**
For each additional passenger seat, the predicted operating cost per hour increases by about $14.67, on average.
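The fitted line from part a) can be evaluated directly to see what the slope means. The following is a minimal sketch; the function name `predicted_cost` and the seat counts chosen are illustrative assumptions, and only the intercept, slope, and the 216 to 410 seat range come from the problem.

```python
# Coefficients taken from the regression equation in part a).
INTERCEPT = 1136.0   # dollars per hour
SLOPE = 14.673       # dollars per hour, per passenger seat


def predicted_cost(seats: float) -> float:
    """Predicted operating cost per hour (dollars) for a plane with `seats` seats."""
    return INTERCEPT + SLOPE * seats


# The sample only covered planes with 216 to 410 seats,
# so predictions outside that range would be extrapolation.
low_end = predicted_cost(216)
high_end = predicted_cost(410)

# Slope interpretation: adding one seat changes the prediction by SLOPE dollars.
one_seat_change = predicted_cost(301) - predicted_cost(300)

print(f"216 seats: ${low_end:,.2f} per hour")
print(f"410 seats: ${high_end:,.2f} per hour")
print(f"change for one extra seat: ${one_seat_change:.3f}")
```

The difference between the predictions for 301 and 300 seats equals the slope, which is exactly the statement made in part c).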

AP Statistics Exam 2005 Form B ====2) John believes that if he increases his walking speed then his pulse will increase as well. He wants to model this relationship. John records his pulse rate in beats per minute, while walking at each of seven different speeds in miles per hour. A scatterplot and regression output are shown below.====



**a) Using the regression output, write the equation of the LSRL.**
Predicted pulse = 63.457 + 16.2809 (speed), where pulse is in beats per minute and speed is in miles per hour.

**b) Construct a 95% confidence interval for the slope of the regression line.**
 * Step 1: We are estimating β, the mean change in pulse rate (beats per minute) for each 1 mph increase in walking speed.
 * Step 2: Linear regression t interval for the slope
 * Assume the relationship is linear
 * Assume the residuals are approximately normal
 * Step 3: DF = n - 2 = 7 - 2 = 5, so t* = 2.571
 * b +/- (t*)(SEb)
 * 16.2809 +/- (2.571)(0.8192) = 16.2809 +/- 2.1062 = (14.17, 18.39)
 * Step 4: I am 95% confident that for each 1 mph increase in walking speed, John's mean pulse rate increases by between 14.17 and 18.39 beats per minute.
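The interval arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper `slope_confidence_interval` is a hypothetical name, and the critical value 2.571 is entered by hand from a t table for 5 degrees of freedom rather than computed.

```python
def slope_confidence_interval(b: float, se_b: float, t_star: float) -> tuple[float, float]:
    """Return (lower, upper) for the interval b +/- t* * SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


n = 7            # seven walking speeds were recorded
df = n - 2       # degrees of freedom for a regression slope
b = 16.2809      # slope from the regression output
se_b = 0.8192    # standard error of the slope from the regression output
t_star = 2.571   # t critical value for 95% confidence, df = 5 (from a table)

lower, upper = slope_confidence_interval(b, se_b, t_star)
print(f"df = {df}, 95% CI for slope: ({lower:.2f}, {upper:.2f})")
# prints: df = 5, 95% CI for slope: (14.17, 18.39)
```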

AP Statistics Exam 2001 Question 6 ====3) The statistics department at a large university is trying to determine if it is possible to predict whether an applicant will successfully complete a PhD program or will leave before completion. The department is considering whether GPA in undergraduate statistics and mathematics courses and mean number of credit hours per semester would be helpful measures. To gather data, an SRS of 20 entering students from the past five years is taken. The regression output from fitting the line to the data follows.====

**a) For the students who completed the program, is there a significant relationship between GPA and mean number of credit hours per semester? Give statistical justification. (In "little people terms" this means do a significance test.)**
 * Step 1: Ho: β = 0, there is no linear association between GPA and mean credit hours
 * Ha: β ≠ 0, there is a linear association between GPA and mean credit hours
 * Step 2: Linear regression t test
 * Assume the relationship is linear
 * Assume the residuals are approximately normal
 * Step 3: t = b / SEb
 * t = -2.7555 / 0.4668
 * t = -5.903, with a corresponding p-value of approximately zero
 * Step 4: The p-value is less than alpha = 0.05
 * Reject Ho. There is enough evidence to say that, for students who completed the program successfully, there is a linear association between GPA and mean credit hours per semester.
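The test statistic is just the slope divided by its standard error, which can be sketched as below. The helper name `slope_t_statistic` is an assumption for illustration. The p-value is not recomputed here because the degrees of freedom for the students who completed the program are not stated in this section.

```python
def slope_t_statistic(b: float, se_b: float, beta_0: float = 0.0) -> float:
    """t statistic for testing Ho: beta = beta_0 against the estimated slope b."""
    return (b - beta_0) / se_b


b = -2.7555     # slope from the regression output
se_b = 0.4668   # standard error of the slope from the regression output

t = slope_t_statistic(b, se_b)
print(f"t = {t:.3f}")
# prints: t = -5.903
```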

4) Gathering Information from a Categorical Data Table

|| || student smokes || student doesn't smoke || total ||
|| both parents smoke || 400 || 1,380 || 1,780 ||
|| one parent smokes || 416 || 1,823 || 2,239 ||
|| neither parent smokes || 188 || 1,168 || 1,356 ||
|| //total// || //1,004// || //4,371// || //5,375// ||

**a) How many students are in the data?**
 * 1,780 + 2,239 + 1,356 = 5,375
**b) What percent of students smoke?**
 * 1,004 / 5,375 = 18.7%
**c) Give the marginal distribution of parents' smoking behavior in counts and percents.**
 * Both: 1,780 / 5,375 = 33.12%
 * One: 2,239 / 5,375 = 41.66%
 * None: 1,356 / 5,375 = 25.23%
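The totals and percentages above can be reproduced from the six cell counts alone. The following is a small sketch; the dictionary layout and variable names are illustrative choices, and the counts are the ones in the table.

```python
# Each entry maps the parents' smoking behavior to
# (students who smoke, students who don't smoke).
counts = {
    "both parents smoke": (400, 1380),
    "one parent smokes": (416, 1823),
    "neither parent smokes": (188, 1168),
}

# a) Row totals and the grand total.
row_totals = {group: sum(cells) for group, cells in counts.items()}
grand_total = sum(row_totals.values())

# b) Percent of all students who smoke (first column total / grand total).
students_who_smoke = sum(smokes for smokes, _ in counts.values())
percent_smokers = 100 * students_who_smoke / grand_total

# c) Marginal distribution of parents' smoking behavior, in percents.
marginal_percent = {
    group: 100 * total / grand_total for group, total in row_totals.items()
}

print(f"students in the data: {grand_total}")
print(f"percent of students who smoke: {percent_smokers:.1f}%")
for group, total in row_totals.items():
    print(f"{group}: {total} ({marginal_percent[group]:.2f}%)")
```

A marginal distribution uses only the row (or column) totals, which is why the individual cell counts drop out once `row_totals` is built.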

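5) Transform by logging
|| year (L1) || acres (L2) ||
|| 1978 || 63,042 ||
|| 1979 || 226,260 ||
|| 1980 || 907,075 ||
|| 1981 || 2,826,095 ||
**a) State the ratio**
 * 226,260 / 63,042 = 3.589
 * 907,075 / 226,260 = 4.009
 * 2,826,095 / 907,075 = 3.116
 * The ratios of successive years are roughly constant, which suggests exponential growth and makes a log transformation reasonable.
**b) Transform by log**
 * In L3 enter log(L2)
 * Stat Calc #8, LinReg(a+bx), using L1 and L3
 * log(y-hat) = -1094.51 + 0.556x
 * y-hat = 10^(-1094.51 + 0.556x)
 * Use L1 and L3 for the residual plot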
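The calculator steps can be mirrored in Python as a check. This is a sketch using only the standard library: the `least_squares` helper is a hand-written assumption standing in for the calculator's linear regression command, with the lists `years` and `log_acres` playing the roles of L1 and L3.

```python
import math

years = [1978, 1979, 1980, 1981]            # L1
acres = [63042, 226260, 907075, 2826095]    # L2

# a) Ratios of each year's acreage to the previous year's.
ratios = [acres[i + 1] / acres[i] for i in range(len(acres) - 1)]

# b) Log-transform the response (the "L3" column).
log_acres = [math.log10(a) for a in acres]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = least_squares(years, log_acres)

# Residuals on the log scale, for a residual plot against year.
residuals = [y - (a + b * x) for x, y in zip(years, log_acres)]


def predicted_acres(year: float) -> float:
    """Back-transform the fitted line: y-hat = 10^(a + b * year)."""
    return 10 ** (a + b * year)


print("ratios:", [round(r, 3) for r in ratios])
print(f"log(y-hat) = {a:.2f} + {b:.3f}x")
print("residuals:", [round(r, 4) for r in residuals])
```

Fitting on the log scale and then raising 10 to the fitted value gives predictions back in acres, which is what the exponential form of the equation expresses.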