Statistical Modelling for Business – (I have finished the code part)

The deadline is Monday, September 6th, by 11:59pm Sydney time.
Submission is via TurnItIn on Canvas.
Requirements:
Complete your entire assignment in Jupyter Notebook, including your code and markdown sections for your written answers. Use LaTeX in markdown sections where needed.
Submit the resulting downloaded HTML file as your entire assignment. Care must be taken with presentation in this file; however, unavoidable error messages and page formatting issues will be ignored in marking.
Only relevant analysis outputs (graphs, tables, etc.) should appear in the assignment file, and all output should appear together with the discussion of that output in the file.
Task 1 (30 marks). Business problem:
This assignment follows the analysis conducted in the lectures regarding the dependence between earnings and asset returns for companies listed on the NYSE. You will assess whether earnings in one year (say t − 1) affect asset returns in the subsequent year (say t). In particular, you will assess whether returns are typically higher following positive, compared to negative, earnings years, and whether there may be a linear relationship between returns and lagged earnings.
Data: The data file for the analysis is SampleData from US 90 08 wk3.csv, which was sampled from US 90 08 wk3.csv.
Questions:
(a) Conduct an appropriate exploratory analysis of the asset returns, both individually and in terms of one of the primary questions considered in this assignment: are returns in the subsequent year t typically higher following positive, compared to negative, earnings years in year t − 1? Discuss any cleaning of the data you did, including why and how you did it, or why you did not do it.
(3 marks)
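Below is a minimal Python sketch of the kind of numerical summary part (a) calls for. It assumes numpy is available. The arrays `returns` and `lag_earnings` are hypothetical placeholders filled with simulated data, because the actual column names in the CSV are not given here. Tukey's 1.5 × IQR rule is one common way to flag candidate outliers before deciding whether to clean them. It is not the only way.

```python
import numpy as np

# Simulated placeholder data: `returns` (year t) and `lag_earnings` (year t - 1)
# are hypothetical names standing in for the relevant CSV columns.
rng = np.random.default_rng(0)
lag_earnings = rng.normal(0.05, 0.10, size=200)
returns = 0.3 * lag_earnings + 0.1 * rng.standard_t(df=3, size=200)

def summarise(x):
    """Basic descriptive statistics for a 1-D array."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return {"n": int(x.size), "mean": float(x.mean()), "median": float(med),
            "sd": float(x.std(ddof=1)), "iqr": float(q3 - q1)}

def iqr_outliers(x, k=1.5):
    """Mask of points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

positive = lag_earnings > 0  # zero earnings, if any, fall in the second group
print("all returns:", summarise(returns))
print("after positive earnings:", summarise(returns[positive]))
print("after non-positive earnings:", summarise(returns[~positive]))
print("Tukey outliers in returns:", int(iqr_outliers(returns).sum()))
```

Side-by-side boxplots or histograms of the two groups would normally accompany these numbers. Whether any flagged points are removed is a judgement call that needs to be justified in the write-up.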
(b) Conduct the appropriate t-test (with α = 0.05), median and Mann-Whitney tests to assess whether returns are typically higher following positive, compared to negative, earnings years. For median tests, use two-sided testing. Assess all assumptions made.
(10 marks)
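One possible way to run the tests in part (b) with scipy is sketched below, using simulated placeholder samples `ret_pos` and `ret_neg` (hypothetical names). The sketch makes several assumptions:
- It needs a recent SciPy; the `alternative` argument of `ttest_ind` requires SciPy 1.6 or later.
- It uses Welch's version of the t-test.
- It reads "median test" as Mood's two-sample median test (`scipy.stats.median_test`), which is two-sided.
- It adds Shapiro-Wilk and Levene tests as assumption checks.

```python
import numpy as np
from scipy import stats

# Simulated placeholder samples (hypothetical names): returns in year t after
# positive (ret_pos) and negative (ret_neg) earnings in year t - 1.
rng = np.random.default_rng(1)
ret_pos = rng.normal(0.12, 0.30, size=150)
ret_neg = rng.normal(0.02, 0.45, size=60)

def compare_groups(pos, neg, alpha=0.05):
    """Tests of H1: returns after positive earnings are typically higher."""
    # Assumption checks: normality within each group; Levene's test for equal
    # variances informs the choice between the pooled and Welch t-tests.
    shapiro_p = (stats.shapiro(pos).pvalue, stats.shapiro(neg).pvalue)
    levene_p = stats.levene(pos, neg).pvalue
    # Welch's t-test (no equal-variance assumption), one-sided.
    t_res = stats.ttest_ind(pos, neg, equal_var=False, alternative="greater")
    # Mood's median test: chi-square test on counts above/below the pooled
    # median; it is two-sided by construction.
    _, median_p, pooled_median, _ = stats.median_test(pos, neg)
    # Mann-Whitney U test, one-sided in the same direction as the t-test.
    mw_res = stats.mannwhitneyu(pos, neg, alternative="greater")
    return {
        "shapiro_p": shapiro_p,
        "levene_p": levene_p,
        "t_p": t_res.pvalue,
        "median_p": median_p,
        "pooled_median": pooled_median,
        "mw_p": mw_res.pvalue,
        "t_rejects_at_alpha": bool(t_res.pvalue < alpha),
    }

print(compare_groups(ret_pos, ret_neg))
```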
(c) Which test's result do you believe the most in part (b)? Discuss and explain.
(2 marks)
(d) Conduct an appropriate exploratory analysis to assess whether there may be a
linear relationship between returns and lagged earnings.
(3 marks)
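For part (d), the short sketch below compares two correlation measures on simulated placeholder arrays `x` (lagged earnings) and `y` (returns):
- Pearson's correlation measures linear association.
- Spearman's rank correlation measures any monotone association.

A scatter plot of `y` against `x` would normally be the first thing to look at. The numbers below only complement it.

```python
import numpy as np
from scipy import stats

# Simulated placeholders: x = lagged earnings, y = returns (hypothetical names).
rng = np.random.default_rng(2)
x = rng.normal(0.05, 0.10, size=200)
y = 0.02 + 0.8 * x + rng.normal(0.0, 0.25, size=200)

def linearity_summary(x, y):
    """Pearson r (linear association) vs. Spearman rho (monotone association)."""
    r, r_p = stats.pearsonr(x, y)
    rho, rho_p = stats.spearmanr(x, y)
    return {"pearson_r": r, "pearson_p": r_p,
            "spearman_rho": rho, "spearman_p": rho_p}

print(linearity_summary(x, y))
```

A Spearman coefficient well above the Pearson one can hint at a monotone but non-linear pattern, or at outliers pulling the Pearson value down.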
(e) Conduct a simple linear regression analysis, using OLS estimation, of returns on lagged earnings. Fully assess all assumptions of OLS. Also list and assess the assumptions of LAD (no need to obtain the LAD estimates). Discuss any cleaning of the data you did, including why and how you did it, or why you didn't do it.
(9 marks)
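Below is a minimal OLS sketch for part (e) using `scipy.stats.linregress`, again on simulated placeholder arrays. The diagnostics are assumptions of this sketch rather than a complete checklist:
- Shapiro-Wilk on the residuals checks normality.
- A rank correlation between |residuals| and x gives a rough heteroskedasticity indicator. A formal Breusch-Pagan test is available in statsmodels as `het_breuschpagan`.
- Residuals more than three SERs from zero are counted as potential outliers.

Independence of the errors is mostly a question of the sampling design, and this code does not check it.

```python
import numpy as np
from scipy import stats

# Simulated placeholders: x = lagged earnings, y = returns (hypothetical names).
rng = np.random.default_rng(3)
x = rng.normal(0.05, 0.10, size=200)
y = 0.02 + 0.8 * x + 0.2 * rng.standard_t(df=4, size=200)

def ols_with_checks(x, y):
    """Fit y = b0 + b1*x by OLS and return simple residual diagnostics."""
    fit = stats.linregress(x, y)
    resid = y - (fit.intercept + fit.slope * x)
    n = x.size
    ser = np.sqrt(np.sum(resid ** 2) / (n - 2))  # standard error of regression
    return {
        "intercept": fit.intercept,
        "slope": fit.slope,
        "slope_p": fit.pvalue,  # two-sided test of H0: slope = 0
        "r_squared": fit.rvalue ** 2,
        "ser": ser,
        # Normality of the errors (matters for exact small-sample inference).
        "shapiro_p": stats.shapiro(resid).pvalue,
        # Rough heteroskedasticity indicator: does |residual| trend with x?
        "abs_resid_vs_x_p": stats.spearmanr(np.abs(resid), x)[1],
        # Residuals more than 3 SERs from zero, as potential outliers.
        "n_large_resid": int(np.sum(np.abs(resid) > 3 * ser)),
    }

print(ols_with_checks(x, y))
```

Residual-versus-fitted and normal Q-Q plots of the residuals would normally be shown alongside these numbers.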
(f) Write a brief (< 0.5 page) report summarising and discussing your findings and conclusions in layman's terms. Include a discussion of whether you would recommend an investment strategy based on your findings.
(3 marks)
Task 2 (20 marks). Theoretical derivations:
Consider the population SLR model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
and an observed, random sample of data $(y_1, x_1), \ldots, (y_n, x_n)$ from that model. An OLS regression is run on this data.
Questions:
(a) Show that the mean of the estimated residuals from the OLS regression exactly equals 0, i.e. $\bar{e} = 0$. Hint: look at the first equation found when differentiating the residual sum of squares with respect to $\beta_0$.
(2 marks)
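The following plain-Python check is a numerical sanity check of the result in part (a), not a proof. It fits OLS with the closed-form estimates on a small made-up dataset and confirms that the residuals average to zero up to floating-point rounding. The data values are arbitrary illustrations.

```python
from statistics import fmean

def ols_estimates(xs, ys):
    """Closed-form OLS estimates (b0, b1) for the SLR model."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def ols_residuals(xs, ys):
    """Residuals e_i = y_i - (b0 + b1 * x_i)."""
    b0, b1 = ols_estimates(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [1.0, 2.0, 4.0, 5.0, 7.0]
ys = [2.1, 3.9, 8.2, 9.8, 15.1]
e = ols_residuals(xs, ys)
print(fmean(e))  # zero up to floating-point rounding
```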
(b) Show that the correlation between the estimated residuals and the observed $x$ values exactly equals 0. How does this result relate to the 2nd LSA? Hint: look at the second equation found when differentiating the residual sum of squares with respect to $\beta_1$.
(5 marks)
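A similar plain-Python check for part (b) uses made-up data of the same kind. It shows that the sample correlation between the OLS residuals and the x values comes out as zero up to rounding, and so does $\sum_i e_i x_i$. `sample_corr` is a small helper written for this sketch.

```python
from statistics import fmean

def ols_residuals(xs, ys):
    """Residuals from the closed-form OLS fit of y on x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sample_corr(a, b):
    """Pearson sample correlation of two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    saa = sum((u - a_bar) ** 2 for u in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

xs = [1.0, 2.0, 4.0, 5.0, 7.0]
ys = [2.1, 3.9, 8.2, 9.8, 15.1]
e = ols_residuals(xs, ys)
print(sample_corr(e, xs))  # zero up to floating-point rounding
```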
(c) Show that the equality $TSS = RegSS + RSS$ holds, i.e. show that:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Hint: add and subtract $\hat{y}_i$ inside the square on the left side of the equation.
(6 marks)
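Here is a plain-Python check of the decomposition in part (c) on made-up data. Following the hint, it also computes the cross term $2\sum_{i=1}^{n}(\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$ that appears after adding and subtracting $\hat{y}_i$. For an OLS fit this term comes out as zero up to rounding.

```python
from statistics import fmean

def ss_decomposition(xs, ys):
    """Return (TSS, RegSS, RSS, cross_term) for the OLS fit of y on x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)
    reg_ss = sum((yh - y_bar) ** 2 for yh in y_hat)
    rss = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    # Cross term left over after expanding ((y - y_hat) + (y_hat - y_bar))^2.
    cross = 2 * sum((yh - y_bar) * (y - yh) for y, yh in zip(ys, y_hat))
    return tss, reg_ss, rss, cross

tss, reg_ss, rss, cross = ss_decomposition([1.0, 2.0, 4.0, 5.0, 7.0],
                                           [2.1, 3.9, 8.2, 9.8, 15.1])
print(tss, reg_ss + rss, cross)
```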
(d) Explain and show why $SER^2 = \widehat{Var}(\varepsilon) = \widehat{Var}(Y \mid X)$.
(7 marks)