MAST90007: Statistics for Research Workers 2021

1,500 word assignment

Due: 5 pm, Friday 30 July 2021

Submission: Submit an electronic copy of the assignment via the LMS. IMPORTANT: All students in this subject are required to complete the online plagiarism declaration form for the subject as a whole, covering all work. You will find a link to the form on Canvas. If you do not complete the online plagiarism form, your assignment will not be accepted.

This assignment contains three (3) questions worth a total of 20 marks. There is some general advice on the assignment at the end of this document, on page 8.

The overall requirement for this assignment is to carry out and report on data analytics that address three questions about the data from the Framingham heart study.

You may know about this study from your general knowledge; it is one of the most famous studies in epidemiology. You can learn about the study from information on Wikipedia (https://en.wikipedia.org/wiki/Framingham_Heart_Study), but also through these references:

Levy, D., National Heart Lung and Blood Institute., et al. (1999). 50 years of discovery: medical milestones from the National Heart, Lung, and Blood Institute’s Framingham Heart Study. Hackensack, N.J., Center for Bio-Medical Communication Inc.

Mahmood, S. S., Levy, D., Vasan, R. S., & Wang, T. J. (2014). The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet, 383(9921), 999-1008.

Oppenheimer, G. M. (2005). Becoming the Framingham study 1947–1950. American Journal of Public Health, 95(4), 602-610.

You may also find your own useful references. You are not required to read these references for the purposes of the assignment.

The data file contains some information from long-term follow-up as well as baseline measures. The file contains records for 5,209 people – all the participants in the original cohort of the study. The participants were followed up every 2 years. The data file includes information from baseline, the 2nd examination (one variable), and the 16th examination (30 years after baseline).

 


SRW MAST90007 2021 Major assignment

The data file includes:

Age at baseline (years)

Height at baseline (inches)

Weight at baseline (pounds)

Body Mass Index at baseline (kg/m²)

Sex: Female / Male

Diastolic blood pressure at baseline (mmHg)

Systolic blood pressure at baseline (mmHg)

Serum cholesterol (mg/100ml) examination 1: Serum cholesterol (mg/100ml) at baseline; this variable has 2,037 missing values.

Serum cholesterol (mg/100ml) examination 2: Serum cholesterol (mg/100ml) at the 2nd examination; this variable has 626 missing values.

Serum cholesterol (mg/100ml) baseline: Baseline serum cholesterol at examination 1, or, when missing at examination 1, the serum cholesterol at the second examination.

Metropolitan Relative Weight at baseline: A measure of the percentage of actual weight to desirable weight; a measure very similar to BMI.

Smoker at baseline: Smoker / Non-smoker

Number cigarettes smoked per day at baseline

Last examination number: Number of the last examination that the person participated in.

Survived at last examination: 0 = alive at 16th examination; 1 = died prior to 16th examination

Cause of death:
0 = still alive
1 = sudden death from coronary heart disease (CHD)
2 = other coronary heart disease
3 = stroke (cerebrovascular accident, CVA)
4 = other cerebral vascular disease
5 = cancer
6 = other causes of death
9 = cause unknown

Examination at which CHD diagnosed, if applicable
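The definition of the "Serum cholesterol (mg/100ml) baseline" variable in the list above can be sketched in a few lines of Python. This is only an illustration of the stated rule (use examination 1, or examination 2 when examination 1 is missing); the function name and the example values are hypothetical and are not taken from the data file, and the assignment itself is carried out in Minitab.

```python
from typing import Optional


def baseline_cholesterol(exam1: Optional[float], exam2: Optional[float]) -> Optional[float]:
    """Return the examination 1 value, or the examination 2 value when examination 1 is missing.

    A missing measurement is represented by None (an assumption for this sketch).
    """
    return exam1 if exam1 is not None else exam2


# Made-up illustrative records: (examination 1, examination 2).
records = [(250.0, 240.0), (None, 275.0), (None, None)]

# Apply the rule to each record; a person missing both values stays missing.
baseline = [baseline_cholesterol(e1, e2) for e1, e2 in records]
print(baseline)
```

Under this rule the baseline variable is missing only for people with no measurement at either examination.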

 

 


The data were accessed from: http://courses.washington.edu/b513/datasets/datasets.php?class=513

The data file is Framingham.xlsx. You can drag and drop this file into Minitab.

When you do this, some of the variable names will be truncated; you will need to edit them, shortening them so that they are clear.

There are some references to column numbers in the assignment. These numbers will be correct if you simply drag and drop the Excel file into Minitab; obviously, if you insert columns yourself in the Minitab file, your column numbers may differ from those given here.


Question 1 – Baseline data [6 marks]

This question focuses on baseline characteristics and data.

(a) Briefly describe the design of the study to provide context for the analyses you report.

(b) Produce a summary table to describe the following characteristics of the study participants: age at baseline, height at baseline, weight at baseline and sex.

(c) Consider systolic and diastolic blood pressure at baseline. Produce suitable visual display(s) to allow a comparison of the distributions of these according to whether or not an individual was a smoker at baseline. You can exclude those with missing information about smoking from visual displays using Data Options > Group options.

(d) Carry out appropriate analyses to compare those who were smokers at baseline with those who were not, for systolic and diastolic blood pressure. Provide one or more suitable tables that include the summary statistics and inferential statistics.

(e) Discuss and justify any assumptions underlying your choice of analysis.

(f) Write a summary of the analyses you have carried out explaining the results of all the comparisons you have made. Write the summary for a doctor interested in the practical application of the study results.

(g) Consider predicting systolic blood pressure at baseline from age and Metropolitan relative weight at baseline. Provide graphical display(s) to illustrate the distributions of the explanatory variables. Explain if you would recommend rescaling these variables for this analysis. If appropriate, rescale the variables. Fit the model and obtain the parameter estimates for each of the explanatory variables. Explain the meaning of the parameter estimates for each of these explanatory variables, according to whether you have recommended rescaling or not. (You do not need to report other details of the analysis.)

(h) A colleague is also working with the same data file, and says: “This is great! The sample size is so big, everything is really, really significant; this whole study gives so many meaningful findings.” Respond to this comment.

 

 


Question 2 – Serum cholesterol at baseline [8 marks]

Serum cholesterol (mg/100ml) at baseline (column 10 in the datafile) is defined as serum cholesterol at examination 1 (the true baseline), or, when missing at examination 1, the serum cholesterol at the second examination. For many people in the study, serum cholesterol at both examinations 1 and 2 was available.

(a) Produce an appropriate graph showing the relationship between Serum cholesterol (mg/100ml) examination 1 and Serum cholesterol (mg/100ml) examination 2.

(b) Describe the relationship between the two variables, and give a suitable summary statistic.

(c) Fit a linear regression predicting Serum cholesterol (mg/100ml) examination 1 from Serum cholesterol (mg/100ml) examination 2. Provide an appropriate summary table and give a plain language explanation of the estimates of the parameters of the model.

(d) Find a 95% prediction interval for Serum cholesterol (mg/100ml) examination 1 when Serum cholesterol (mg/100ml) examination 2 is 300 (mg/100ml). Explain its meaning.

(e) A colleague asks if using the Serum cholesterol (mg/100ml) examination 2 value itself as the estimate of Serum cholesterol (mg/100ml) examination 1 is a good idea; for example, if Serum cholesterol (mg/100ml) examination 2 = 275, predict that Serum cholesterol (mg/100ml) examination 1 = 275. (This is, in fact, what was done.) Does this under-estimate, or over-estimate Serum cholesterol (mg/100ml) examination 1, using the data available? Provide a graph that will help answer this question. (Hint: Consider adding a Calculated line to show y = x.) Provide an explanation in writing.

(f) Consider improving the prediction of Serum cholesterol (mg/100ml) examination 1. Explain, in principle, a possible approach. You do not need to implement the approach.

(g) A key research question is about the relationship of smoking status at baseline and sex to Serum cholesterol (mg/100ml) baseline (column 10). Describe a suitable statistical model for answering this question, and explain the effects that will be considered in the model.

(h) Use Minitab to fit the model that you have specified in part (g). Provide a summary table of the Analysis of variance, and give a plain language explanation of the meaning of the P-values associated with each of the explanatory variables. Use concrete terms in relation to the Framingham study, rather than in abstract form.

(i) State one assumption required for analysing the data using the model you have suggested. State if the assumption is reasonable and provide relevant evidence.

(j) Provide an appropriate graphical display to summarise the findings in relation to the model you have fitted in (h).

 


(k) Find 95% confidence intervals for the effects of sex and smoking status on serum cholesterol at baseline; use Fisher intervals and provide those that best describe the results. Provide a suitable report of these confidence intervals, including a plain language explanation in concrete terms.

 

Question 3 – Survival at last examination [6 marks]

Consider Survived at last examination; this is in column 15.

(a) Produce a graph of the data that allows a comparison of Survived at last examination in terms of sex.

(b) Comment on any differences for sex, based on the graph.

(c) Estimate the difference in proportions (for sex) surviving at the last examination, and the 95% confidence interval for this difference. Write a plain language explanation of the results, using concrete terms in relation to the Framingham study.

(d) Carry out a logistic regression analysis of “Survived at last examination” using sex as a predictor. Write a summary of the results, again suitable for a doctor interested in the findings.

(e) Subset the Minitab worksheet to exclude those who have survived at examination 16, so that you have the subset of subjects who died prior to examination 16.

Explore the relationship between cause of death and sex, using a suitable graphical display. You may consider combining causes of death, if you think this is appropriate. (Hint: Data > Recode). Provide a suitable graph with a brief written description of the patterns in the graph.

(f) A colleague wants to consider predicting Survived at last examination from Serum cholesterol (mg/100ml) examination 1 (column 8). She notes that some of the values are missing. Your colleague says, “I don’t think we need to worry about that as there will still be plenty of data to carry out an analysis”. Provide a response to this, explaining any assumptions involved, and include a summary table to describe the amount of missing data for Serum cholesterol (mg/100ml) examination 1.

(g) At the time of the Framingham study, diastolic blood pressure was believed to be a superior measure of blood pressure compared with systolic blood pressure. High levels of systolic blood pressure were not believed to be important in terms of health outcomes. Examine the relationship between these two measures of blood pressure at baseline visually. Provide a plot that represents this relationship.

Consider the summary tables providing the results of three logistic regression models predicting Survived at last examination.

 


 

Model  Explanatory variable(s)      Odds ratio  95% confidence interval for odds ratio  P-value
1      Systolic blood pressure/10   1.34        1.31, 1.38                              < 0.001
2      Diastolic blood pressure/10  1.53        1.46, 1.60                              < 0.001
3      Systolic blood pressure/10   1.32        1.27, 1.38                              < 0.001
       Diastolic blood pressure/10  1.04        0.96, 1.12                              0.341

 

Based on these analyses and your examination of the explanatory variables, comment on the belief about the “superior” blood pressure measure in predicting survival at the last examination. Formal analyses are not required to answer this question.
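As a note on reading the table above: the "/10" in the variable names suggests that each blood pressure measure was divided by 10 before fitting, in which case each odds ratio would refer to a 10 mmHg difference rather than a 1 mmHg difference. The sketch below illustrates, under that assumption, how an odds ratio on one scale converts to another. The function name is hypothetical and this is not part of the Minitab workflow.

```python
def rescale_odds_ratio(odds_ratio: float, from_units: float, to_units: float) -> float:
    """Convert an odds ratio for a `from_units` increase into one for a `to_units` increase.

    A logistic regression is linear on the log-odds scale, so the odds ratio
    for a k-unit increase is exp(k * beta). It follows that
    OR(to_units) = OR(from_units) ** (to_units / from_units).
    """
    return odds_ratio ** (to_units / from_units)


# Model 1 in the table: systolic blood pressure/10, odds ratio 1.34.
or_per_10 = 1.34

# Equivalent odds ratios for a 1 mmHg and a 20 mmHg difference.
or_per_1 = rescale_odds_ratio(or_per_10, from_units=10, to_units=1)
or_per_20 = rescale_odds_ratio(or_per_10, from_units=10, to_units=20)

print(f"per 1 mmHg: {or_per_1:.3f}, per 20 mmHg: {or_per_20:.3f}")
```

Rescaling changes only the units in which the odds ratio is expressed; it does not change the fitted model or the P-value.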

 

 


Advice

Here is some advice to follow when preparing your assignment.

• The purpose of the assignment is to relate the statistical theory and practice learned in Statistics for Research Workers to real world data. The essential feature is that you must demonstrate understanding and application of statistical ideas covered in SRW to real world practice.

• The presentation of results should be consistent with the principles for presenting graphics and tables discussed in the course.

• In general, you are not required to provide Minitab output in the assignment, with the exception of graphs.

• The word limit for the assignment is 1,500 words. From our point of view, this is an upper limit for the assignment and you should aim to submit between 1,400 and 1,500 words. The word count does not include graphs and tables. University policy allows for a 10% deduction of marks once a written assignment exceeds 10% of the specified word limit. As the 1,500-word assignment is worth 20% of your final mark, you could lose 2% from your final mark if your assignment was, for example, 1,670 words.

• Your answers should be on no more than twelve (12) A4 pages of standard sized writing. This includes any graphs. Twelve pages is a generous limit for the assignment; this document is on eight pages, with a lot of white space, and it contains around 2,000 words.

• You do not need to reproduce the questions in your assignment.
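The word-limit arithmetic in the advice above can be checked with a short sketch. It assumes one reading of the stated policy: the 10% deduction applies once the word count is more than 10% over the 1,500-word limit (that is, above 1,650 words), and it is taken from the assignment's 20% share of the final mark. The function names are hypothetical.

```python
def exceeds_tolerance(word_count: int, word_limit: int = 1500, tolerance_pct: int = 10) -> bool:
    """Return True when the word count is more than tolerance_pct percent over the limit.

    Integer arithmetic is used so the 1,650-word boundary is not affected by
    floating-point rounding.
    """
    return word_count * 100 > word_limit * (100 + tolerance_pct)


def final_mark_loss(word_count: int, assignment_weight_pct: int = 20, deduction_pct: int = 10) -> float:
    """Percentage points lost from the final mark under the assumed reading of the policy."""
    if not exceeds_tolerance(word_count):
        return 0.0
    # A 10% deduction applied to an assignment worth 20% of the final mark.
    return assignment_weight_pct * deduction_pct / 100


for words in (1500, 1650, 1670):
    print(words, final_mark_loss(words))
```

With these assumptions, the 1,670-word example in the advice corresponds to a loss of 2 percentage points, matching the figure quoted there.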

 

 

