Worksheet 01: Writing Functions

Worksheet
Published

January 27, 2023

Instructions:

In a new .rmd or .qmd file, answer the questions below. Please use markdown headers (##) to clearly denote questions. See this template file as an example of formatting. Compile (knit/render) this file to PDF, visually confirm all your output and code can be seen and is readable, then submit to Canvas by the due date.

  1. Often we are interested in estimating the center of a distribution. The sample mean is sensitive to outliers and therefore we sometimes use other measures of center to estimate the center of a distribution. One such measure is the midhinge which is the average of \(Q_{1}\) and \(Q_{3}\). The quartiles break up the data into four equal quarters. \(Q_{1}\) is the 25th percentile and \(Q_{3}\) is the 75th percentile.

    1. Create a function that computes the midhinge.
    2. Use your function to compute the midhinge of the numbers: 3,100,40,7,29,2,230,44,100,1200,8,15,900.
The function \(quantile(x, probs=.25)\) will return the 25th percentile. See ?quantile for more details.

The function \(rpois(n,\lambda)\) will generate \(n\) values from a Poison distribution with parameter \(\lambda\). See ?rpois for more details.
  1. Suppose I wanted to use the midhinge to estimate the mean of a Poisson distribution, denoted as \(\lambda\). Use your function to calculate the midhinge of a sample of 500 values. Is the midhinge a good estimate of \(\lambda\)? Explain your reasoning.

  2. The unknown parameters in simple linear regression are the slope, \(\beta_{1}\), and the intercept, \(\beta_{0}\). The common formula for these estimates are given below.

\[ \hat{\beta}_{1}=\frac{\sum^{n}_{i=1}\left(x_{i}-\bar{x}\right)y_{i}}{\sum^{n}_{i=1}\left(x_{i}-\bar{x}\right)^{2}} \qquad \hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1}\bar{x} \]

There are a few ways to do this but using the list() command will do this (i.e. your last line would look something like list(b_0, b_1))
  1. Write a function in R that computes both the intercept and the slope estimate. Your function will need to return two values.
  2. Test your code on the following situation. Use your function and compare your estimates to the results when using lm(). In Denali National Park, Alaska, the wolf population is dependent on a large, strong caribou population. In this wild setting, caribou are found in very large herds. It is thought that wolves keep caribou herds strong by helping prevent over-population. Let x represent the number of fall caribou herds and y represent the late winter wolf population in the park. A random sample of recent years gives the following results:
  • x = 31, 34, 27, 25, 17, 23, 20
  • y = 75, 85, 75, 60, 48, 60, 60