Below are three questions I want to explore based on a data set of variables recorded monthly from 1954 to 2016. The variables themselves are listed after the questions.

## How have interest rates evolved over the last few decades?

## How many times was the effective federal funds rate above the rate of inflation?

## Are real GDP changes, unemployment rates, and inflation rates variables that help predict the variation in the effective federal funds rate?

#These are the variables:

> str(interest_rates)

Classes ‘tbl_df’, ‘tbl’ and ‘data.frame’: 904 obs. of 10 variables:

$ Year : int 1954 1954 1954 1954 1954 1954 1955 1955 1955 1955 …

$ Month : chr "07" "08" "09" "10" …

$ Day : chr "01" "01" "01" "01" …

$ Federal Funds Target Rate : num NA NA NA NA NA NA NA NA NA NA …

$ Federal Funds Upper Target : num NA NA NA NA NA NA NA NA NA NA …

$ Federal Funds Lower Target : num NA NA NA NA NA NA NA NA NA NA …

$ Effective Federal Funds Rate: num 0.8 1.22 1.06 0.85 0.83 1.28 1.39 1.29 1.35 1.43 …

$ Real GDP (Percent Change) : num 4.6 NA NA 8 NA NA 11.9 NA NA 6.7 …

$ Unemployment Rate : num 5.8 6 6.1 5.7 5.3 5 4.9 4.7 4.6 4.7 …

$ Inflation Rate : num NA NA NA NA NA NA NA NA NA NA …

1.) How have interest rates evolved over the last few decades?

#We should first begin with a general time series plot with the year on the x-axis and the effective federal funds rate on the y-axis.

plot(`Effective Federal Funds Rate` ~ `Year`, data = interest_rates, xlab = "Rates per Year", ylab = "Effective Federal Funds Rate", main = "Rates over Time", cex = 2, col = "blue")

#Although we notice the general rise and fall of this lending interest rate, how about the variability of the effective federal funds rate within a single year? There seem to be monthly observations that change during some years and remain constant for others. The 1980s look very different than the first part of the 2010s.

model.Year <- lm(`Effective Federal Funds Rate` ~ `Year`, data = interest_rates)

plot(model.Year, which =4)

We can use Cook’s distance to see the combined effect of each observation’s leverage and residual. To calculate the Cook’s distance for the *i*th data point, that point is removed from the data and the regression is refit. The Cook’s distance summarizes how much all of the fitted values in the regression model change when the *i*th observation is removed.

#The observations in the middle of the series (the 1980s) are associated with relatively large Cook’s distance values. Normally the Cook’s distances appear as clearly separated vertical lines, but with so many observations this visual effect is lost.
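
#As a minimal sketch of the leave-one-out idea, the code below computes Cook’s distance by hand and compares it with R’s built-in cooks.distance(). It uses the cars data set that ships with R rather than interest_rates, and the names fit, fit_i, and manual are chosen here purely for illustration.

```r
# Illustration on the built-in `cars` data (not the interest_rates data)
fit <- lm(dist ~ speed, data = cars)

p  <- length(coef(fit))        # number of estimated coefficients
s2 <- summary(fit)$sigma^2     # residual variance of the full model

# For each observation i: refit without it, then measure how far
# all fitted values move, scaled by p * s2
manual <- sapply(seq_len(nrow(cars)), function(i) {
  fit_i <- lm(dist ~ speed, data = cars[-i, ])
  sum((fitted(fit) - predict(fit_i, newdata = cars))^2) / (p * s2)
})

# Compare with the built-in calculation
all.equal(unname(cooks.distance(fit)), manual)
```

#The built-in function gets the same values from the leverage and residuals without refitting, which is why plot(model.Year, which = 4) is fast even with several hundred observations.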

#Are there certain months where the effective rate changes?

boxplot(`Effective Federal Funds Rate` ~ Month, data = interest_rates, tck = 0.02, xlab = "Month", ylab = "Effective Federal Funds Rate", main = "Does the month matter?", col = c("darkgreen", "orangered"))

#The short answer is no, the month does not matter. The medians are all similar, roughly 4.5 to 5%, and we can back this up statistically, even though the graph above should suffice as an explanation.

> model.Month <- lm(`Effective Federal Funds Rate` ~ Month, data = interest_rates)

> summary(model.Month)

Call:

lm(formula = `Effective Federal Funds Rate` ~ Month, data = interest_rates)

Residuals:

Min 1Q Median 3Q Max

-4.914 -2.450 -0.199 1.715 14.261

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 4.81905 0.45829 10.515 <2e-16 ***

Month02 -0.05032 0.64812 -0.078 0.938

Month03 0.04950 0.65073 0.076 0.939

Month04 0.11886 0.65073 0.183 0.855

Month05 0.12918 0.65073 0.199 0.843

Month06 0.18482 0.65073 0.284 0.776

Month07 0.12841 0.64812 0.198 0.843

Month08 0.15413 0.64812 0.238 0.812

Month09 0.15095 0.64812 0.233 0.816

Month10 0.11063 0.64812 0.171 0.865

Month11 0.06810 0.64812 0.105 0.916

Month12 0.06095 0.64812 0.094 0.925

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.638 on 740 degrees of freedom

(152 observations deleted due to missingness)

Multiple R-squared: 0.0003318, Adjusted R-squared: -0.01453

F-statistic: 0.02233 on 11 and 740 DF, p-value: 1
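
#A note on reading the table above: because Month is stored as a character vector, lm() treats it as a categorical variable and, with R’s default treatment contrasts, uses the first level ("01") as the baseline. The intercept is then the January mean, and each MonthXX coefficient is the difference between that month’s mean and January’s. The sketch below shows this on a small made-up data frame; toy and its values are assumptions for illustration, not part of interest_rates.

```r
# Made-up data: three "months" with four observations each
toy <- data.frame(
  Month = rep(c("01", "02", "03"), each = 4),
  rate  = c(4, 5, 6, 5,   5, 6, 7, 6,   3, 4, 5, 4)
)

fit_toy <- lm(rate ~ Month, data = toy)

# Group means computed directly
group_means <- tapply(toy$rate, toy$Month, mean)

coef(fit_toy)                         # intercept, Month02, Month03
group_means["01"]                     # equals the intercept
group_means[-1] - group_means["01"]   # equals the Month02 and Month03 coefficients
```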

#None of the month coefficients (Month02 through Month12) is significantly different from zero: every individual t-test has a p-value above 0.77, and the overall F-test has a p-value of essentially 1. Just in case the 2010s proved unusually steady while other years experienced more oscillation between months, all observations from 2010 through 2016 were dropped and the regression was refit.

> model.Month2 <- lm(`Effective Federal Funds Rate` ~ Month, data = subset(interest_rates, Year <2010))

> summary(model.Month2)

Call:

lm(formula = `Effective Federal Funds Rate` ~ Month, data = subset(interest_rates,

Year < 2010))

Residuals:

Min 1Q Median 3Q Max

-5.4213 -2.4667 -0.3437 1.5644 13.5904

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 5.48964 0.46005 11.933 <2e-16 ***

Month02 -0.05927 0.65061 -0.091 0.927

Month03 -0.02182 0.65061 -0.034 0.973

Month04 0.05545 0.65061 0.085 0.932

Month05 0.06764 0.65061 0.104 0.917

Month06 0.13055 0.65061 0.201 0.841

Month07 0.05644 0.64770 0.087 0.931

Month08 0.08501 0.64770 0.131 0.896

Month09 0.08161 0.64770 0.126 0.900

Month10 0.03626 0.64770 0.056 0.955

Month11 -0.01178 0.64770 -0.018 0.985

Month12 -0.02464 0.64770 -0.038 0.970

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.412 on 654 degrees of freedom

(148 observations deleted due to missingness)

Multiple R-squared: 0.0002527, Adjusted R-squared: -0.01656

F-statistic: 0.01503 on 11 and 654 DF, p-value: 1

#Nothing changes when these seven years are excluded. So let’s create a model object that regresses the effective federal funds rate on the year. We come to a different conclusion for the year predictor variable.

> model.Year <- lm(`Effective Federal Funds Rate` ~ Year, data = interest_rates)

> summary(model.Year)

Call:

lm(formula = `Effective Federal Funds Rate` ~ Year, data = interest_rates)

Residuals:

Min 1Q Median 3Q Max

-5.7058 -2.9800 -0.4617 1.7083 13.9685

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 105.963938 13.982013 7.579 1.03e-13 ***

Year -0.050900 0.007042 -7.228 1.21e-12 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.494 on 750 degrees of freedom

(152 observations deleted due to missingness)

Multiple R-squared: 0.06512, Adjusted R-squared: 0.06387

F-statistic: 52.24 on 1 and 750 DF, p-value: 1.212e-12

2.) How many times is the effective federal funds rate above the rate of inflation?

> occurrences <- interest_rates$`Effective Federal Funds Rate` > interest_rates$`Inflation Rate`

> summary(occurrences)

Mode FALSE TRUE NA's

logical 226 484 194

#The TRUE count is the number of observations where the effective federal funds rate was larger than the inflation rate, and the FALSE count is the number where it was not. Whenever data is missing for either the effective federal funds rate **or** the inflation rate, an NA is produced for occurrences. Therefore we only look at observations with values for both the effective federal funds rate and the inflation rate.

226+484

[1] 710

#When False

226/710

#When True

484/710

#More often than not, the effective federal funds rate is higher than the rate of inflation.
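
#The same counts and shares can be pulled out directly instead of being retyped from the summary. The sketch below uses two short made-up vectors (effr and infl are assumptions standing in for the two columns of interest_rates) to show how na.rm = TRUE restricts the calculation to observations where both values are present.

```r
# Made-up stand-ins for the two columns, with some missing values
effr <- c(1.0, 2.5, NA,  4.0, 0.5, 3.0)
infl <- c(1.5, 2.0, 3.0, NA,  0.2, 3.0)

above <- effr > infl                  # NA wherever either value is missing

n_true     <- sum(above, na.rm = TRUE)    # comparisons that came out TRUE
n_compared <- sum(!is.na(above))          # observations with both values present
share_true <- mean(above, na.rm = TRUE)   # proportion TRUE among those

c(n_true = n_true, n_compared = n_compared, share_true = share_true)
```

#On the real data, the equivalent call would be mean(occurrences, na.rm = TRUE).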

#The effective federal funds rate is the interest rate depository institutions charge one another for overnight loans of funds.

#Here is more information: https://fred.stlouisfed.org/series/FEDFUNDS

#It makes sense that banking institutions would want the interest earned on the funds loaned to outpace inflation.

3.) Are real GDP changes, unemployment rates, and inflation rates good variables for predicting the average variation in the effective federal funds rate?

#We can run three simple linear regressions and assign each one to its own model object. This is done three times below.

> model.GDP <- lm(`Effective Federal Funds Rate` ~ `Real GDP (Percent Change)`, data = interest_rates)

> summary(model.GDP)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Real GDP (Percent Change)`,

data = interest_rates)

Residuals:

Min 1Q Median 3Q Max

-5.6613 -2.5303 -0.1314 1.6837 14.7109

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 5.25102 0.30661 17.126 <2e-16 ***

`Real GDP (Percent Change)` -0.10375 0.06429 -1.614 0.108

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.651 on 248 degrees of freedom

(654 observations deleted due to missingness)

Multiple R-squared: 0.01039, Adjusted R-squared: 0.006402

F-statistic: 2.604 on 1 and 248 DF, p-value: 0.1078

> model.Unemployment <- lm(`Effective Federal Funds Rate` ~ `Unemployment Rate`, data = interest_rates)

> summary(model.Unemployment)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Unemployment Rate`,

data = interest_rates)

Residuals:

Min 1Q Median 3Q Max

-5.1303 -2.4760 -0.1807 1.7769 14.0607

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 4.40650 0.51960 8.481 <2e-16 ***

`Unemployment Rate` 0.08438 0.08406 1.004 0.316

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.611 on 750 degrees of freedom

(152 observations deleted due to missingness)

Multiple R-squared: 0.001341, Adjusted R-squared: 9.914e-06

F-statistic: 1.007 on 1 and 750 DF, p-value: 0.3158

> model.Inflation.Rate <- lm(`Effective Federal Funds Rate` ~ `Inflation Rate`, data = interest_rates)

> summary(model.Inflation.Rate)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Inflation Rate`,

data = interest_rates)

Residuals:

Min 1Q Median 3Q Max

-8.0637 -1.6861 0.1715 1.5918 7.7240

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.9058 0.1501 6.036 2.55e-09 ***

`Inflation Rate` 1.1139 0.0331 33.647 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.269 on 708 degrees of freedom

(194 observations deleted due to missingness)

Multiple R-squared: 0.6152, Adjusted R-squared: 0.6147

F-statistic: 1132 on 1 and 708 DF, p-value: < 2.2e-16

#The slope coefficient of the predictor variable in the second model is not significant at any reasonable level of significance. In other words, the slope of the unemployment rate is not significantly different from zero.

#The unemployment rate is not a good predictor variable of the effective federal funds rate. It should be noted that we do not know whether the unemployment rate is recorded as U-3 or U-6. It is most likely U-3, since our first observations are from 1954 and the U-6 measure was only introduced in the 1990s.

#In practice the unemployment rate never reaches 0%. Frictional unemployment exists even in the best economic conditions.

#The real GDP (percent change) narrowly misses significance even with alpha set at 0.10 (p = 0.108), and 0.10 is already a questionably lenient significance level to accept. If we were to build a stepwise selection model with multiple predictor variables, the alpha to enter and exit could be set higher, possibly 0.15. We can revisit this later.

#The inflation rate is significant at any reasonable level of significance. On average, the effective federal funds rate will be 1.1139 times the inflation rate plus 0.9058. The inflation rate explains 61.52% of the variation in the effective federal funds rate.
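
#To make the fitted line concrete, here is a small sketch that plugs a few inflation rates into the estimated equation. The coefficients are copied from the summary above; the helper name predict_effr and the example inflation rates are made up for illustration.

```r
# Coefficients copied from summary(model.Inflation.Rate)
b0 <- 0.9058   # intercept
b1 <- 1.1139   # slope on the inflation rate

# Hypothetical helper: fitted effective federal funds rate for a given inflation rate
predict_effr <- function(inflation) b0 + b1 * inflation

# Example inflation rates of 0%, 2%, and 5%
predict_effr(c(0, 2, 5))
```

#With the model object itself available, predict(model.Inflation.Rate, newdata = ...) does the same job without retyping the coefficients.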

#We should see if the real GDP in percentage change terms belongs as a second predictor variable in a model that contains inflation rate as the first predictor variable. A new linear regression is assigned a new model object and the summary function is performed on that new model object.

> model.MLR <- lm(`Effective Federal Funds Rate` ~ `Inflation Rate` + `Real GDP (Percent Change)`, data = interest_rates)

> summary(model.MLR)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Inflation Rate` +

`Real GDP (Percent Change)`, data = interest_rates)

Residuals:

Min 1Q Median 3Q Max

-8.2808 -1.6203 0.1622 1.5577 6.2258

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.61688 0.31189 1.978 0.0491 *

`Inflation Rate` 1.14912 0.05840 19.678 <2e-16 ***

`Real GDP (Percent Change)` 0.05447 0.04223 1.290 0.1983

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.27 on 233 degrees of freedom

(668 observations deleted due to missingness)

Multiple R-squared: 0.6275, Adjusted R-squared: 0.6243

F-statistic: 196.3 on 2 and 233 DF, p-value: < 2.2e-16

#The adjusted R-squared value does not improve much by adding a second predictor variable (0.6243 versus 0.6147), and the two models are fit to different numbers of observations because real GDP is missing for most months, so the comparison is only rough. The slope of the real GDP (percent change) is also not significantly different from zero at any reasonable level of significance (p = 0.198).
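
#One way to put the two models on equal footing is to refit both on the rows where every variable is present and then compare them with a partial F-test. The sketch below does this on simulated data; toy, its column names, and the simulated values are all assumptions, with GDP deliberately kept for only every third row to mimic quarterly reporting.

```r
set.seed(1)

# Simulated stand-in data: 60 "months", GDP observed only every third month
toy <- data.frame(infl = runif(60, 0, 10))
toy$effr <- 1 + 1.1 * toy$infl + rnorm(60)
toy$gdp  <- rnorm(60, mean = 3, sd = 2)
toy$gdp[-seq(1, 60, by = 3)] <- NA

# Keep only rows where all three variables are present
common <- toy[complete.cases(toy), ]

small <- lm(effr ~ infl, data = common)
big   <- lm(effr ~ infl + gdp, data = common)

# Partial F-test: does adding gdp improve the fit on the same rows?
anova(small, big)
```

#Because both models now use the same observations, their adjusted R-squared values are also directly comparable.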

#Before settling for the simple linear regression model that only contains the inflation rate, let us check the diagnostic plots for the residuals in the model.

> op <- par(mfrow=c(2,2))

> plot(model.Inflation.Rate)

> par(op)

#Just from looking at the first row of plots, we see that the density and variance of the residuals look good for the first half of the fitted values, but the variance increases and the density decreases as the fitted values increase. Apart from this widening spread, there is no clear pattern in the residuals. The normal Q-Q plot also looks decent, but it does have an issue in the lower tail. In the future we may want to consider a transformation or debate whether the model fits well for only a subset of inflation rate values.
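
#A rough numeric companion to the residual plot is to compare the spread of the residuals in the lower and upper halves of the fitted values. The sketch below does this on simulated data whose noise grows with the predictor; x, y, and fit_sim are made up for illustration, and this is an informal check rather than a formal test.

```r
set.seed(42)

# Simulated data where the noise standard deviation increases with x
x <- runif(200, 0, 10)
y <- 1 + 1.1 * x + rnorm(200, sd = 0.3 + 0.3 * x)

fit_sim <- lm(y ~ x)

# Split observations at the median fitted value
upper <- fitted(fit_sim) > median(fitted(fit_sim))

sd_low  <- sd(resid(fit_sim)[!upper])   # residual spread, lower half
sd_high <- sd(resid(fit_sim)[upper])    # residual spread, upper half

c(lower_half = sd_low, upper_half = sd_high)
```

#A clearly larger spread in the upper half is the numeric counterpart of the fan shape in the residuals-versus-fitted plot.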