Interest Rates and Effective Federal Funds Rate


As I learn SAS, I would still like to retain a working knowledge of R.  Unlike SAS and Minitab, R is free software.
As many comparison sites indicate, R has a steeper learning curve.  Initially, small typos that result in error messages can be frustrating.  I had to overcome an initial aversion to finding and installing the right packages.  Packages are not updated automatically, so you need to discern the cause of a mistake, be it formatting, a typo, or an outdated package.  An organized project folder also saves time later.
With limited time, I will try to “explore” data.  Data mining can be much more time consuming than the actual analysis.  For this reason, the scope of these explorations will be limited, as this blog is kept as a hobby.

Below are three questions I want to explore, followed by the variables in the data set, which were recorded monthly from 1954 to 2016:

 

## How have interest rates evolved over the last few decades? 

 

## How many times was the effective federal funds rate above the rate of inflation?

 

## Are real GDP changes, unemployment rates, and inflation rates variables that help predict the variation in the effective federal funds rate?

#These are the variables:

> str(interest_rates)

Classes ‘tbl_df’, ‘tbl’ and ‘data.frame’:    904 obs. of  10 variables:

 $ Year                        : int  1954 1954 1954 1954 1954 1954 1955 1955 1955 1955 …

 $ Month                       : chr  “07” “08” “09” “10” …

 $ Day                         : chr  “01” “01” “01” “01” …

 $ Federal Funds Target Rate   : num  NA NA NA NA NA NA NA NA NA NA …

 $ Federal Funds Upper Target  : num  NA NA NA NA NA NA NA NA NA NA …

 $ Federal Funds Lower Target  : num  NA NA NA NA NA NA NA NA NA NA …

 $ Effective Federal Funds Rate: num  0.8 1.22 1.06 0.85 0.83 1.28 1.39 1.29 1.35 1.43 …

 $ Real GDP (Percent Change)   : num  4.6 NA NA 8 NA NA 11.9 NA NA 6.7 …

 $ Unemployment Rate           : num  5.8 6 6.1 5.7 5.3 5 4.9 4.7 4.6 4.7 …

 $ Inflation Rate              : num  NA NA NA NA NA NA NA NA NA NA …

 

 


 

1.) How have interest rates evolved over the last few decades?

#We should begin with a general time series plot with the year on the x-axis and the effective federal funds rate on the y-axis.

plot(`Effective Federal Funds Rate` ~ Year, data = interest_rates, xlab = "Year", ylab = "Effective Federal Funds Rate", main = "Rates over Time", cex = 2, col = "blue")

lending rate per year

#Although we notice the general rise and fall of this lending interest rate, how about the variability of the effective federal funds rate within a single year?  There seem to be monthly observations that change during some years and remain constant for others.  The 1980s look very different from the first part of the 2010s.

model.Year <- lm(`Effective Federal Funds Rate` ~ `Year`, data = interest_rates)

plot(model.Year, which =4)

We can use Cook’s distance to see the combined effect of each observation’s leverage and residual.  To calculate the Cook’s distance for the ith observation, that data point is removed from the data and the regression is refit.  The Cook’s distance summarizes how much all of the fitted values in the regression model change when the ith observation is removed.

Cook's distance values

#The observations in the middle of the series (1980s) are associated with relatively large Cook’s distance values.  Normally the Cook’s distances appear as clearly separated vertical lines.  Since we have so many observations, this visual effect is lost.
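
#To make the leave-one-out description concrete, here is a minimal sketch on simulated data.  The data frame `toy` and its columns are invented for illustration and are not the interest rate data.  It refits the model without observation i, measures how far all of the fitted values move, and compares the result with R's built-in cooks.distance().

```r
set.seed(1)
toy <- data.frame(x = 1:30)
toy$y <- 2 + 0.5 * toy$x + rnorm(30)
toy$y[15] <- toy$y[15] + 8          # plant one unusual observation

fit <- lm(y ~ x, data = toy)
p   <- length(coef(fit))            # number of estimated coefficients
s2  <- summary(fit)$sigma^2         # residual variance of the full model

cook_by_hand <- sapply(seq_len(nrow(toy)), function(i) {
  fit_i <- lm(y ~ x, data = toy[-i, ])                   # refit without observation i
  shift <- fitted(fit) - predict(fit_i, newdata = toy)   # change in every fitted value
  sum(shift^2) / (p * s2)
})

# The hand-rolled values should agree with the built-in function
all.equal(unname(cooks.distance(fit)), cook_by_hand)
```

#The built-in function gets the same numbers from the hat matrix without refitting, which is why plot(model, which = 4) stays fast even with hundreds of observations.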

#Are there certain months in which the effective rate changes?

boxplot(`Effective Federal Funds Rate` ~ Month, data = interest_rates, tck = 0.02, xlab = "Month", ylab = "Effective Federal Funds Rate", main = "Does the month matter?", col = c("darkgreen", "orangered"))

Does the month matter.png

#The short answer is no, the months do not matter.  The medians are pretty similar, probably just around 4.5 to 5%, and we can back this up statistically, even though the above graph should suffice as an explanation.

> model.Month <- lm(`Effective Federal Funds Rate` ~ Month, data = interest_rates)

> summary(model.Month)

Call:

lm(formula = `Effective Federal Funds Rate` ~ Month, data = interest_rates)

Residuals:

   Min     1Q Median     3Q    Max

-4.914 -2.450 -0.199  1.715 14.261

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  4.81905    0.45829  10.515   <2e-16 ***

Month02     -0.05032    0.64812  -0.078    0.938   

Month03      0.04950    0.65073   0.076    0.939   

Month04      0.11886    0.65073   0.183    0.855   

Month05      0.12918    0.65073   0.199    0.843   

Month06      0.18482    0.65073   0.284    0.776   

Month07      0.12841    0.64812   0.198    0.843   

Month08      0.15413    0.64812   0.238    0.812   

Month09      0.15095    0.64812   0.233    0.816   

Month10      0.11063    0.64812   0.171    0.865   

Month11      0.06810    0.64812   0.105    0.916   

Month12      0.06095    0.64812   0.094    0.925   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.638 on 740 degrees of freedom

  (152 observations deleted due to missingness)

Multiple R-squared:  0.0003318,     Adjusted R-squared:  -0.01453

F-statistic: 0.02233 on 11 and 740 DF,  p-value: 1
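
#A note on reading the Month coefficients: with a categorical predictor, lm() treats the first level (here "01") as the baseline, so the intercept is the mean for that month and each other estimate is that month's mean minus the baseline mean.  A minimal sketch on simulated data (the `toy` data frame is hypothetical) checks this:

```r
set.seed(42)
toy <- data.frame(
  Month = rep(sprintf("%02d", 1:12), times = 10),  # "01" ... "12", ten years
  rate  = rnorm(120, mean = 5, sd = 3)
)

fit   <- lm(rate ~ Month, data = toy)
means <- sapply(split(toy$rate, toy$Month), mean)  # plain monthly means

# Intercept = mean of the baseline month "01"
all.equal(unname(coef(fit)[1]), unname(means["01"]))

# Every other coefficient = that month's mean minus the baseline mean
all.equal(unname(coef(fit)[-1]), unname(means[-1] - means["01"]))
```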

 

#None of the month coefficients is significantly different from zero; the smallest p-value among them is 0.776.  Just in case the 2010s proved unusually steady while other years experienced more oscillations between months, all months associated with years 2010 through 2016 were eliminated from the model and a new regression was made.

> model.Month2 <- lm(`Effective Federal Funds Rate` ~ Month, data = subset(interest_rates, Year <2010))

> summary(model.Month2)

 Call:

lm(formula = `Effective Federal Funds Rate` ~ Month, data = subset(interest_rates,

    Year < 2010))

Residuals:

    Min      1Q  Median      3Q     Max

-5.4213 -2.4667 -0.3437  1.5644 13.5904

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  5.48964    0.46005  11.933   <2e-16 ***

Month02     -0.05927    0.65061  -0.091    0.927   

Month03     -0.02182    0.65061  -0.034    0.973   

Month04      0.05545    0.65061   0.085    0.932   

Month05      0.06764    0.65061   0.104    0.917   

Month06      0.13055    0.65061   0.201    0.841   

Month07      0.05644    0.64770   0.087    0.931   

Month08      0.08501    0.64770   0.131    0.896   

Month09      0.08161    0.64770   0.126    0.900   

Month10      0.03626    0.64770   0.056    0.955   

Month11     -0.01178    0.64770  -0.018    0.985   

Month12     -0.02464    0.64770  -0.038    0.970   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

Residual standard error: 3.412 on 654 degrees of freedom

  (148 observations deleted due to missingness)

Multiple R-squared:  0.0002527,     Adjusted R-squared:  -0.01656

F-statistic: 0.01503 on 11 and 654 DF,  p-value: 1

 

#Nothing changes by excluding the months in these seven years.  So let’s create a model object for the effective federal funds rate regressed on the year.  We come to a different conclusion for the year predictor variable.

> model.Year <- lm(`Effective Federal Funds Rate` ~ Year, data = interest_rates)
> summary(model.Year)

Call:

lm(formula = `Effective Federal Funds Rate` ~ Year, data = interest_rates)

 

Residuals:

    Min      1Q  Median      3Q     Max

-5.7058 -2.9800 -0.4617  1.7083 13.9685

Coefficients:

              Estimate Std. Error t value Pr(>|t|)   

(Intercept) 105.963938  13.982013   7.579 1.03e-13 ***

Year         -0.050900   0.007042  -7.228 1.21e-12 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.494 on 750 degrees of freedom

  (152 observations deleted due to missingness)

Multiple R-squared:  0.06512, Adjusted R-squared:  0.06387

F-statistic: 52.24 on 1 and 750 DF,  p-value: 1.212e-12

 


 

2.) How many times is the effective federal funds rate above the rate of inflation?

> occurrences <- interest_rates$`Effective Federal Funds Rate` > interest_rates$`Inflation Rate`

> summary(occurrences)
   Mode   FALSE    TRUE    NA's 
logical     226     484     194

 

#The FALSE and TRUE observation counts indicate how many times the effective federal funds rate was or was not larger than the inflation rate.  Whenever data is missing for either the effective federal funds rate or the inflation rate, an NA will be produced for occurrences.  Therefore we only look at observations with values for both the effective federal funds rate and the inflation rate.

 226+484

[1] 710

 #When False

 226/710

 

 #When True

 484/710
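
#The same tally can be done without retyping the counts.  A minimal sketch on made-up vectors (the values below are illustrative only, not the real series) shows how a comparison propagates NA and how mean() with na.rm = TRUE gives the share of TRUE results directly:

```r
# Hypothetical toy values, not the real series
effective <- c(1.2, 3.0, NA, 5.5, 0.1, 2.0)
inflation <- c(2.0, 1.5, 1.0, NA, 0.5, 1.0)

above <- effective > inflation      # NA whenever either side is NA
above

table(above, useNA = "ifany")       # counts of FALSE, TRUE and NA
sum(!is.na(above))                  # number of usable comparisons
mean(above, na.rm = TRUE)           # share of usable comparisons that are TRUE
```

#On the real data the last line would be mean(occurrences, na.rm = TRUE).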

 

#More often than not, the effective federal funds rate is higher than the rate of inflation.

#The effective federal funds rate is the interest rate depository institutions charge one another when they lend one another funds.

#Here is more information: https://fred.stlouisfed.org/series/FEDFUNDS

#It makes sense that banking institutions would want to earn interest on the funds loaned.


 

3.) Are real GDP changes, unemployment rates, and inflation rates good variables for predicting the variation in the effective federal funds rate?

#We can run three simple linear regressions and assign each to its own model object.  This is done three times below.

> model.GDP <- lm(`Effective Federal Funds Rate` ~ `Real GDP (Percent Change)`, data = interest_rates)

> summary(model.GDP)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Real GDP (Percent Change)`,

    data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-5.6613 -2.5303 -0.1314  1.6837 14.7109

Coefficients:

                            Estimate Std. Error t value Pr(>|t|)   

(Intercept)                  5.25102    0.30661  17.126   <2e-16 ***

`Real GDP (Percent Change)` -0.10375    0.06429  -1.614    0.108   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

Residual standard error: 3.651 on 248 degrees of freedom

  (654 observations deleted due to missingness)

Multiple R-squared:  0.01039, Adjusted R-squared:  0.006402

F-statistic: 2.604 on 1 and 248 DF,  p-value: 0.1078

 

> model.Unemployment <- lm(`Effective Federal Funds Rate` ~ `Unemployment Rate`, data = interest_rates)

> summary(model.Unemployment)

 

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Unemployment Rate`,

    data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-5.1303 -2.4760 -0.1807  1.7769 14.0607

Coefficients:

                    Estimate Std. Error t value Pr(>|t|)   

(Intercept)          4.40650    0.51960   8.481   <2e-16 ***

`Unemployment Rate`  0.08438    0.08406   1.004    0.316   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.611 on 750 degrees of freedom

  (152 observations deleted due to missingness)

Multiple R-squared:  0.001341,      Adjusted R-squared:  9.914e-06

F-statistic: 1.007 on 1 and 750 DF,  p-value: 0.3158

 

> model.Inflation.Rate <- lm(`Effective Federal Funds Rate` ~ `Inflation Rate`, data = interest_rates)

> summary(model.Inflation.Rate)

 

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Inflation Rate`,

    data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-8.0637 -1.6861  0.1715  1.5918  7.7240

Coefficients:

                 Estimate Std. Error t value Pr(>|t|)   

(Intercept)        0.9058     0.1501   6.036 2.55e-09 ***

`Inflation Rate`   1.1139     0.0331  33.647  < 2e-16 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.269 on 708 degrees of freedom

  (194 observations deleted due to missingness)

Multiple R-squared:  0.6152,  Adjusted R-squared:  0.6147

F-statistic:  1132 on 1 and 708 DF,  p-value: < 2.2e-16

 

#The slope coefficient of the predictor variable in the second model is not significant at any reasonable level of significance.  In other words, the slope of the unemployment rate is not significantly different from zero.

#The unemployment rate is not a good predictor variable of the effective federal funds rate.  It should be noted that we do not know if the unemployment rate is recorded as U-3 or U-6.  It is most likely U-3, since our first observations are from 1954 and the U-6 unemployment rate first began to be calculated in the early 1990s.

#In practice the unemployment rate never reaches 0%.  Frictional unemployment exists even in the best economic conditions.

#The real GDP (percent change) comes close to significance: its p-value of 0.108 narrowly misses even an alpha of 0.10, which is already a questionably lenient significance level to accept.  If we were to build a stepwise model with multiple predictor variables, the alpha to enter and exit could be set higher, possibly 0.15.  We can revisit this later.

#The inflation rate is significant at any reasonable level of significance.  On average, the effective federal funds rate will be 1.1139 times the inflation rate plus 0.9058.  The inflation rate explains 61.52% of the variation in the effective federal funds rate.
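
#As a worked example of reading the fitted line, the sketch below plugs a few inflation rates into the reported coefficients.  The input rates are arbitrary illustrations; only the intercept and slope come from the output above.

```r
intercept <- 0.9058    # from summary(model.Inflation.Rate)
slope     <- 1.1139

inflation <- c(0, 2, 5)                     # illustrative inflation rates, in percent
predicted <- intercept + slope * inflation  # fitted effective federal funds rate
data.frame(inflation, predicted)
```

#For instance, an inflation rate of 2% corresponds to a fitted effective federal funds rate of about 3.13%, and 5% to about 6.48%.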

#We should see if the real GDP (percent change) belongs as a second predictor variable in a model that contains the inflation rate as the first predictor variable.  A new linear regression is assigned to a new model object and the summary function is called on it.

> model.MLR <- lm(`Effective Federal Funds Rate` ~ `Inflation Rate` + `Real GDP (Percent Change)`, data = interest_rates)

> summary(model.MLR)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Inflation Rate` +

    `Real GDP (Percent Change)`, data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-8.2808 -1.6203  0.1622  1.5577  6.2258

 

Coefficients:

                            Estimate Std. Error t value Pr(>|t|)   

(Intercept)                  0.61688    0.31189   1.978   0.0491 * 

`Inflation Rate`             1.14912    0.05840  19.678   <2e-16 ***

`Real GDP (Percent Change)`  0.05447    0.04223   1.290   0.1983   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.27 on 233 degrees of freedom

  (668 observations deleted due to missingness)

Multiple R-squared:  0.6275,        Adjusted R-squared:  0.6243

F-statistic: 196.3 on 2 and 233 DF,  p-value: < 2.2e-16

#The adjusted R-squared value does not improve much by adding a second predictor variable.  The slope of the real GDP (percent change) is also not significantly different from zero at any reasonable level of significance (p-value 0.1983).
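
#One caveat when comparing the two summaries: they were fitted to different numbers of rows (708 versus 233 residual degrees of freedom), because real GDP is missing for most months.  A fairer comparison refits both models on the rows where every variable is present.  A minimal sketch on simulated data (the `toy` data frame and its column names are hypothetical stand-ins):

```r
set.seed(7)
n   <- 120
toy <- data.frame(inflation = runif(n, 0, 10))
toy$gdp  <- ifelse(seq_len(n) %% 3 == 1, rnorm(n, 3, 2), NA)  # mostly missing
toy$rate <- 1 + 1.1 * toy$inflation + rnorm(n, sd = 2)

complete <- toy[complete.cases(toy), ]       # rows with no missing values

m1 <- lm(rate ~ inflation,       data = complete)
m2 <- lm(rate ~ inflation + gdp, data = complete)

nobs(m1) == nobs(m2)                         # both models now use the same rows
summary(m1)$adj.r.squared
summary(m2)$adj.r.squared
anova(m1, m2)                                # partial F-test for adding gdp
```

#With a single added predictor, the partial F-test from anova() gives the same p-value as that predictor's t-test in the larger model.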

#Before settling for the simple linear regression model that only contains the inflation rate, let us check the diagnostic plots for the residuals in the model.

> op <- par(mfrow=c(2,2))

> plot(model.Inflation.Rate)

SLR diagonistic plots.jpg

#Just from looking at the first row of plots, we see that the density and variance of the residuals look good for the first half of the fitted values, but the variance increases and the density decreases as the fitted values increase.  Apart from this widening spread, there is no obvious pattern.  The normal Q-Q plot also looks decent, but does have an issue in the lower tail.  In the future we may want to consider a transformation or debate whether the model fits well only for a subset of inflation rate values.

The Electoral College and the Tidewater Nation


Colin Woodard, the author of American Nations: A History of the Eleven Rival Regional Cultures of North America, tries to show us why we should not view policy positions as simply “Democrat” or “Republican”.  According to Woodard, we live in a country of 11 nations that form coalitions based upon various issues.  The objective of each nation is to preserve its identity and to be influential in national politics.

 

the-american-nations-today

Woodard (c)2011

The author suggests that, contrary to the popular notion of the United States being a melting pot, new arrivals either specifically moved to one of the 11 nations because the nation encompassed their values or were assimilated, adopting the pre-existing values of a nation.   In this second scenario, the original founders of a community set the framework for that nation and new arrivals conform to or otherwise reinforce that culture.

Colin Woodard also explains that different nations in the United States held different conceptions of democracy.  The Yankeedom nation held the Nordic or Germanic conception of democracy, which encouraged near-universal male suffrage.  Yankeedom was founded primarily by middle-class, well-educated Puritans.  Immigrants came in family units and they valued community structure and shared values.  When migrants settled other parts of the United States, they carried these tendencies and traditions with them.  When confronting other nations, such as New Netherland, the Midlands, and Greater Appalachia, they sought to impose their Puritanism.

Other nations were founded on deep inequalities.  The Tidewater and Deep South treasured the Greek or Roman democratic system, where the existence of slaves coincided with their perception of democracy.   Greek or Roman democracy exists to benefit the few, allowing a select group of men to become “enlightened” and guide their societies.  This benefit was seen as outweighing the agonies of those enslaved.  They viewed slavery as more humane than the treatment of the urban poor in the northern nations.  They reasoned that at least the slaves had a master who was supposed to care for them.  “Enlightened” Tidewater and Deep South gentry also argued that Yankeedom was a society of shopkeepers, which prevented individuals from becoming educated enough to advance their societies.

The Tidewater and Deep South were also not founded by equal proportions of men and women, and during the English Civil War they tended to side with the King and the Royalists back in England.  The Tidewater saw themselves as an extension of the Norman culture while Yankeedom was Anglo-Saxon.  Things changed for the Tidewater when the British Empire sought to homogenize control over its empire.  The King redefined the rights of his British subjects.  Only those living in England had full rights.  This clarification of who was considered an Englishman did not go over well with the gentry of the Tidewater.

It is interesting to note that other nations did not value the democratic system at all.  New Netherland (New York) preferred a hegemonic system and hoped to be reabsorbed by the Dutch or British monarchies on several occasions.   Autocracy worked provided that citizens showed tolerance towards one another.

It should not be surprising which cultures would support the continued use of the Electoral College system.  The National Constitution Center features a podcast from December 1, 2016 titled “Should we abolish the Electoral College?”.  The two panelists have biographies included on the website.  From this limited information, we might conclude that one panelist is from either Yankeedom or the Left Coast while the other is from the Tidewater.  If Woodard’s theory is correct, both natives and migrants become assimilated by their nations.  In turn, the panelists will eventually advocate the ideals of their nations.

This perspective is interesting because “Yankeedom” or “the Left Coast” could be considered “Democrat” in this past election cycle.  They will be on the defensive when faced with the new administration.  The representative from the “Tidewater” may or may not be considered a “Democrat”, but they come from a dying nation.   The Tidewater nation may not exist in the future. The growth of the DC metropolitan area into Maryland and northern Virginia essentially divides this nation.  Incremental growth from the Midlands also reduces its power.  With rising sea levels, the region will also lose territory to the east.  Essentially, the representative from the Tidewater seeks to preserve any formerly established advantage at all costs.

Both panelists introduce us to the history of the Electoral College.  Some of the original founders envisioned that the electors would choose the President and Vice President who were most qualified for the position.  Initially the most qualified person would become the President and the second-most qualified would be Vice President.  Electors were supposed to deliberate and select candidates to run for the final election.

The Electoral College was one of the last systems established during the Constitutional Convention.  The framers were concerned about the excesses of democracy and the emergence of demagogues, but showed “haste and fatigue” by the time they got around to the Electoral College.  Modern campaigns were also not envisioned.  The founding fathers thought the President should be chosen based on reputation and history of service, not on cleverness or radicalism during a campaign.

According to Alex, the Electoral College was supposed to serve as a nominating board to send candidates to the House of Representatives.  From this cohort we would end up with the best candidate.  However, by the 1820s, the responsibility for narrowing down the candidate list had been taken from the Electoral College and handed over to the political parties.

During the 19th century a series of reforms were advocated.  Since several nations exist in the same state, district elections were advocated in place of the “winner-take-all” system. Some also wanted to eliminate human electors.  Andrew Jackson, an Appalachian, was one of the strong advocates for changing the system.

The moderator and President of the National Constitution Center reminds us that when elections are close, the Electoral College provides us with a clear winner.  A series of small differences in certain states are magnified by the electoral system.  In effect, there is “no room for doubt”.

The Tidewater representative suggests that the smaller states look favorably on keeping the Electoral College.  Its existence helps preserve the Federation; all constituencies matter.  The Yankeedom or Left Coast representative refutes this idea, stating that two strong advocates for ditching the Electoral College in favor of a popular vote came from small states, Rhode Island and North Dakota.  Candidates do not campaign in these states under the Electoral College system and they probably still would not if we switched to a Popular Election system.

Both representatives do agree that a popular vote system would lead to an increased role for the federal government, since national standards for registration and voting would need to be set and enforced.  The Tidewater representative shows deep concern over this possibility.

It is important to put this concern in its proper context.  As previously mentioned, the Tidewater nation is the only one today that is at risk of extinction.  During the expansion of the Deep South, the values of the Tidewater were eroded and made more extreme, especially its policy towards slavery.  Tidewater leaders eventually followed the lead of the Deep South.  The tobacco industry declined in the Tidewater just as the cotton industry became prosperous in the Deep South.  The Deep South was also able to expand west whereas the Tidewater was cut off by a new nation, Greater Appalachia.

american-nations-advancing

Woodard (c)2011

The Yankeedom representative tells us that the conception of “democracy” has changed over time.  The Electoral College does not conform with people’s everyday notions of democracy.  He uses our gubernatorial and student body elections as classic examples.  In these instances the popular vote installs the new leader.

This argument rests on the assumption that all people in the 11 nations share this notion.  We might question if the Deep South uses wealth and race, or if Greater Appalachia uses strength, in place of popular elections as their preferred method for finding a new leader.

The panelists also discuss the geography of states.  The blue oases in red states do not count.  Woodard addresses this issue by analyzing nations at the county level.

They also discuss the implications of a Popular Vote system.  The Tidewater representative reminds us that having “run off elections” creates an entirely different system.  Other “fringe” political parties would have a stronger incentive to enter the contest.  These “fringe” parties would be able to form coalitions and run for a second round.  The Tidewater representative also warns us that with more than two political parties, there would be less of a “moderating” influence.  It is also uncertain if third parties would increase or reduce the emergence of demagogues.  Regardless of how many exist, political parties were not viewed favorably by most of the founding fathers.

Voting Characteristics in Philadelphia’s Second Ward


The following is based on information about the 27 divisions in the 2nd ward of Philadelphia.  As we seek to understand the results of the general presidential election of 2016, it helps to look at the data and trends at the local level.  Whoever best understands what happens locally can help sway the overall results at the state level. 

This data exploration can be enhanced by looking at information from the other wards.  This is a map of all the wards in Philadelphia.  This is the map of the 2nd ward, which is the subject of this analysis.

02.26.17 Philadelphia 2nd ward.jpg

Below is a summary of the information from the 27 divisions. 

The original variables in this dataset are:

  • Democrats
  • Republicans
  • Independents
  • Other Party
  • Total Population
  • White
  • Black
  • Hispanic
  • Other Race
  • Male
  • Female
  • Gender Unreported

Other proportion variables will be created when appropriate.

> summary(second[,2:13])
      Dem             Rep              Ind          Other Party      Total Pop.         White      
 Min.   :293.0   Min.   : 41.00   Min.   : 3.000   Min.   : 42.0   Min.   : 379.0   Min.   : 54.0  
 1st Qu.:475.5   1st Qu.: 67.50   1st Qu.: 4.000   1st Qu.: 81.0   1st Qu.: 648.5   1st Qu.:166.0  
 Median :547.0   Median : 80.00   Median : 9.000   Median : 93.0   Median : 711.0   Median :225.0  
 Mean   :552.3   Mean   : 88.96   Mean   : 9.444   Mean   :102.0   Mean   : 752.7   Mean   :211.7  
 3rd Qu.:630.0   3rd Qu.:100.50   3rd Qu.:13.000   3rd Qu.:118.5   3rd Qu.: 846.0   3rd Qu.:246.0  
 Max.   :852.0   Max.   :203.00   Max.   :20.000   Max.   :212.0   Max.   :1252.0   Max.   :405.0  
     Black           Hispanic       Other Race         Male           Female      Gender Unreported
 Min.   :  7.00   Min.   : 6.00   Min.   :10.00   Min.   :122.0   Min.   :148.0   Min.   :112.0    
 1st Qu.: 15.00   1st Qu.:10.00   1st Qu.:20.50   1st Qu.:218.5   1st Qu.:246.5   1st Qu.:163.0    
 Median : 38.00   Median :13.00   Median :24.00   Median :256.0   Median :284.0   Median :193.0    
 Mean   : 58.11   Mean   :13.52   Mean   :27.11   Mean   :261.9   Mean   :288.1   Mean   :205.4    
 3rd Qu.: 66.00   3rd Qu.:16.50   3rd Qu.:30.00   3rd Qu.:311.5   3rd Qu.:317.0   3rd Qu.:241.0    
 Max.   :209.00   Max.   :26.00   Max.   :56.00   Max.   :413.0   Max.   :487.0   Max.   :365.0 

 

Are there associations between some of these variables?


1.)    Does the proportion of female voters in a division help explain the variation in the total number of Democratic voters?

A proportion variable is created for the female population in each division.

> second$FemaleProp <- second$Female/second$`Total Pop.`

> summary(lm(second$Dem ~ second$FemaleProp))

Call:

lm(formula = second$Dem ~ second$FemaleProp)
Residuals:

     Min       1Q   Median       3Q      Max

-256.340  -66.267   -1.328   75.866  302.028

Coefficients:

                  Estimate Std. Error t value Pr(>|t|)

(Intercept)          711.4      418.8   1.699    0.102

second$FemaleProp   -415.0     1090.1  -0.381    0.707



Residual standard error: 136.8 on 25 degrees of freedom

Multiple R-squared:  0.005765, Adjusted R-squared:  -0.034

F-statistic: 0.1449 on 1 and 25 DF,  p-value: 0.7066

The percentage of females in each division’s total population does a very poor job of explaining the variability in the number of Democratic voters in each division.  The coefficient of determination is almost zero and the p-value is very large.

 It might be more appropriate to look at the number of Democrats in each division as a proportion rather than in total persons.

 I make another variable to represent this new proportion.

 

> second$DemProp <- second$Dem/second$`Total Pop.`

> summary(lm(second$DemProp ~ second$FemaleProp))

Call:

lm(formula = second$DemProp ~ second$FemaleProp)

Residuals:

      Min        1Q    Median        3Q       Max 

-0.075952 -0.019556  0.002954  0.029052  0.058959 

 

Coefficients:

                  Estimate Std. Error t value Pr(>|t|)    

(Intercept)         0.6205     0.1146   5.415 1.28e-05 ***

second$FemaleProp   0.3045     0.2983   1.021    0.317    

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 
Residual standard error: 0.03744 on 25 degrees of freedom

Multiple R-squared:  0.04003, Adjusted R-squared:  0.001632 

F-statistic: 1.043 on 1 and 25 DF,  p-value: 0.317

The p-value halves, but the coefficient of determination is still very low.  The association is not even close to being statistically significant. 


 


 

 

2.)    Is the male proportion of the population indicative of the total Republicans in the same division? 

A new variable is created to represent the male proportion of the population of each division.

> second$MaleProp <- second$Male/second$`Total Pop.`

> summary(lm(second$Rep ~ second$MaleProp))

Call:

lm(formula = second$Rep ~ second$MaleProp)

Residuals:

    Min      1Q  Median      3Q     Max

-54.447 -19.993  -9.245  14.653 108.770

Coefficients:

                Estimate Std. Error t value Pr(>|t|) 

(Intercept)       173.71      89.37   1.944   0.0633 .

second$MaleProp  -243.12     255.59  -0.951   0.3506 

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 36.46 on 25 degrees of freedom

Multiple R-squared:  0.03493, Adjusted R-squared:  -0.003676

F-statistic: 0.9048 on 1 and 25 DF,  p-value: 0.3506

 

There is no statistically significant association between the proportion of males and the number of Republicans in a division, for the same reasons stated in the first example.

Another new variable is created to represent the Republican proportion of the population of each division.

> second$RepProp <- second$Rep/second$`Total Pop.`

> summary(lm(second$RepProp ~ second$MaleProp))

Call:

lm(formula = second$RepProp ~ second$MaleProp)




Residuals:

      Min        1Q    Median        3Q       Max

-0.049654 -0.013996 -0.004151  0.016322  0.057392




Coefficients:

                Estimate Std. Error t value Pr(>|t|)  

(Intercept)      0.17521    0.06203   2.825  0.00916 **

second$MaleProp -0.16729    0.17740  -0.943  0.35472  

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



Residual standard error: 0.02531 on 25 degrees of freedom

Multiple R-squared:  0.03435, Adjusted R-squared:  -0.00428

F-statistic: 0.8892 on 1 and 25 DF,  p-value: 0.3547

 

This time the creation of a proportion versus total does almost nothing to change the coefficient of determination and p-value of the regressions.  There is no statistically significant association between the two variables at any reasonable level of significance.

However, we must also consider the quality of the data we have received.

 

Gender Unreported

Min.   :112.0    

1st Qu.:163.0    

Median :193.0    

Mean   :205.4    

3rd Qu.:241.0    

Max.   :365.0 

 

To put this in percentage terms of the total population in each division:

02-26-17-histogram-of-nogender

We do not know the gender of a sizeable percentage of voters in each division!  22% to 32% of each division does not have an identified gender.  We might have found associations between the variables in the first two examples if we had more complete data.
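A minimal sketch of how the percentage behind that histogram could be computed.  The data frame below is a made-up stand-in for `second`, and I am assuming the column is named `Gender Unreported` as in the summary above; the counts are invented.

```r
# Hypothetical stand-in for the `second` data frame; the counts are invented.
second <- data.frame(
  `Total Pop.`        = c(600, 750, 900),
  `Gender Unreported` = c(150, 200, 240),
  check.names = FALSE
)

# Share of each division's population with no recorded gender, in percent.
second$NoGenderPct <- 100 * second$`Gender Unreported` / second$`Total Pop.`

# Five-number summary of the percentages; hist(second$NoGenderPct) would draw them.
summary(second$NoGenderPct)
```

Working with the percentage rather than the raw count makes divisions of different sizes comparable.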


 

3.)    Do populations with a higher proportion of white voters help explain the variation in the number of Independent party registrants?

A proportion is created for the white voters over the total population in the division.

> second$WhiteProp <- second$White/second$`Total Pop.`

> summary(lm(second$Ind ~ second$WhiteProp))

Call:

lm(formula = second$Ind ~ second$WhiteProp)



Residuals:

   Min     1Q Median     3Q    Max

-7.465 -3.085  0.759  2.726  9.918



Coefficients:

                 Estimate Std. Error t value Pr(>|t|) 

(Intercept)         2.385      3.917   0.609   0.5480 

second$WhiteProp   25.531     13.746   1.857   0.0751 .

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



Residual standard error: 4.92 on 25 degrees of freedom

Multiple R-squared:  0.1213,  Adjusted R-squared:  0.08612

F-statistic:  3.45 on 1 and 25 DF,  p-value: 0.07507

We are much closer to finding an association between the proportion of white voters and their tendency to register as Independent than we were when looking for associations between political parties and genders.  However, the white proportion is still not a statistically significant predictor of Independent party registrants if we set alpha at 0.05.  The p-value is 0.07507 and the coefficient of determination is very low at 12.13%.

As we did before, we can now create a new object for the percentage of Independent party registrants from the total number of registrants per division.

> second$IndProp <- second$Ind/second$`Total Pop.`

> summary(lm(second$IndProp ~ second$WhiteProp))

Call:

lm(formula = second$IndProp ~ second$WhiteProp)




Residuals:

       Min         1Q     Median         3Q        Max

-0.0081644 -0.0033878 -0.0001117  0.0032048  0.0082518




Coefficients:

                 Estimate Std. Error t value Pr(>|t|) 

(Intercept)      0.004396   0.003759   1.170   0.2532 

second$WhiteProp 0.027473   0.013191   2.083   0.0477 *

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



Residual standard error: 0.004721 on 25 degrees of freedom

Multiple R-squared:  0.1479,  Adjusted R-squared:  0.1138

F-statistic: 4.338 on 1 and 25 DF,  p-value: 0.04766

 

As a proportion, we find the association to be significant, even though the R-squared value is only 14.79%.  We should now check the residual plots and consider the possibility of adding other variables to the model to improve the coefficient of determination.

> op = par(mfrow = c(2,2))

> plot(lm(IndProp ~ WhiteProp, data = second))

02-26-17-analysis-for-third-example

The residuals versus fits plot and the normal probability plot look good.  The errors are distributed normally with an approximate mean of zero and constant variance.
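A visual check can be complemented with a numeric one.  The sketch below uses simulated data in place of the real `second` data frame (the values are invented) and applies the Shapiro-Wilk test to the residuals.  It is only an illustration of the idea, not the analysis that was actually run.

```r
set.seed(1)

# Simulated stand-in for the 27 divisions; the values are invented.
second <- data.frame(WhiteProp = runif(27, 0.15, 0.40))
second$IndProp <- 0.004 + 0.027 * second$WhiteProp + rnorm(27, sd = 0.005)

fit <- lm(IndProp ~ WhiteProp, data = second)

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals.
sw <- shapiro.test(residuals(fit))
print(sw)

# With an intercept in the model, least-squares residuals always average to zero,
# so a mean near zero is guaranteed and does not by itself validate the model.
mean(residuals(fit))
```

With only 27 divisions the test has little power, so the plots remain the main diagnostic.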


 

4.)    Does the total population of a division help explain the variation in the number of residents that register with a party other than Republican, Democrat, or “Independent”?

> summary(lm(second$`Other Party`~second$`Total Pop.`))

Call:

lm(formula = second$`Other Party` ~ second$`Total Pop.`)




Residuals:

    Min      1Q  Median      3Q     Max

-30.163 -10.212  -0.958   5.314  39.011




Coefficients:

                     Estimate Std. Error t value Pr(>|t|)   

(Intercept)         -26.74068   12.18433  -2.195   0.0377 * 

second$`Total Pop.`   0.17109    0.01567  10.918 5.29e-11 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



Residual standard error: 15.87 on 25 degrees of freedom

Multiple R-squared:  0.8266,  Adjusted R-squared:  0.8197

F-statistic: 119.2 on 1 and 25 DF,  p-value: 5.295e-11

The total population does a good job of explaining the variability in the number of individuals that register as “other party”.  The coefficient of determination is much larger at 82.66% (81.97% adjusted) and the predictor variable is significant at any conventional level of significance.  This is our best result!

> op = par(mfrow = c(2,2))

> plot(lm(second$`Other Party`~second$`Total Pop.`))

02-26-17-analysis-for-fourth-example

The residuals seem to bounce randomly about the residual = 0 line, but there are three outliers flagged by R.  These are the divisions 15, 20, and 25.  On our normal probability plot we also see that the errors are normally distributed for middle values, but not for lower and higher values.  We may need to transform the variable and/or consider another regression type besides linear.
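One caveat: by default `plot()` on an `lm` object labels the three most extreme points in each panel (its `id.n` argument defaults to 3), so three labelled points appear whether or not they are unusual.  The sketch below, on simulated data standing in for `second` (all values invented), shows one way to flag points by a threshold instead, using standardized residuals and Cook's distance.

```r
set.seed(2)

# Simulated stand-in for the 27 divisions; the values are invented.
second <- data.frame(`Total Pop.` = round(runif(27, 400, 1200)),
                     check.names = FALSE)
second$`Other Party` <- -27 + 0.17 * second$`Total Pop.` + rnorm(27, sd = 16)

fit <- lm(`Other Party` ~ `Total Pop.`, data = second)

# Standardized residuals; |value| > 2 is a common rule of thumb for "large".
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 2)
flagged

# Cook's distance measures how much each division influences the fitted line.
cooks <- cooks.distance(fit)
head(sort(cooks, decreasing = TRUE), 3)
```

The cutoff of 2 is a convention rather than a rule; Minitab uses a similar threshold for its "R" flag.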


5.)    Is there an association between those who register as another party and the number of individuals in a population that identify as white?

> summary(lm(second$`Other Party`~second$White))

Call:
lm(formula = second$`Other Party` ~ second$White)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-45.492 -13.703  -3.486  14.109  60.357 
 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  28.81628   13.79156   2.089    0.047 *  
second$White  0.34592    0.06099   5.672 6.63e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 25.21 on 25 degrees of freedom
Multiple R-squared:  0.5627,  Adjusted R-squared:  0.5452 
F-statistic: 32.17 on 1 and 25 DF,  p-value: 6.631e-06

The number of those who identify as white does a good job at explaining the variability in the number of individuals that register as “other party”.  The coefficient of determination is 56.27% and we reject the null hypothesis that there is no association at any level of significance.

This is good news.  Below are the results of the residual v. fits and normal probability plot.

02-26-17-analysis-for-fifth-example

Once again there are three outliers.  Division 25 appears again as an outlier, but now we should further examine the data for divisions 6 and 26.  R depicts a pattern in the residuals; that is, they do not bounce randomly around the residual = 0 line.  However, we should also consider the possibility that more data would eliminate this slight pattern or appearance of “non-randomness”.

The normal probability plot is once again good for middle values, but loses its utility at lower and higher values for the divisions 6, 25, and 26.  We could try a transformation of this variable, possibly a squared version of the White variable.

Squared and cubed versions of the WhiteProp, White, and TotalPop do not enhance the models once we look at the residuals versus fits and normal probability plots.

We can attempt to regress the response variable OtherParty on two predictor variables.  Since the total population and white variables were both found to be significant individually, we can see if they together can help to explain the variability in the number of other party registrants.

> summary(lm(second$`Other Party`~second$`Total Pop.` + second$White))
Call:
lm(formula = second$`Other Party` ~ second$`Total Pop.` + second$White)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-30.108 -10.240  -0.993   5.313  39.035 
 
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -2.671e+01  1.276e+01  -2.092   0.0472 *  
second$`Total Pop.`  1.708e-01  2.826e-02   6.045 3.05e-06 ***
second$White         8.462e-04  6.925e-02   0.012   0.9904    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 16.2 on 24 degrees of freedom
Multiple R-squared:  0.8266,  Adjusted R-squared:  0.8122 
F-statistic: 57.22 on 2 and 24 DF,  p-value: 7.374e-10

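When both predictor variables are included in the same model, only the total population is found to be statistically significant.  The R-squared is unchanged at 0.8266, so the White variable adds essentially no explanatory power once total population is in the model.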
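A likely explanation is that the number of white residents largely tracks the total population, so the two predictors carry overlapping information.  I did not report that correlation above, so the sketch below uses simulated data (all values and the short column names are invented) to show how one could check it and compare the nested models.

```r
set.seed(3)
n <- 27

# Simulated divisions in which the white count follows total population closely.
total <- round(runif(n, 400, 1200))
white <- round(0.3 * total + rnorm(n, sd = 20))
other <- -27 + 0.17 * total + rnorm(n, sd = 16)
second <- data.frame(total, white, other)

# Correlation between the two predictors; a value near 1 signals collinearity.
cor(second$total, second$white)

# Partial F-test: does adding `white` improve on the model with `total` alone?
small <- lm(other ~ total, data = second)
full  <- lm(other ~ total + white, data = second)
anova(small, full)
```

If the real data show a similarly high correlation, dropping White and keeping the simpler one-predictor model would be a reasonable choice.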

 

Cotton subsidies and total production per county


Cotton subsidies have been decreasing since 2005.  This may be associated with the recommendation by the Dispute Settlement Body of the World Trade Organization that the United States cease subsidizing upland cotton.

The subsidies in question were:

(i) the export credit guarantees under the GSM 102, GSM 103 and SCGP export credit guarantee programmes in respect of exports of upland cotton and other unscheduled agricultural products supported under the programmes, and in respect of one scheduled product (rice);

(ii) Section 1207(a) of the Farm Security and Rural Investment (FSRI) Act of 2002 providing for user marketing (STEP2) payments to exporters of upland cotton; and

(iii) Section 1207(a) of the FSRI Act of 2002 providing for user marketing (STEP2) payments to domestic users of upland cotton. As for the actionable subsidies the recommendation is that the United States takes appropriate steps to remove the adverse effects of certain subsidies or withdraw these subsidies within six months from the date of adoption of the Panel and Appellate Body reports, i.e. the compliance period expired on 21 September 2005.

 

The aggregate amount of subsidies between 2000 and 2014 is depicted below:

02-12-17-cotton-subsidies-since-2000

The following county maps were produced by the United States Department of Agriculture.  We can see the decrease in both the total counties and the amount of cotton (pima and upland varieties) produced by each county between 2010 and 2015.  During these five years, the direct payments, production flexibility contracts, and counter-cyclical programs are completely phased out.  “Other cotton programs” emerge in 2013 and to a great extent in 2014 (EWG Farm Subsidy Database).

2010-pima-cotton-production

Pima cotton moves out of western Texas and emerges in Arizona.  Kern County in California almost completely stops producing pima cotton.

2015-pima-cotton-production-per-county

 


 

2010-upland-cotton-production

There are major decreases in the amount of upland cotton produced by counties between 2010 and 2015.  In particular, we should note the color shade changes in Arkansas, Louisiana, North Carolina, and South Carolina.

2015-upland-cotton-2015

NAFTA – agricultural trends and the future of the trade agreement


Almost 7 years ago, I drafted this capstone paper (nafta_peter_r_abraldes) at the University of Pittsburgh about agricultural subsidies and the development of NAFTA.

A lot has changed since 2010.  Ironically, in 2008 it was former President Obama’s election that caused “uncertainty” around the continued development of NAFTA.  Protectionism was in the air, but NAFTA served as a stabilizer against these protectionist forces and political cycles.  Canada was able to pressure the U.S. to scrap some of the “Buy American” provisions in the Stimulus Bill after the Great Recession.

If NAFTA is renegotiated, it seems unlikely that this will include provisions for the environment and labor.  The Democrats sought to add these provisions incrementally.  They were unable to include these in the agreement even though the bill was promoted by former President Clinton.

Mexico could seek to improve NAFTA in two ways.  Mexico could require that the NAFTA market price discriminate between white and yellow corn.  In the world market, white corn is priced higher, but in NAFTA both types are priced the same.  Ironically it is Mexico that “specializes” in white corn while the U.S. produces mostly yellow corn.

Mexico could also renegotiate the Rio Grande Treaty of 1944.  The northern stretch of Mexico has most of the corporate farms.  The treaty is outdated and Mexico’s water requirements are much higher than they were in 1944.

I would like to investigate a few of the topics discussed in the original draft:

  1. What has happened to the tomato industry in all three countries since 2010?  Have changes been made regarding Canada’s “bulk water transfer” prohibitions?  I believe the footnote on page 14 to now be false.  Canadian exports of tomatoes in the winter seem to have increased and some U.S. states have entered the domestic market such as Michigan and New York.  I would like to further investigate this trend.  It could possibly be limited to what I see in the Philadelphia Metropolitan Region.
  2. What are the recent rulings of the Canadian International Trade Tribunal (CITT)?  What other controversies besides lumber are being discussed?
  3. Do international and regional commissions help dismantle protectionism and help policy makers overcome the influence of big business?  We should pay attention to their rulings in the face of the current U.S. administration.
  4. What were the major changes in the Agricultural Act (2014) from the Food, Conservation, and Energy Act (2008)?
  5. What were the implications for the U.S. cotton industry since 2001?  How has the WTO ruling reduced cotton production in the United States?  Which regions of the country were most affected?  Does the cotton industry still receive 13% of U.S. agricultural subsidies as it did in 2001?

 

U.S. States Agricultural Exports between 2000 and 2015


The USDA provides data on the value of agricultural goods exported by each U.S. state from 2000 to 2015.  The export value is measured in millions of dollars.

I observed the aggregate amount of agricultural goods exported by each state.  A more detailed analysis by type of commodity might yield more insight into why certain states changed rankings over the course of 15 years.

These other categories are:

Animal Products: Beef, Pork, Hides, Other Livestock, Dairy, Broilers, Other Poultry

Plant Products: Veggies (Fresh), Veggies (Processed), Fruits (Fresh), Fruits (Processed), Tree Nuts, Rice, Wheat, Corn, Feeds, Grain Products, Soybeans, Soymeal, Vegetable Oils, Other Oilseed Products, Cotton, Tobacco, Other Plant Products

(I would also be interested in measuring a state’s geographic proximity to a NAFTA neighbor, using an indicator variable for if this state trades with a NAFTA neighbor, or considering the size of arable land in each state.)

agricultural.exports$BASE represents the vector values for 2000

agricultural.exports$YEAR15 represents the vector values for 2015

> agricultural.exports$weighted <- agricultural.exports$YEAR15/1.38

 

I wanted to analyze the 2015 values after they were adjusted for inflation.  According to the U.S. Inflation Calculator website, $1.38 in 2015 equates to $1 in 2000.  In other words, there was 38% inflation over these 15 years.
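As a quick check of the adjustment, the sketch below divides three of the 2015 values by 1.38.  The three inputs are the minimum, median, and maximum of YEAR15 taken from the summary output later in this post; the variable names are only for illustration.

```r
# Deflator from the text: $1.38 in 2015 buys what $1 bought in 2000.
inflation_factor <- 1.38

# Minimum, median, and maximum of YEAR15, in millions of 2015 dollars.
year15 <- c(min = 16.78, median = 1570.87, max = 22546.99)

# The same values expressed in 2000 dollars.
weighted <- year15 / inflation_factor
round(weighted, 2)
```

The results agree with the `weighted` column of the summary (12.16, 1138.31, and 16338.40).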

> boxplot(agricultural.exports$BASE, notch = TRUE, col = "blue", main = "M($) exports - 2000")
> boxplot(agricultural.exports$weighted, notch = TRUE, col = "red", main = "M($) exports - 2015 inflation adj.")

02-04-17-exports-by-state-boxplots

As we see, there are two exporting states that represent the outliers beyond the upper whisker of each boxplot.  These states are California and Iowa.  Illinois remains in third place over the 15 years, but is not considered an outlier, even though it exports an aggregate value only a little less than that of Iowa.

Over the 15 years, the first quartile showed less variance.  The same occurs in the upper whisker.  This could maybe be described as a harmonization among some states.

It may be interesting to group these states by region or to indicate whether they share a border with a NAFTA country.  We could also group them as being inside or outside a certain distance from a NAFTA neighbor.

Not adjusting the agricultural export values for inflation can lead us to form overly optimistic projections.  We can see “exponential” growth in the export values of the top producing states.  See the stem-and-leaf plots below for BASE (2000), YEAR15 (2015), and weighted (2015 adjusted for inflation).

> stem(agricultural.exports$BASE)

  The decimal point is 3 digit(s) to the right of the |

  0 | 00001111111222334555566677888999
  1 | 233333447999
  2 | 237
  3 | 14
  4 |
  5 |
  6 | 9

> stem(agricultural.exports$YEAR15)

  The decimal point is 3 digit(s) to the right of the |

   0 | 00111222333445688122455556999
   2 | 018891556678
   4 | 116
   6 | 134
   8 | 0
  10 | 0
  12 |
  14 |
  16 |
  18 |
  20 |
  22 | 5

> stem(agricultural.exports$weighted)

  The decimal point is 3 digit(s) to the right of the |

   0 | 0011111222233455689901111244455
   2 | 0013666677903
   4 | 4668
   6 | 2
   8 |
  10 |
  12 |
  14 |
  16 | 3

The summary of agricultural exports gives us a nice breakdown of how the value of agricultural exports changed over 15 years.  Alaska is the state that exported the least, measured in millions of dollars of agricultural product, whatever that product might be.  Adjusting for inflation, Alaska nearly doubled the value of its agricultural exports (from about 6.2 to 12.2) and California more than doubled its exports (from about 6,854 to 16,338).  Growth was not this strong everywhere: the median rose from about 699 to 1,138 in real terms, an increase of roughly 63%, so the value of agricultural exports did not double for every state.

> summary(agricultural.exports)
    STATE                BASE              YEAR15            weighted       
 Length:50          Min.   :   6.204   Min.   :   16.78   Min.   :   12.16  
 Class :character   1st Qu.: 182.337   1st Qu.:  441.42   1st Qu.:  319.87  
 Mode  :character   Median : 698.612   Median : 1570.87   Median : 1138.31  
                    Mean   :1025.309   Mean   : 2661.05   Mean   : 1928.30  
                    3rd Qu.:1327.824   3rd Qu.: 3589.07   3rd Qu.: 2600.77  
                    Max.   :6853.875   Max.   :22546.99   Max.   :16338.40
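The real growth between 2000 and 2015 can be read off the summary by dividing the inflation-adjusted values by the base-year values.  The sketch below does this for four of the summary statistics; the numbers are copied from the output above, while the vector names are my own.

```r
# Summary statistics for 2000 (BASE) and for 2015 in 2000 dollars (weighted).
base     <- c(min = 6.204, q1 = 182.337, median = 698.612, max = 6853.875)
weighted <- c(min = 12.16, q1 = 319.87,  median = 1138.31, max = 16338.40)

# Ratio of real 2015 value to 2000 value; 2 would mean the value doubled.
growth <- weighted / base
round(growth, 2)   # roughly 1.96, 1.75, 1.63, 2.38
```

The smallest and largest exporters grew the most by this measure, while the middle of the distribution grew by well under a factor of two.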

 

Now we can compare how some states “gain ground” over others.

BASE**   YEAR15***   PLACE MOVEMENT*
CA       CA           0
IA       IA           0
IL       IL           0
TX       NE          -2
NE       MN           1
MN       TX           1
KS       IN          -1
FL       KS          -7
NC       ND          -5
IN       WA           3
OH       SD          -1
GA       OH          -5
MO       MO           0
ND       NC           5
WA       FL           5
AR       AR           0
KY       GA          -3
SD       MI           7
WI       WI           0
MI       KY           2
CO       PA          -4
OK       OR          -5
OR       ID           1
MS       MS           0
ID       CO           2
PA       LA           5
LA       OK           1
AL       NY          -3
VA       TN          -4
TN       AZ           1
NY       AL           3
MT       MT           0
AZ       VA           3
SC       SC           0
MD       NM          -1
NM       MD           1
NJ       NJ           0
UT       UT           0
HI       HI           0
ME       WY          -1
WY       ME           1
CT       DE          -1
DE       CT           1
MA       VT          -1
NV       MA          -2
WV       WV           0
VT       NV           3
NH       NH           0
RI       RI           0
AK       AK           0
*The place movement is indicated for each state in relation to the place it occupied in the first column (year 2000).  A positive number means the state moved up in the ranking.
**2000
***2015
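The place movement can be computed rather than counted by hand.  The sketch below uses four states with invented export values, chosen only so that the ordering mirrors the TX, NE, MN, and KS rows; the `rank_2000`, `rank_2015`, and `movement` columns are hypothetical names.

```r
# Toy data: the export values are invented; only the ordering matters here.
exports <- data.frame(
  STATE  = c("TX", "NE", "MN", "KS"),
  BASE   = c(400, 300, 200, 100),
  YEAR15 = c(500, 900, 800, 400)
)

# Rank 1 is the largest exporter, so rank the negated values.
exports$rank_2000 <- rank(-exports$BASE,   ties.method = "min")
exports$rank_2015 <- rank(-exports$YEAR15, ties.method = "min")

# Positive movement means the state climbed the ranking between 2000 and 2015.
exports$movement <- exports$rank_2000 - exports$rank_2015
exports
```

Because every gain is another state's loss, the movements in a full ranking always sum to zero, which makes a handy check on the table.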


False Allegations of Voter Fraud


Allegations of voter fraud make great headlines, but are generally false.  These allegations are based on “feelings” instead of data.  The claims are ultimately stoked by the interest of incumbent politicians and political parties seeking to suppress the vote when greater voter turnout does not favor their odds.

The maximum occurrence of voter fraud in the United States between 2000 and 2014 has been calculated at 0.0000031%. 

We typically see the mobilization efforts of political parties.  What we often do not see are their demobilization efforts.  According to Groarke, both methods are equally important during an election cycle (Groarke 2016).  We see an example of a demobilization effort when voter requirements become more stringent in response to allegations of “voter fraud”.

Groarke studied three campaigns to improve voter turnout and the allegations leveled against these efforts in the name of “voter fraud”.  Groarke found that the representative’s tenure and propensity to want more “unreliable voters” in their district influenced their political calculus.

There was an observed difference between northern and southern Democrats in response to each effort.  Some Republicans even initially supported these efforts, but eventually retreated from these positions.

Postcard Registration Bills (1971 – 1976) introduced by Senator Gale McGee (D-WY)

The arguments against the legislation:

  • Registration of unqualified persons
  • Registration of nonexistent persons
  • Postcard registration cost
  • Danger of federal intrusion into election process
  • Nonvoters are naturally uninterested in politics

Minnesota and Maryland served as case examples of how postcard registration improved voter turnout.  Between the two, a single case of fraud was identified during the registration leading up to the 1974 election.  In exchange for that one case of fraud, registration increased by 1.5% from the previous voting period and was a cheap way to increase participation (Ford Foundation 715).

Election Day Registration (~1976) introduced by President Jimmy Carter and VP Mondale

The arguments against the legislation:

  • Endangers the integrity of franchise
  • Serious threats of fraud even if voters showed identification while registering
  • Increased federal regulation and would discourage participation

Minnesota had allowed same day voter registration since 1973.  Between 1972 and 1976, the percentage of the population actively voting increased from 68.4% to 71.4%.  22.9% of these voters took advantage of their ability to register on election day.  In 1976, Minnesota had the highest voter turnout in the nation (Smolka 26).

Groarke once again noticed that the largest opponents of same day election registration were Republicans or Southern Democrats who were incumbent members of Congress within “safe” districts (578-580).

The beginnings of the Motor Voter Law (mid to late 1970s, 1980s)

The arguments against the legislation:

  • Fraud
  • People who really want to vote will find a way (comparing voters in El Salvador to those in the U.S.)

If effective, the Motor Voter Law would have required states to offer Election Day registration, mail registration, and ensure that government agencies offered voter registration and the ability to update existing registration.  The initial movement also did not include provisions for “non-voter purging”.

Although Election Day Registration failed, some states opted to use mail registration and have government agencies assist in registering and updating existing voter information.  The Reagan administration fought Motor Voter Laws with the Hatch Act.  Litigation tied the ability of the government agencies to consistently offer voter registration and updates to existing registration.  This law became known as the “Motor Voter Law” because essentially only motor vehicle agencies were seamlessly incorporating voter registration as part of their processes (582).

In the 1980s, certain portions of the population tended to live in cities while others lived in the suburbs.  This translates into the need for a driver’s license.

Today, the reality is much different than it was in the 1980s.  Unfortunately, this data from the U.S. Census Bureau’s American FactFinder is not readily available for the 1980s.  It would require more time to juxtapose a snapshot of 1980 and 2015.  I could also then see if the changes are statistically significant.

02-02-17-workers-and-drivers

Proponents of the law had to negotiate to win over opponents.  Penalties were included for the alleged fraudulent registration.  Same day registrants were also segregated and further scrutinized before their votes were counted (Groarke 585 – 586).

The mail delivery service is not consistent in all communities.  This became a problem when a mailing purge was suggested to occur every two years.  Voters that did not respond to the mailer would be removed from the voting list.

National Voter Registration Act – (1993) President Clinton and Rep. Al Swift (D-WA)

The Motor Voter Law reemerges and undergoes significant changes.  Election Day registration is dropped and voter list maintenance requirements were added.  It was also no longer considered mandatory for unemployment agencies to offer voter registration.

Some states threw up roadblocks, by requiring two separate registration processes for voting in state versus national elections.

Groarke reminds us that political parties exert equal efforts to mobilize and demobilize potential voters.  Her Table 3 shows how many voter removals there are annually in percentage terms of voting applications.  The purging process has been implemented nationally thanks to the National Voter Registration Act.  During the first year, a name is identified “to be purged”.  If the potential voter has not communicated with voter registration by year two, they are purged from the voting rolls.

02-02-17-the-impact-of-voter-fraud-claims

With the emphasis on “fraud”, we must remind ourselves that those who might want to commit fraud probably do a simple cost versus benefit analysis.    What is the incremental benefit of one vote against the fines and prison sentences if the one fraudulent vote is discovered?  Levitt argues that the incremental vote is not worth the fines and penalties for a rational agent (Levitt 2007).

Instead, we should consider other rationale for mistakes made at the polls.  Some of these possibilities are:

  • Clerical or typographical errors (e.g., signing the wrong line or choosing an identical or almost identical name)
  • First and last names or parts of street number addresses are inverted
  • Incomplete written data matches with another person (e.g., consider middle initials)
  • Common names are prone to being flagged and purged
  • Certain birthdates are more common than others (e.g., check out this article about the probability of you being born on a given day.)
  • Voters move and can be registered at two addresses, but only vote once.
  • Voters can begin filling out a form on election day, make a mistake, be given another form, and an election official can accidentally count the discarded form again.
  • Voters can vote before an election and die by the time the vote is confirmed.
  • The right to vote for felons is not consistent across all states. In some states once you are released, your right to vote is restored.  In others, you must go through a process to regain your right to vote.  In addition, misdemeanor offenders retain the right to vote.
  • “Caging” efforts to purge voters are not always effective. This is a tactic used to see which postcards are returned by the USPS.  Sometimes the potential voter is out of the country or their area is poorly serviced by the USPS.
  • Just because an address is unusual does not mean it is illegitimate. Homeless persons can register their address as the local shelter and business owners can live in the same building as their business.

Levitt calculates overall documented fraud rates in the following states:

  • Missouri: 0.0003%
  • New Hampshire: 0.0000%
  • New Jersey: 0.0002%
  • New York: 0.000009%
  • Wisconsin: 0.0000%

In a Washington Post article Levitt reports 31 incidents of voter fraud in all general, primary, special, and municipal elections from 2000 through 2014 (Levitt 2014).  These 31 incidents occurred out of over 1 billion ballots cast during this period.  Some of the fraud allegations have not been fully investigated, which may indicate that they are falsely being flagged as “fraud”.
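The 0.0000031% figure quoted at the top of this post follows directly from these two numbers.  The sketch below reproduces the arithmetic, treating the ballot count as exactly 1 billion; since more than 1 billion ballots were cast, the true rate is at most this value.

```r
incidents <- 31     # incidents reported by Levitt
ballots   <- 1e9    # lower bound on ballots cast, 2000 through 2014

# Incidents per ballot, expressed as a percentage.
rate_percent <- 100 * incidents / ballots

# Print without scientific notation to compare with the quoted figure.
format(rate_percent, scientific = FALSE)
```

Put differently, that is about one incident for every 32 million ballots.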

Model building: What accounts for the variation in Brazilian state GDP?


This is the first attempt to build a model that accounts for some of the observed variability in the Brazilian state GDP values.  The investigation is attached as a PDF because some of the Minitab output does not retain its original format when it is pasted into WordPress.

Here are the potential predictor variables, their type, measurement, and the source used to obtain the data:

Variable | Type | Measurement | Source
State contribution to national GDP | Quantitative | Year 2010, measured in R$ 1,000,000 | IBGE
Population | Quantitative | Year 2010, measured by individuals | IBGE
Municipalities | Quantitative | Year 2010, measured by individual municipalities | Fundação Joaquim Nabuco
Region | Qualitative | Year 2010, the federal government identifies five regions (more details below) | Fundação Joaquim Nabuco
Foreign Border | Qualitative | Year 2010, border is defined as “yes” or “no” | Consult any modern day geopolitical map of Brazil.
Border with MERCOSUR country | Qualitative | Year 2010, border is defined as “yes” or “no”; MERCOSUR is defined as sharing a border with Argentina, Uruguay, Paraguay, or Venezuela.  Venezuela was admitted to MERCOSUR in 2012, but its inclusion was considered important for the economic impact it could have had on border states | Consult any modern day geopolitical map of Brazil.
Foreign Presence | Quantitative | This is defined as the number of consulates in a given state.  This would probably be more meaningful if we exclude the Federal District. | Wikipedia
Ocean Port | Quantitative | The number of ports in the state (measured by various sources between the years 2009 and 2015) | Wikipedia
River Port | Quantitative | The number of ports in the state (measured by various sources between the years 2009 and 2015).  I also considered making the predictor variables for ocean and river port qualitative. | Wikipedia
Soy | Quantitative | Thousands of tons exported during the two harvests of the 2010/2011 growing year. | CONAB
Corn | Quantitative | Thousands of tons exported during the two harvests of the 2010/2011 growing year. | CONAB

Corn and soy account for about 80% of the grains produced in Brazil.  Soy has “immediate liquidity” since it is an international commodity while corn historically was used for internal supply (EMBRAPA).

When I look for significance of “region”, I should be aware that the sample size of each region is very small.  There are 3 to 9 observations per group, with 9 for the Northeast being more the exception than the rule.  I suspect this variable might have been significant if we coded municipalities per region and used their relative GDP instead of that for the state.  If we opted to include this variable in the model, we would have included four dummy variables and chosen one baseline region.
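Although the models in this investigation were built in Minitab, the coding of a five-level factor into four dummy variables is easy to illustrate in R.  In the sketch below the English region names and the choice of the Southeast as the baseline are my own assumptions for illustration.

```r
# One observation per region is enough to show the coding scheme.
region <- factor(c("North", "Northeast", "Central-West", "Southeast", "South"))

# Choose the baseline region; its effect is absorbed into the intercept.
region <- relevel(region, ref = "Southeast")

# model.matrix() builds the design matrix: an intercept plus four 0/1 dummies.
X <- model.matrix(~ region)
colnames(X)
```

Each dummy coefficient would then be read as the difference in mean GDP between that region and the baseline region.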

After looking at associations, it could be interesting to see if the deleted residual for the Federal District makes the point influential.  This unit is an anomaly in a few categories, especially in regard to the “foreign presence”.

Finally, I understand that with more time I would have consulted original sources instead of using data from second-hand sources.

I begin with a simple matrix plot of each variable plotted against the others.

12-23-16-brazil-post-image-1

Visually we can identify a few associations between variables:

  • Population and number of municipalities
  • Population and state GDP
  • Soy and corn exports

Correlation: Population, Municipalities, GDP, Soy, Corn

 

                    Population     Municipalities         GDP             Soy

Municipalities           0.762

GDP                      0.953           0.629

Soy                      0.072           0.292           0.064

Corn                     0.353           0.607           0.313           0.860

 

Cell Contents: Pearson correlation

 

To a lesser degree, associations exist between:

  • Foreign Presence and GDP
  • Soy and GDP
  • Corn and GDP

Correlation: Foreign Presence, GDP, Soy, Corn

 

                 Foreign Presence               GDP               Soy

GDP                          0.357

Soy                         -0.067             0.064

Corn                        -0.032             0.313             0.860

 

Cell Contents: Pearson correlation

 The rest of this investigation can be found via this link: 12-00-16-brazilian-gdp-project.

 

 

U.S. state fertilizer indices and growth of factor productivity levels


I use USDA data from 1960 and 2004 to create a brief exploratory analysis about what makes some states more agriculturally productive than others.

Is a higher fertilizer index associated with higher factor productivity levels?

H0: There is no correlation between fertilizer indices and the growth of factor productivity.

Ha: There is a correlation between fertilizer indices and the growth of factor productivity.

 

Both growth of factor productivity levels and fertilizer consumption indices are relative to Alabama in 1998.  Alaska and Hawaii are the only states excluded.

I run the regression analysis for the 1960 and 2004 data.  A small p-value for the 1960 data has us reject the null hypothesis and conclude the alternative hypothesis that a correlation between the two variables exists.  A larger p-value for the 2004 data has us fail to reject the null hypothesis. 

We should note that the fertilizer index variable explains only a small percentage of the variability in the response variable.  The data points are scattered far from the regression line.  We see this in the R-sq value.  Over time, the fertilizer index predictor variable explains even less of the variability in the response variable: R-sq falls from 14.41% in 1960 to 4.40% in 2004.
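The R-sq values can be recovered from the Analysis of Variance tables in the Minitab output that follows, since R-sq is the regression sum of squares divided by the total sum of squares.  The sketch below redoes that arithmetic in R; the sums of squares are copied from the output and the variable names are mine.

```r
# Adjusted sums of squares from the two Analysis of Variance tables.
ss_regression <- c(y1960 = 0.07532, y2004 = 0.1356)
ss_total      <- c(y1960 = 0.52269, y2004 = 3.0791)

# Coefficient of determination: share of total variation explained by the model.
r_sq <- ss_regression / ss_total
round(100 * r_sq, 2)   # percent, matching the Model Summary tables

# With a single predictor, the absolute correlation is the square root of R-sq.
round(sqrt(r_sq), 3)
```

This also ties the regression back to the hypotheses above, which are stated in terms of correlation: in a one-predictor model, the test on the slope and the test on the correlation are the same test.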

Regression Analysis: Factor Productivity (1960) versus Fertilizer Indices in 1960

 Analysis of Variance

Source                        DF   Adj SS    Adj MS  F-Value  P-Value

Regression                     1  0.07532  0.075324     7.75    0.008

  Fertilizer Indices in 1960   1  0.07532  0.075324     7.75    0.008

Error                         46  0.44736  0.009725

Total                         47  0.52269

 

Model Summary 

        S    R-sq  R-sq(adj)  R-sq(pred)

0.0986168  14.41%     12.55%       1.83%

  Coefficients

 

Term                          Coef  SE Coef  T-Value  P-Value   VIF

Constant                    0.4969   0.0222    22.40    0.000

Fertilizer Indices in 1960  0.0499   0.0179     2.78    0.008  1.00

  

Regression Equation

 Factor Productivity (1960) = 0.4969 + 0.0499(Fertilizer Indices in 1960)

 

 Fits and Diagnostics for Unusual Observations

            Factor

     Productivity

Obs        (1960)     Fit    Resid  Std Resid

  2        0.7057  0.5104   0.1953       2.02  R         [Arizona]

  4        0.8643  0.6561   0.2082       2.34  R  X      [California]

  8        0.8649  0.5997   0.2652       2.78  R          [Florida]

 33        0.4673  0.6438  -0.1765      -1.94     X        [Ohio]

 

R  Large residual

X  Unusual X
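The unusual observations table can be checked by hand in R. The sketch below retypes the four rows, recomputes each residual as observed minus fit, and marks standardized residuals larger than 2 in absolute value. That cutoff is my understanding of how Minitab assigns the R flag, so treat it as an assumption.

```r
# Rows copied from the 1960 "Fits and Diagnostics" table above.
unusual_1960 <- data.frame(
  state     = c("Arizona", "California", "Florida", "Ohio"),
  observed  = c(0.7057, 0.8643, 0.8649, 0.4673),
  fit       = c(0.5104, 0.6561, 0.5997, 0.6438),
  std_resid = c(2.02, 2.34, 2.78, -1.94)
)

# Residual = observed value minus fitted value.
unusual_1960$resid <- round(unusual_1960$observed - unusual_1960$fit, 4)

# Assumed rule for the R flag: |standardized residual| > 2.
unusual_1960$large_resid <- abs(unusual_1960$std_resid) > 2

unusual_1960
```

Under that rule Arizona, California, and Florida get the R flag, while Ohio (-1.94) does not and appears in the table only because of its unusual X value.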

 

 

Regression Analysis: Factor Productivity (2004) versus Fertilizer Indices in 2004

 Analysis of Variance

Source                        DF  Adj SS   Adj MS  F-Value  P-Value

Regression                     1  0.1356  0.13556     2.12    0.152

  Fertilizer Indices in 2004   1  0.1356  0.13556     2.12    0.152

Error                         46  2.9435  0.06399

Total                         47  3.0791

 

Model Summary

       S   R-sq  R-sq(adj)  R-sq(pred)

0.252961  4.40%      2.32%       0.00%

 

Coefficients

Term                          Coef  SE Coef  T-Value  P-Value   VIF

Constant                    1.1049   0.0493    22.39    0.000

Fertilizer Indices in 2004  0.0184   0.0127     1.46    0.152  1.00

 Regression Equation

 Factor Productivity (2004) = 1.1049 + 0.0184(Fertilizer Indices in 2004)
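The two regression equations can be written as small R functions to compare fitted values at the same fertilizer index. This is only an illustration: the function names are mine, and the index value of 1 is an arbitrary input, not a state from the data.

```r
# Fitted equations copied from the Minitab output for each year.
fit_1960 <- function(fert_index) 0.4969 + 0.0499 * fert_index
fit_2004 <- function(fert_index) 1.1049 + 0.0184 * fert_index

# Fitted factor productivity at an arbitrary fertilizer index of 1.
example_index <- 1
c(fit_1960 = fit_1960(example_index), fit_2004 = fit_2004(example_index))

# Change in fitted productivity per one-unit increase in the index (the slope).
c(slope_1960 = fit_1960(2) - fit_1960(1), slope_2004 = fit_2004(2) - fit_2004(1))
```

The intercept is higher in 2004, but the slope is smaller, which fits the weaker relationship seen in the 2004 p-value and R-sq.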

 Fits and Diagnostics for Unusual Observations

     Factor

     Productivity

Obs        (2004)     Fit    Resid  Std Resid

  1        1.7979  1.1305   0.6674       2.67  R     [Alabama]

  2        1.6304  1.1162   0.5142       2.07  R     [Arizona]

  4        1.5297  1.2817   0.2480       1.06     X  [California]

 13        1.3554  1.3211   0.0343       0.15     X  [Iowa]

 47        0.5777  1.1679  -0.5902      -2.36  R     [Wisconsin]

 48        0.5712  1.1103  -0.5391      -2.17  R     [Wyoming]

 

R  Large residual

X  Unusual X

[Figure: 09-06-16 fitted line plot #1]

[Figure: 09-06-16 fitted line plot #2]

 

We could create a prediction interval for the 1960 data, but the low R-sq value indicates that this interval will be wider than desired.
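To get a feel for how wide that interval would be, here is a rough R sketch using only numbers from the 1960 output: S = 0.0986168 and 46 error degrees of freedom (so n = 48). A 95% prediction interval has half-width t * S * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx). I do not have xbar or Sxx here, so the sketch only computes the smallest possible half-width, which occurs at the mean fertilizer index. The 95% level is my assumption.

```r
# Values copied from the 1960 Minitab Model Summary and ANOVA table.
s_1960   <- 0.0986168
df_error <- 46
n        <- df_error + 2  # one slope and one intercept were estimated

# Critical t value for a 95% interval (assumed confidence level).
t_crit <- qt(0.975, df = df_error)

# Half-width of the prediction interval at the mean of the predictor,
# where the (x0 - xbar)^2 / Sxx term is zero. Elsewhere it is wider.
half_width_at_mean <- t_crit * s_1960 * sqrt(1 + 1 / n)

round(half_width_at_mean, 3)
```

Even in the best case, the interval extends roughly 0.2 on either side of the fit, which is large compared with fitted values of around 0.5 to 0.65 in the 1960 data.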

We may want to include other variables in the linear regression to see if they explain more of the variability in the response variable.

 Data for this brief exploratory analysis was gathered from the USDA website, specifically this page: http://www.ers.usda.gov/data-products/agricultural-productivity-in-the-us.aspx#28268.

 

 

 

 

 

Revised project with ANOVA and Tukey HSD


This was the original proposal.  Even though the revised version is also elementary, I think it is more cohesive.  I can explore the bootstrap procedure another time.

07.21.16 R Final Project – edited version