
Jumping Frogs


Last weekend I spent a considerable amount of time making origami frogs and getting them to jump.  The assignment was to design an experiment.  I chose a factorial design with three main effects and a blocking factor.

Here is the voice recording of my attempt to explain the analysis below.

For each of the three variables, a “low” and a “high” level was chosen, as outlined in the table below.

Factor      Description                                          Levels
Size        large (4.75″ x 9.75″) or small (2.25″ x 4.5″)        large (+1), small (-1)
Paper Type  origami paper (thinner) or computer paper (thicker)  origami (+1), computer paper (-1)
Paper Clip  Half of the frogs in each replicate had a paper      clip used (+1), clip not used (-1)
            clip protruding outward from the triangle face,
            directly on top of the middle seam; half of each
            clip was on the frog and the other half in the
            air. The same size paper clip was used on both
            small and large frogs.

All replicates were tested on the same oak table; however, the six replicates were divided into two groups. A fine layer of sand was poured on the table for the second block. I used a blocking factor because it would have been impractical to remove the sand between runs. The theory was that sand could help prevent some frogs from tumbling, therefore increasing the odds that frogs destined to land upright remain upright.

My aim was to get a power greater than 90% with a full factorial design, and I considered both three- and four-factor models. A three-factor model needed six replicates for 48 runs, giving a power of 0.92217.

I also considered using four factors with three replicates, again for 48 runs, giving a power of 0.920780.

For the four-factor model, I considered fractional factorial designs, which would have aliased the main effects with three-way interactions and the two-way interactions with one another. I was unsure whether two-way interactions would be significant, but I was comfortable assuming the three-way interaction would not be. I weighed several candidates for the fourth main effect by how easy each would be to replicate. Some of the potential factors were also too subjective, such as the quality of the jumping crease, the degree to which certain frogs “slid” or “bounced”, and the order of manufacture (many pairs were made simultaneously in steps).

Eventually I decided against including a fourth main variable, opting for a full 2^3 factorial design. However, I included a blocking factor across half of the design, altering the landing surface of the table. This resulted in a power level similar to the other two options, approximately 0.92: slightly better than the four-factor option with three replicates and slightly worse than the unblocked three-factor option with six replicates.

The effect size and sigma were both entered as 7, since ultimately it is their ratio that matters (Lesson 12 conversation).
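As a check on Minitab's number, here is a minimal R sketch of the same power calculation, assuming the usual two-sided t-test for a coded effect in a blocked 2-level factorial (the nine model degrees of freedom are my own accounting: intercept, block, and the seven factorial terms):

# Power to detect an effect of 7 with sigma = 7 in a blocked 2^3 design, 6 replicates
effect <- 7; sigma <- 7; N <- 48
df_error <- N - 9                         # intercept + block + 7 factorial terms
ncp <- (effect / 2) / (sigma / sqrt(N))   # noncentrality of the coefficient t-test
t_crit <- qt(0.975, df_error)             # two-sided alpha = 0.05
1 - pt(t_crit, df_error, ncp) + pt(-t_crit, df_error, ncp)   # ~0.92, matching Minitab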

Full Factorial Design

Design Summary

Factors: 3 Base Design: 3, 8
Runs: 48 Replicates: 6
Blocks: 2 Center pts (total): 0

Block Generators:  replicates

All terms are free from aliasing.


Power and Sample Size

2-Level Factorial Design

α = 0.05  Assumed standard deviation = 7

Method

Factors: 3 Base Design: 3, 8
Blocks: 2

Including blocks in model.

Results

Center Points Per Block Effect Reps Total Runs Power
0 7 6 48 0.921853

12.09.17 power curve

 

As stated before, I wavered for a while between including either three or four factors in this model. I opted for simplicity, but could not resist including a blocking factor. Blocking helps reduce the mean squared error and generally improves the power of the test, although in this case the power was already good.

Originally, I thought I might be able to run the experiment with two or three replicates; I knew at least two were required to get an estimate of variation. I eventually figured out that six replicates were needed to reach the power goal with a 2^3 full factorial design. I considered including center points, but decided against them, since they are more appropriate for 3^k factorial, central composite, and Box-Behnken designs, where additional levels make testing for curvature more important.

Logistically, I found that the quality of my frogs worsened and then improved over the course of manufacturing. I was unsure whether to include some sort of quality variable to judge the creases, or a covariate recording the creation order of the frogs.

The design was created in the Minitab file. In a separate Excel worksheet, I recorded the ten distances and landing positions for each of the 48 runs. Distances were measured in inches, while landings were coded as either 0 (not landing on feet) or 1 (landing on feet). The average and standard deviation of the ten repetitions were calculated for each run and carried into the Minitab project file.
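For readers following along in R rather than Excel, a hedged sketch of that collapsing step; the data frame jumps and its columns are my assumptions, not the original worksheet:

library(dplyr)
# jumps: hypothetical long-format data, one row per jump,
# with columns run (1 to 48) and distance (inches)
run_summary <- jumps %>%
  group_by(run) %>%
  summarise(Y1 = mean(distance),   # average of the 10 repetitions
            s1 = sd(distance))     # standard deviation of the 10 repetitions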

First, I ran a factorial regression for the average distance of each run against the blocks (sand) and the factors A (size), B (paper type), and C (clip). I built one model for the mean response and another to study the variability of the repetitions within each run.
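In R, the two Minitab models might look like the sketch below, assuming a hypothetical data frame frogs with one row per run (the run_summary above joined with the design columns): Y1 and s1 as above, Blocks as a factor, and A, B, C coded -1/+1.

fit_mean <- lm(Y1 ~ Blocks + A * B * C, data = frogs)       # mean response
fit_var  <- lm(log(s1) ~ Blocks + A * B * C, data = frogs)  # Minitab's Ln(s1) response
anova(fit_mean)  # sequential and adjusted SS coincide here because the design is orthogonal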

Below is the analysis of variance modeling the mean response of distance:

Factorial Regression: Y1 versus Blocks, A, B, C

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value
Model 8 712.66 89.083 3.29 0.006
  Blocks 1 80.60 80.601 2.97 0.093
  Linear 3 557.68 185.894 6.86 0.001
    A 1 462.52 462.521 17.06 0.000
    B 1 94.08 94.080 3.47 0.070
    C 1 1.08 1.080 0.04 0.843
  2-Way Interactions 3 70.86 23.621 0.87 0.464
    A*B 1 9.54 9.541 0.35 0.556
    A*C 1 61.20 61.201 2.26 0.141
    B*C 1 0.12 0.120 0.00 0.947
  3-Way Interactions 1 3.52 3.521 0.13 0.721
    A*B*C 1 3.52 3.521 0.13 0.721
Error 39 1057.48 27.115
  Lack-of-Fit 7 74.68 10.668 0.35 0.925
    Pure Error 32 982.81 30.713
Total 47 1770.15

Model Summary

S R-sq R-sq(adj) R-sq(pred)
5.20720 40.26% 28.01% 9.51%

Coded Coefficients

Term Effect Coef SE Coef T-Value P-Value VIF
Constant 13.133 0.752 17.47 0.000
Blocks
1 1.296 0.752 1.72 0.093 1.00
A -6.208 -3.104 0.752 -4.13 0.000 1.00
B -2.800 -1.400 0.752 -1.86 0.070 1.00
C 0.300 0.150 0.752 0.20 0.843 1.00
A*B 0.892 0.446 0.752 0.59 0.556 1.00
A*C 2.258 1.129 0.752 1.50 0.141 1.00
B*C 0.100 0.050 0.752 0.07 0.947 1.00
A*B*C -0.542 -0.271 0.752 -0.36 0.721 1.00

Regression Equation in Uncoded Units

Y1 = 13.133 - 3.104 A - 1.400 B + 0.150 C + 0.446 A*B + 1.129 A*C + 0.050 B*C - 0.271 A*B*C

Equation averaged over blocks.

Alias Structure

Factor Name
A A
B B
C C

 

Aliases
I
Block 1
A
B
C
AB
AC
BC
ABC

Fits and Diagnostics for Unusual Observations

Obs Y1 Fit Resid Std Resid
30 5.20 20.68 -15.48 -3.30 R

R  Large residual

12.09.17 Pareto chart

The ANOVA table and Pareto chart were used to simplify this model. I began by removing the three-way interaction term, reallocating its degree of freedom to the error term. After refitting, the two-way interaction terms were still not significant, so they were removed and the model was refit with only the main effects. In the contour plots for distance we see slight curvature between factors A (size) and B (paper type) and between factors A (size) and C (paper clip), but not enough to indicate significance.

12.09.17 contour plots.png

The main effect for factor C (clip) was still not significant at any reasonable level of alpha, so it was removed from the model.
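The same pruning sequence can be sketched in R with the hypothetical fit_mean from above:

fit2 <- update(fit_mean, . ~ . - A:B:C)        # drop the three-way interaction
fit3 <- update(fit2, . ~ . - A:B - A:C - B:C)  # drop the two-way interactions
fit4 <- update(fit3, . ~ . - C)                # drop factor C (clip)
anova(fit4)                                    # Blocks, A, and B remain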

Factor A (size) is the only factor significant when we set alpha at 0.05. The blocking factor (sand) and factor B (paper type) would be significant at a more lenient alpha of 0.10, and they are kept in the model for that reason.

In the output below, the lack-of-fit test is not significant, so we fail to reject the null hypothesis that the model fits. Most of the error is attributed to “pure error”, and there are many error degrees of freedom thanks to the six replicates of each treatment combination.

The predictive power of this model is poor to average, with an adjusted R-squared of 31.63%. My poor folds or clumsiness may have played a role, although having a single operator should have kept any such mistakes consistent across the factor levels.

Factorial Regression: Y1 versus Blocks, A, B

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value
Model 3 637.20 212.40 8.25 0.000
  Blocks 1 80.60 80.60 3.13 0.084
  Linear 2 556.60 278.30 10.81 0.000
    A 1 462.52 462.52 17.96 0.000
    B 1 94.08 94.08 3.65 0.062
Error 44 1132.94 25.75
  Lack-of-Fit 12 150.14 12.51 0.41 0.950
    Pure Error 32 982.81 30.71
Total 47 1770.15

Model Summary

S R-sq R-sq(adj) R-sq(pred)
5.07432 36.00% 31.63% 23.83%

Coded Coefficients

Term Effect Coef SE Coef T-Value P-Value VIF
Constant 13.133 0.732 17.93 0.000
Blocks
  1 1.296 0.732 1.77 0.084 1.00
A -6.208 -3.104 0.732 -4.24 0.000 1.00
B -2.800 -1.400 0.732 -1.91 0.062 1.00

Regression Equation in Uncoded Units

Y1 = 13.133 - 3.104 A - 1.400 B

Equation averaged over blocks.

Alias Structure

Factor Name
A A
B B
C C

 

Aliases
I
Block 1
A
B

Fits and Diagnostics for Unusual Observations

Obs Y1 Fit Resid Std Resid
30 5.20 18.93 -13.73 -2.83 R

R  Large residual

12.09.17 Pareto chart # 2.png

The variability of the 10 repetitions within each of the 48 runs is now analyzed, with the standard deviation of distance as the response.

Analysis of Variability: s1 versus Blocks, A, B, C

Method

Estimation Least squares

Analysis of Variance for Ln(s1)

Source DF Adj SS Adj MS F-Value P-Value
Model 8 902.77 112.846 2.18 0.050
  Blocks 1 6.08 6.079 0.12 0.733
  Linear 3 426.84 142.281 2.75 0.055
    A 1 237.93 237.928 4.60 0.038
    B 1 188.78 188.783 3.65 0.063
    C 1 0.13 0.133 0.00 0.960
  2-Way Interactions 3 447.55 149.182 2.89 0.048
    A*B 1 229.49 229.495 4.44 0.042
    A*C 1 50.35 50.347 0.97 0.330
    B*C 1 167.71 167.705 3.24 0.079
  3-Way Interactions 1 22.30 22.303 0.43 0.515
    A*B*C 1 22.30 22.303 0.43 0.515
Error 39 2015.58 51.682
  Lack-of-Fit 7 419.98 59.998 1.20 0.329
    Pure Error 32 1595.60 49.862
Total 47 2918.35

Model Summary for Ln(s1)

S R-sq R-sq(adj) R-sq(pred)
7.18899 30.93% 16.77% 0.00%

Coded Coefficients for Ln(s1)

Term Effect Ratio Effect Coef SE Coef T-Value P-Value VIF
Constant 2.351 0.108 21.74 0.000
Blocks
  1 -0.037 0.108 -0.34 0.733 1.00
A -0.464 0.629 -0.232 0.108 -2.15 0.038 1.00
B -0.413 0.661 -0.207 0.108 -1.91 0.063 1.00
C -0.011 0.989 -0.005 0.108 -0.05 0.960 1.00
A*B -0.456 0.634 -0.228 0.108 -2.11 0.042 1.00
A*C -0.214 0.808 -0.107 0.108 -0.99 0.330 1.00
B*C -0.390 0.677 -0.195 0.108 -1.80 0.079 1.00
A*B*C 0.142 1.153 0.071 0.108 0.66 0.515 1.00

Regression Equation in Uncoded Units

Ln(s1) = 2.351 - 0.232 A - 0.207 B - 0.005 C - 0.228 A*B - 0.107 A*C - 0.195 B*C + 0.071 A*B*C

Equation averaged over blocks.

Alias Structure

Factor Name
A A
B B
C C

 

Aliases
I
Block 1
A
B
C
AB
AC
BC
ABC

Fits and Diagnostics for Unusual Observations

Original Response

Obs s1 Fit Ratio Residual
18 0.989 4.417 0.224

Fits and Diagnostics for Unusual Observations

Transformed Response

Obs Ln(s1) Ln(Fit) Ln(Resid) Std Ln(Resid)
18 -0.011 1.485 -1.497 -2.22 R

R  Large residual

12.09.17 Pareto chart #3.png

 

In the full model, several factors and interactions have significant effects on the variability of the distance data. I refit the model to see if anything changes, first dropping the three-way interaction.

Eventually we drop factor C (clip) and all interactions involving it. Factor B (paper type) stays in the model because of the hierarchy principle: the interaction A*B (size * paper type) is significant when alpha is set at 5%.

Analysis of Variability: s1 versus Blocks, A, B

Method

Estimation Least squares

Analysis of Variance for Ln(s1)

Source DF Adj SS Adj MS F-Value P-Value
Model 4 662.28 165.571 3.16 0.023
  Blocks 1 6.08 6.079 0.12 0.735
  Linear 2 426.71 213.355 4.07 0.024
    A 1 237.93 237.928 4.53 0.039
    B 1 188.78 188.783 3.60 0.065
  2-Way Interactions 1 229.49 229.495 4.37 0.042
    A*B 1 229.49 229.495 4.37 0.042
Error 43 2256.07 52.467
  Lack-of-Fit 11 660.47 60.043 1.20 0.324
    Pure Error 32 1595.60 49.862
Total 47 2918.35

Model Summary for Ln(s1)

S R-sq R-sq(adj) R-sq(pred)
7.24339 22.69% 15.50% 3.67%

Coded Coefficients for Ln(s1)

Term Effect Ratio Effect Coef SE Coef T-Value P-Value VIF
Constant 2.351 0.109 21.57 0.000
Blocks
  1 -0.037 0.109 -0.34 0.735 1.00
A -0.464 0.629 -0.232 0.109 -2.13 0.039 1.00
B -0.413 0.661 -0.207 0.109 -1.90 0.065 1.00
A*B -0.456 0.634 -0.228 0.109 -2.09 0.042 1.00

Regression Equation in Uncoded Units

Ln(s1) = 2.351 - 0.232 A - 0.207 B - 0.228 A*B

Equation averaged over blocks.

Alias Structure

Factor Name
A A
B B
C C

 

Aliases
I
Block 1
A
B
AB

Fits and Diagnostics for Unusual Observations

Original Response

Obs s1 Fit Ratio Residual
18 0.989 5.593 0.177
30 2.400 12.490 0.192

Fits and Diagnostics for Unusual Observations

Transformed Response

Obs Ln(s1) Ln(Fit) Ln(Resid) Std Ln(Resid)
18 -0.011 1.721 -1.733 -2.42 R
30 0.875 2.525 -1.649 -2.31 R

R  Large residual

12.09.17 Pareto chart #4.png

Using the ANOVA table and the Pareto chart, we see that factor A (frog size) and the interaction of factors A (frog size) and B (paper type) have significant effects on the variability across the repetitions of each run.

The residual analysis shows no serious violations of our assumptions that the error terms are normally distributed with mean zero and constant variance. The data were collected in random order, and the residuals-versus-order plot indicates no issues. The histogram is acceptable, but it does show outliers. In this case, the outliers are “duds”: frogs whose seams were so hopelessly creased that they stood no chance of jumping very far.
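R's default lm diagnostics offer a rough analogue of Minitab's four-in-one plot (R substitutes scale-location and leverage panels for the histogram and run-order panels); a sketch using the hypothetical fit_mean from above:

op <- par(mfrow = c(2, 2))   # arrange the four diagnostic panels in a grid
plot(fit_mean)               # residuals vs fits, normal Q-Q, scale-location, leverage
par(op)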

12.09.17 4 charts.png

Binary Logistic Regression: Successes versus A, B, C, Blocks

Method

Link function Logit
Categorical predictor coding (1, 0)
Rows used 48

Response Information

Variable Value Count Event Name
Successes Event 230 Event
Non-event 250
Trials Total 480

Deviance Table

Source DF Adj Dev Adj Mean Chi-Square P-Value
Regression 4 25.388 6.3470 25.39 0.000
  A 1 0.880 0.8795 0.88 0.348
  B 1 0.880 0.8795 0.88 0.348
  C 1 0.880 0.8795 0.88 0.348
  Blocks 1 22.878 22.8784 22.88 0.000
Error 43 95.759 2.2270
Total 47 121.147

Model Summary

Deviance R-Sq Deviance R-Sq(adj) AIC
20.96% 17.65% 649.20

 

Coefficients

Term Coef SE Coef VIF
Constant -0.532 0.134
A -0.0880 0.0939 1.00
B 0.0880 0.0939 1.00
C -0.0880 0.0939 1.00
Blocks
  2 0.887 0.188 1.00

Odds Ratios for Continuous Predictors

Odds Ratio 95% CI
A 0.9158 (0.7619, 1.1008)
B 1.0920 (0.9085, 1.3125)
C 0.9158 (0.7619, 1.1008)

Odds Ratios for Categorical Predictors

Level A Level B Odds Ratio 95% CI
Blocks
  2 1 2.4286 (1.6806, 3.5094)

Odds ratio for level A relative to level B

Regression Equation

P(Event) = exp(Y’)/(1 + exp(Y’))

 

Blocks
1 Y’ = -0.5316 – 0.08798 A + 0.08798 B – 0.08798 C
2 Y’ = 0.3557 – 0.08798 A + 0.08798 B – 0.08798 C

 

Goodness-of-Fit Tests

Test DF Chi-Square P-Value
Deviance 43 95.76 0.000
Pearson 43 84.58 0.000
Hosmer-Lemeshow 6 10.81 0.094

Fits and Diagnostics for Unusual Observations

Obs Observed Probability Fit Resid Std Resid
6 1.0000 0.5665 3.3712 3.57 R
13 0.9000 0.5665 2.3235 2.46 R
15 0.2000 0.5229 -2.1039 -2.23 R
16 0.2000 0.5229 -2.1039 -2.23 R
17 0.3000 0.6091 -1.9770 -2.09 R
18 1.0000 0.5665 3.3712 3.57 R
22 0.3000 0.6091 -1.9770 -2.09 R
33 0.1000 0.3909 -2.0737 -2.19 R
37 0.8000 0.3909 2.6467 2.80 R

R  Large residual

 

The Pearson chi-square statistic and its associated p-value lead us to reject the null hypothesis that the model fits the data, in favor of the alternative that it does not (Lesson 2.4 of STAT504).

H0: the model M0 fits
HA: the model M0 does not fit (or, some other model MA fits)

When we reduce the model for the success-ratio response, our chi-square statistics and associated p-values do not improve.
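A hedged R analogue of the binomial logistic model, assuming a hypothetical column landed that counts feet-landings out of the 10 jumps in each run:

# landed: hypothetical count of feet-landings out of 10 jumps per run
fit_logit <- glm(cbind(landed, 10 - landed) ~ A + B + C + Blocks,
                 family = binomial, data = frogs)
summary(fit_logit)   # coefficients and deviance comparable to the Minitab output above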

I also tried a factorial regression with landing success as the response. None of the p-values associated with the main factors were significant. The p-value associated with the blocks should only be used when deciding whether to include a blocking factor; the sand did impact the performance of the launches.

Factorial Regression: Successes versus Blocks, A, B, C

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value
Model 8 80.917 10.1146 2.13 0.056
  Blocks 1 56.333 56.3333 11.88 0.001
  Linear 3 6.250 2.0833 0.44 0.726
    A 1 2.083 2.0833 0.44 0.511
    B 1 2.083 2.0833 0.44 0.511
    C 1 2.083 2.0833 0.44 0.511
  2-Way Interactions 3 4.250 1.4167 0.30 0.826
    A*B 1 0.083 0.0833 0.02 0.895
    A*C 1 4.083 4.0833 0.86 0.359
    B*C 1 0.083 0.0833 0.02 0.895
  3-Way Interactions 1 14.083 14.0833 2.97 0.093
    A*B*C 1 14.083 14.0833 2.97 0.093
Error 39 185.000 4.7436
  Lack-of-Fit 7 58.333 8.3333 2.11 0.072
    Pure Error 32 126.667 3.9583
Total 47 265.917

Model Summary

S R-sq R-sq(adj) R-sq(pred)
2.17798 30.43% 16.16% 0.00%

Coded Coefficients

Term Effect Coef SE Coef T-Value P-Value VIF
Constant 4.792 0.314 15.24 0.000
Blocks
  1 -1.083 0.314 -3.45 0.001 1.00
A -0.417 -0.208 0.314 -0.66 0.511 1.00
B 0.417 0.208 0.314 0.66 0.511 1.00
C -0.417 -0.208 0.314 -0.66 0.511 1.00
A*B 0.083 0.042 0.314 0.13 0.895 1.00
A*C 0.583 0.292 0.314 0.93 0.359 1.00
B*C 0.083 0.042 0.314 0.13 0.895 1.00
A*B*C 1.083 0.542 0.314 1.72 0.093 1.00

Regression Equation in Uncoded Units

Successes = 4.792 - 0.208 A + 0.208 B - 0.208 C + 0.042 A*B + 0.292 A*C + 0.042 B*C + 0.542 A*B*C

Equation averaged over blocks.

Alias Structure

Factor Name
A A
B B
C C

 

Aliases
I
Block 1
A
B
C
AB
AC
BC
ABC

Fits and Diagnostics for Unusual Observations

Obs Successes Fit Resid Std Resid
37 8.000 3.083 4.917 2.50 R

R  Large residual

12.09.17 Pareto chart # 5.png

12.09.17 4 charts #2

Through the response optimization method, we would choose the following levels of each variable. Here “optimal” means the longest possible jumps and the greatest odds of the frog landing on its feet, with both response variables weighted equally.
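Base R has no Response Optimizer, but the idea can be sketched by predicting both responses over the eight factor combinations with the hypothetical fits from above and ranking the settings (predictions are conditioned on block 1 here, since Blocks is in both models):

grid <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                    Blocks = factor("1", levels = c("1", "2")))
grid$distance  <- predict(fit_mean, newdata = grid)                      # mean jump distance
grid$p_landing <- predict(fit_logit, newdata = grid, type = "response")  # landing probability
grid[order(-grid$distance, -grid$p_landing), ]   # best settings first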

Factor A (size) is optimized at the smaller size for both response variables.

Factor B (paper) results are inconclusive. To optimize jump distance alone, we would use frogs made of computer paper; to optimize the odds of the frogs landing on their feet, we would use origami paper. Since both responses are equally important, I analyzed the main effects plots for each response in more detail, checking whether the range of fitted means for one response is much larger than for the other. Both ranges are relatively narrow, which suggests the results for this factor are indeed inconclusive and that it may not need to be considered when optimizing the response.

 

12.09.17 optimization.png

12.09.17 landing odds.png
12.09.17 distance.png

Factor C (paper clip) is optimized for both responses at the low level. Frogs without paper clips jump farther and have higher odds of landing on their feet, according to this optimization model.

However, in the factorial regression model for jumping distance, we settled on a model that included only the terms for factor A (size) and factor B (paper type), along with the blocking factor.

Further investigation could discern whether factor C (clip) impacts small and large frogs differently. My theory is that the weight of the paper clip relative to the frog is higher for the small frogs, possibly affecting the jumping distance and successful landings of the two sizes differently.

Since there is no consensus on the optimal level of factor B (paper type), our final model may include only the main effect for factor A (size), concluding that little frogs jump farther and have greater odds of landing on their feet, after accounting for the variation captured by the blocking factor (landing surface).


Interest Rates and Effective Federal Funds Rate


As I learn SAS, I would still like to retain a working knowledge of R.  Unlike SAS and Minitab, R is free software.
As many comparison sites indicate, R has a steeper learning curve.  Initially you can become frustrated by small typos that produce error messages.  I had to overcome an initial aversion to finding and installing the right packages.  There are no automatic updates, so you need to discern the cause of mistakes, be it formatting, typos, or outdated packages.  An organized project folder also saves time later.
Given my time limitations, I will try to “explore” data.  Data mining can be much more time consuming than the actual analysis.  For this reason, the scope of these explorations will be limited, as this blog is kept as a hobby.

Below are three questions I want to explore based on a data set that contains the following variables, recorded monthly, from 1954 to 2016:

 

## How have interest rates evolved over the last few decades? 

 

## How many times was the effective federal funds rate above the rate of inflation?

 

## Are real GDP changes, unemployment rates, and inflation rates variables that help predict the variation in the effective federal funds rate?

#These are the variables:

> str(interest_rates)

Classes 'tbl_df', 'tbl' and 'data.frame':    904 obs. of  10 variables:
 $ Year                        : int  1954 1954 1954 1954 1954 1954 1955 1955 1955 1955 ...
 $ Month                       : chr  "07" "08" "09" "10" ...
 $ Day                         : chr  "01" "01" "01" "01" ...
 $ Federal Funds Target Rate   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Federal Funds Upper Target  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Federal Funds Lower Target  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Effective Federal Funds Rate: num  0.8 1.22 1.06 0.85 0.83 1.28 1.39 1.29 1.35 1.43 ...
 $ Real GDP (Percent Change)   : num  4.6 NA NA 8 NA NA 11.9 NA NA 6.7 ...
 $ Unemployment Rate           : num  5.8 6 6.1 5.7 5.3 5 4.9 4.7 4.6 4.7 ...
 $ Inflation Rate              : num  NA NA NA NA NA NA NA NA NA NA ...


1.) How have interest rates evolved over the last few decades?

#We should first begin with a general time series plot with the year on the x-axis and the effective federal funds rate on the y-axis.

plot(`Effective Federal Funds Rate` ~ `Year`, data = interest_rates, xlab = "Year", ylab = "Effective Federal Funds Rate", main = "Rates over Time", cex = 2, col = "blue")

lending rate per year

#Although we notice the general rise and fall of this lending interest rate, what about the variability of the effective federal funds rate within a single year?  There seem to be monthly observations that change during some years and remain constant in others.  The 1980s look very different from the first part of the 2010s.

model.Year <- lm(`Effective Federal Funds Rate` ~ `Year`, data = interest_rates)

plot(model.Year, which =4)

We can use the Cook’s distance values to see the combined effect of each observation’s leverage and residual.  To calculate Cook’s distance, the ith data point is removed from the model and the regression is recalculated; the statistic summarizes how much all of the fitted values in the regression model change when the ith observation is removed.
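In symbols (the standard definition, added here for reference): for a model with p coefficients and mean squared error s^2,

D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \, s^2}

where \hat{y}_{j(i)} is the jth fitted value when observation i is left out.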

Cook's distance values

#The middle observations (the 1980s) are associated with relatively large Cook’s distance values.  Normally the Cook’s distances are represented by clearly separated vertical lines; since we have so many observations, this visual effect is lost.

 >#Are there certain months where the effective rate changes? 

boxplot(`Effective Federal Funds Rate` ~ Month, data = interest_rates, tck = 0.02, xlab = "Month", ylab = "Effective Federal Funds Rate", main = "Does the month matter?", col = c("darkgreen", "orangered"))

Does the month matter.png

#The short answer is no, the months do not matter.  The medians are similar, around 4.5 to 5%.  The graph above should suffice as an explanation, but we can also check this with a formal regression.

> model.Month <- lm(`Effective Federal Funds Rate` ~ Month, data = interest_rates)

> summary(model.Month)

Call:

lm(formula = `Effective Federal Funds Rate` ~ Month, data = interest_rates)

Residuals:

   Min     1Q Median     3Q    Max

-4.914 -2.450 -0.199  1.715 14.261

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  4.81905    0.45829  10.515   <2e-16 ***

Month02     -0.05032    0.64812  -0.078    0.938   

Month03      0.04950    0.65073   0.076    0.939   

Month04      0.11886    0.65073   0.183    0.855   

Month05      0.12918    0.65073   0.199    0.843   

Month06      0.18482    0.65073   0.284    0.776   

Month07      0.12841    0.64812   0.198    0.843   

Month08      0.15413    0.64812   0.238    0.812   

Month09      0.15095    0.64812   0.233    0.816   

Month10      0.11063    0.64812   0.171    0.865   

Month11      0.06810    0.64812   0.105    0.916   

Month12      0.06095    0.64812   0.094    0.925   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.638 on 740 degrees of freedom

  (152 observations deleted due to missingness)

Multiple R-squared:  0.0003318,     Adjusted R-squared:  -0.01453

F-statistic: 0.02233 on 11 and 740 DF,  p-value: 1

 

#None of the individual t-tests show a coefficient significantly different from zero.  Just in case the 2010s proved unusually steady while other years experienced more oscillation between months, all months in years 2010 through 2016 were excluded from the model and a new regression was fit.

> model.Month2 <- lm(`Effective Federal Funds Rate` ~ Month, data = subset(interest_rates, Year <2010))

> summary(model.Month2)

 Call:

lm(formula = `Effective Federal Funds Rate` ~ Month, data = subset(interest_rates,

    Year < 2010))

Residuals:

    Min      1Q  Median      3Q     Max

-5.4213 -2.4667 -0.3437  1.5644 13.5904

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept)  5.48964    0.46005  11.933   <2e-16 ***

Month02     -0.05927    0.65061  -0.091    0.927   

Month03     -0.02182    0.65061  -0.034    0.973   

Month04      0.05545    0.65061   0.085    0.932   

Month05      0.06764    0.65061   0.104    0.917   

Month06      0.13055    0.65061   0.201    0.841   

Month07      0.05644    0.64770   0.087    0.931   

Month08      0.08501    0.64770   0.131    0.896   

Month09      0.08161    0.64770   0.126    0.900   

Month10      0.03626    0.64770   0.056    0.955   

Month11     -0.01178    0.64770  -0.018    0.985   

Month12     -0.02464    0.64770  -0.038    0.970   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

Residual standard error: 3.412 on 654 degrees of freedom

  (148 observations deleted due to missingness)

Multiple R-squared:  0.0002527,     Adjusted R-squared:  -0.01656

F-statistic: 0.01503 on 11 and 654 DF,  p-value: 1

 

#Nothing changes when we exclude these seven years.  So let’s create a model object for the effective federal funds rate regressed on year.  We come to a different conclusion for the year predictor variable.

> model.Year <- lm(`Effective Federal Funds Rate` ~ Year, data = interest_rates)
> summary(model.Year)

Call:

lm(formula = `Effective Federal Funds Rate` ~ Year, data = interest_rates)

 

Residuals:

    Min      1Q  Median      3Q     Max

-5.7058 -2.9800 -0.4617  1.7083 13.9685

Coefficients:

              Estimate Std. Error t value Pr(>|t|)   

(Intercept) 105.963938  13.982013   7.579 1.03e-13 ***

Year         -0.050900   0.007042  -7.228 1.21e-12 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.494 on 750 degrees of freedom

  (152 observations deleted due to missingness)

Multiple R-squared:  0.06512, Adjusted R-squared:  0.06387

F-statistic: 52.24 on 1 and 750 DF,  p-value: 1.212e-12

 


 

2.) How many times was the effective federal funds rate above the rate of inflation?

> occurrences <- interest_rates$`Effective Federal Funds Rate` > interest_rates$`Inflation Rate`

> summary(occurrences)
   Mode   FALSE    TRUE    NA's 
logical     226     484     194

 

#The FALSE and TRUE counts indicate how many times the effective federal funds rate was, or was not, larger than the inflation rate.  Whenever data is missing for either the effective federal funds rate or the inflation rate, an NA is produced for occurrences.  Therefore we only look at observations with values for both rates.

226+484

[1] 710

#When FALSE

226/710

#When TRUE

484/710

 

#More often than not, the effective federal funds rate is higher than the rate of inflation.
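#The same proportions can be computed in one line each (same objects as above):

mean(occurrences, na.rm = TRUE)   # share of TRUE among complete pairs, about 0.68
table(occurrences)                # FALSE / TRUE counts again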

#The effective federal funds rate is the interest rate depository institutions charge one another for lending funds overnight.

#Here is more information: https://fred.stlouisfed.org/series/FEDFUNDS

#It makes sense that banking institutions would want to earn interest on the funds loaned.


 

3.) Are real GDP changes, unemployment rates, and inflation rates good variables for predicting the variation in the effective federal funds rate?

#We can run three simple linear regressions, each assigned to its own model object.  This is done below.

> model.GDP <- lm(`Effective Federal Funds Rate` ~ `Real GDP (Percent Change)`, data = interest_rates)

> summary(model.GDP)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Real GDP (Percent Change)`,

    data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-5.6613 -2.5303 -0.1314  1.6837 14.7109

Coefficients:

                            Estimate Std. Error t value Pr(>|t|)   

(Intercept)                  5.25102    0.30661  17.126   <2e-16 ***

`Real GDP (Percent Change)` -0.10375    0.06429  -1.614    0.108   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

Residual standard error: 3.651 on 248 degrees of freedom

  (654 observations deleted due to missingness)

Multiple R-squared:  0.01039, Adjusted R-squared:  0.006402

F-statistic: 2.604 on 1 and 248 DF,  p-value: 0.1078

 

> model.Unemployment <- lm(`Effective Federal Funds Rate` ~ `Unemployment Rate`, data = interest_rates)

> summary(model.Unemployment)

 

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Unemployment Rate`,

    data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-5.1303 -2.4760 -0.1807  1.7769 14.0607

Coefficients:

                    Estimate Std. Error t value Pr(>|t|)   

(Intercept)          4.40650    0.51960   8.481   <2e-16 ***

`Unemployment Rate`  0.08438    0.08406   1.004    0.316   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.611 on 750 degrees of freedom

  (152 observations deleted due to missingness)

Multiple R-squared:  0.001341,      Adjusted R-squared:  9.914e-06

F-statistic: 1.007 on 1 and 750 DF,  p-value: 0.3158

 

> model.Inflation.Rate <- lm(`Effective Federal Funds Rate` ~ `Inflation Rate`, data = interest_rates)

> summary(model.Inflation.Rate)

 

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Inflation Rate`,

    data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-8.0637 -1.6861  0.1715  1.5918  7.7240

Coefficients:

                 Estimate Std. Error t value Pr(>|t|)   

(Intercept)        0.9058     0.1501   6.036 2.55e-09 ***

`Inflation Rate`   1.1139     0.0331  33.647  < 2e-16 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.269 on 708 degrees of freedom

  (194 observations deleted due to missingness)

Multiple R-squared:  0.6152,  Adjusted R-squared:  0.6147

F-statistic:  1132 on 1 and 708 DF,  p-value: < 2.2e-16

 

#The slope coefficient of the predictor variable in the second model is not significant at any reasonable level of significance.  In other words, the slope for the unemployment rate is not significantly different from zero.

#The unemployment rate is not a good predictor variable for the effective federal funds rate.  It should be noted that we do not know whether the unemployment rate is recorded as U-3 or U-6.  It is most likely U-3, since our first observations are from 1954; U-6 unemployment rates were first calculated in the early 1990s.

#The unemployment rate does not have a lower bound of 0%.  Frictional unemployment exists even in the best economic conditions.

#Real GDP (percent change) is marginally significant if we set alpha at 0.10, which is a generous threshold.  If we were to build a best subsets model with multiple predictor variables, the alpha to enter and exit could be set higher, possibly 0.15.  We can revisit this later.
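#A sketch of that best-subsets idea with the leaps package; note that regsubsets ranks the best model of each size by criteria such as adjusted R-squared, rather than a Minitab-style alpha-to-enter:

library(leaps)
subsets <- regsubsets(`Effective Federal Funds Rate` ~ `Unemployment Rate` +
                        `Real GDP (Percent Change)` + `Inflation Rate`,
                      data = interest_rates)
summary(subsets)$adjr2   # adjusted R-squared for the best 1-, 2-, and 3-variable models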

#The inflation rate is significant at any reasonable level of significance.  On average, the effective federal funds rate will be 1.1139 times the inflation rate plus 0.9058.  The inflation rate explains 61.52% of the variation in the effective federal funds rate.

#We should see if the real GDP in percentage change terms belongs as a second predictor variable in a model that contains inflation rate as the first predictor variable.  A new linear regression is assigned a new model object and the summary function is performed on that new model object.

> model.MLR <- lm(`Effective Federal Funds Rate` ~ `Inflation Rate` + `Real GDP (Percent Change)`, data = interest_rates)

> summary(model.MLR)

Call:

lm(formula = `Effective Federal Funds Rate` ~ `Inflation Rate` +

    `Real GDP (Percent Change)`, data = interest_rates)

Residuals:

    Min      1Q  Median      3Q     Max

-8.2808 -1.6203  0.1622  1.5577  6.2258

 

Coefficients:

                            Estimate Std. Error t value Pr(>|t|)   

(Intercept)                  0.61688    0.31189   1.978   0.0491 * 

`Inflation Rate`             1.14912    0.05840  19.678   <2e-16 ***

`Real GDP (Percent Change)`  0.05447    0.04223   1.290   0.1983   

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.27 on 233 degrees of freedom

  (668 observations deleted due to missingness)

Multiple R-squared:  0.6275,        Adjusted R-squared:  0.6243

F-statistic: 196.3 on 2 and 233 DF,  p-value: < 2.2e-16

#The adjusted R-squared value does not improve much by adding a second predictor variable.  The p-value for the individual t-test also shows that the slope for real GDP (percent change) is not significantly different from zero at any reasonable level of significance.

#Before settling for the single linear regression model that only contains the inflation rate, let us check the diagnostic plots for the residuals in the model.

> op <- par(mfrow=c(2,2))

> plot(model.Inflation.Rate)

SLR diagonistic plots.jpg

#Looking at the first row of plots, the density and variance of the residuals look good for the lower fitted values, but the variance increases and the density decreases as the fitted values grow.  The normal Q-Q plot also looks decent, but it has an issue in the lower tail.  In the future we may want to consider a transformation, or debate whether the model fits well only for a subset of inflation rate values.

The Electoral College and the Tidewater Nation


Colin Woodard, the author of American Nations: A History of the Eleven Rival Regional Cultures of North America, tries to show us why we should not view policy positions as simply “Democrat” or “Republican”.  According to Woodard, we live in a country of 11 nations that form coalitions around various issues.  The objective of each nation is to preserve its identity and to remain influential in national politics.

 

the-american-nations-today

Woodard (c)2011

The author suggests that, contrary to the popular notion of the United States being a melting pot, new arrivals either moved specifically to one of the 11 nations because that nation encompassed their values, or were assimilated, adopting the pre-existing values of a nation.  In this second scenario, the original founders of a community set the framework for that nation, and new arrivals conform to or otherwise reinforce that culture.

Colin Woodard also explains that different nations in the United States held different conceptions of democracy.  The Yankeedom nation held the Nordic or Germanic conception of democracy, which encouraged near universal male suffrage.  Yankeedom was founded primarily by middle-class, well-educated Puritans.  Immigrants came in family units and they valued community structure and shared values.  When migrants settled other parts of the United States, they carried these tendencies and traditions with them.  When confronting other nations, such as New Amsterdam, the Midlands, and Greater Appalachia, they sought to impose their Puritanism.

Other nations were founded upon deep inequalities.  The Tidewater and Deep South treasured the Greek and Roman democratic systems, where the existence of slaves coincided with their perception of democracy.  The Greek or Roman conception of democracy exists to benefit the few, allowing a select group of men to become “enlightened” and guide their societies; this benefit is seen as outweighing the agonies of those enslaved.  They viewed slavery as more humane than the treatment of the urban poor in the northern nations, reasoning that at least the slaves had a master who was supposed to care for them.  “Enlightened” Tidewater and Deep South gentry also argued that Yankeedom was a society of shopkeepers, which prevented individuals from becoming educated enough to advance their societies.

The Tidewater and Deep South were also not founded with equal proportions of men and women, and during the English Civil War they tended to side with the King and the Royalists back in the United Kingdom.  The Tidewater saw themselves as an extension of Norman culture, while Yankeedom was Anglo-Saxon.  Things changed for the Tidewater when the British Empire sought to homogenize control over its empire.  The King redefined the rights of his British subjects: only those living in England had full rights.  This clarification of who counted as an Englishman did not go over well with the gentry of the Tidewater.

It is interesting to note that other nations did not value the democratic system at all.  New Netherland (New York) preferred a hegemonic system and hoped to be reabsorbed by the Dutch or British monarchies on several occasions.  Autocracy worked, provided that citizens showed tolerance towards one another.

It should not be surprising which cultures would support the continued use of the Electoral College system.  The National Constitution Center features a podcast from December 1, 2016 titled “Should we abolish the Electoral College?”.  The two panelists have biographies included on the website.  From this limited information, we might conclude that one panelist is from either Yankeedom or the Left Coast while the other is from the Tidewater.  If Woodard’s theory is correct, both natives and migrants become assimilated into their nations, and panelists will eventually advocate the ideals of their nations.

This perspective is interesting because “Yankeedom” and “the Left Coast” could be considered “Democrat” in this past election cycle.  They will be on the defensive when faced with the new administration.  The representative from the “Tidewater” may or may not be considered a “Democrat”, but they come from a dying nation.  The Tidewater nation may not exist in the future: the growth of the DC metropolitan area into Maryland and northern Virginia essentially divides it, and incremental growth from the Midlands also reduces its power.  With rising sea levels, the region will also lose territory to the east.  Essentially, the representative from the Tidewater seeks to preserve any formerly established advantage at all costs.

Both panelists introduce us to the history of the Electoral College.  Some of the original founders envisioned the electors to choose the President and Vice President that were most qualified for the position.  Initially the most qualified person would become the President and the second-most qualified would be Vice President.  Electors were supposed to deliberate and select candidates to run for the final election.

The Electoral College was one of the last systems established during the Constitutional Convention.  The framers were concerned about the excesses of democracy and the emergence of demagogues, but showed “haste and fatigue” by the time they got around to the Electoral College.  Modern campaigns were also not envisioned.  Founding fathers thought the President should be determined based on their reputation and history of service, not by their cleverness, or radicalism, during a campaign.

According to Alex, the Electoral College was supposed to serve as a nominating board that sent candidates to the House of Representatives.  From this cohort we would end up with the best candidate.  However, by the 1820s, the responsibility for narrowing down the candidate list was being usurped from the Electoral College and handed to the political parties.

During the 19th century a series of reforms was advocated.  Since several nations exist within the same state, district elections were advocated over “winner-take-all” allocation.  Some also wanted to eliminate human electors.  Andrew Jackson, an Appalachian, was one of the strong advocates for changing the system.

The moderator and President of the National Constitution Center reminds us that when elections are close, the Electoral College provides us with a clear winner.  A series of small differences in certain states are magnified by the electoral system.  In effect, there is “no room for doubt”.

The Tidewater representative suggests that the smaller states look favorably on keeping the Electoral College; its existence helps preserve the federation, since all constituencies matter.  The Yankeedom or Left Coast representative refutes this idea, stating that two strong advocates for ditching the Electoral College in favor of a popular vote came from small states, Rhode Island and North Dakota.  Candidates do not campaign in these states under the Electoral College system, and they probably still would not under a popular election system.

Both representatives do agree that a popular vote system would lead to an increased role for the federal government, since national standards for registration and voting would need to be set and enforced.  The Tidewater representative shows deep concern over this possibility.

It is important to put this concern in its proper context.  As previously mentioned, the Tidewater nation is the only one today that is at risk of extinction.  During the expansion of the Deep South, the values of the Tidewater were eroded and made more extreme, especially its policy towards slavery.  Tidewater leaders eventually followed the lead of the Deep South.  The tobacco industry declined in the Tidewater just as the cotton industry became prosperous in the Deep South.  The Deep South was also able to expand west whereas the Tidewater was cut off by a new nation, Greater Appalachia.

american-nations-advancing
Woodard (c)2011

The Yankeedom representative tells us that the conception of “democracy” has changed over time.  The Electoral College does not conform to people’s everyday notions of democracy.  He uses gubernatorial and student body elections as classic examples; in these instances the popular vote installs the new leader.

This argument rests on the assumption that people in all 11 nations share this notion.  We might question whether the Deep South uses wealth and race, or Greater Appalachia uses strength, in place of popular elections as their preferred method for finding a new leader.

The panelists also discuss the geography of states.  The blue oases in red states do not count.  Woodard addresses this issue by analyzing nations at the county level.

They also discuss the implications of a popular vote system.  The Tidewater representative reminds us that having “run-off elections” creates an entirely different system.  Other “fringe” political parties would have a stronger incentive to enter the contest; these parties would be able to form coalitions and run in a second round.  The Tidewater representative also warns us that with more than two political parties, there would be less of a “moderating” influence.  It is also uncertain whether third parties would increase or reduce the emergence of demagogues.  Regardless of how many exist, political parties were not viewed favorably by most of the founding fathers.

Cotton subsidies and total production per county


Cotton subsidies have been decreasing since 2005.  This may be associated with the World Trade Organization Dispute Settlement Body’s recommendation that the United States cease subsidizing upland cotton.

The subsidies in question were:

(i) the export credit guarantees under the GSM 102, GSM 103 and SCGP export credit guarantee programmes in respect of exports of upland cotton and other unscheduled agricultural products supported under the programmes, and in respect of one scheduled product (rice);

(ii) Section 1207(a) of the Farm Security and Rural Investment (FSRI) Act of 2002 providing for user marketing (STEP2) payments to exporters of upland cotton; and

(iii) Section 1207(a) of the FSRI Act of 2002 providing for user marketing (STEP2) payments to domestic users of upland cotton. As for the actionable subsidies the recommendation is that the United States takes appropriate steps to remove the adverse effects of certain subsidies or withdraw these subsidies within six months from the date of adoption of the Panel and Appellate Body reports, i.e. the compliance period expired on 21 September 2005.

 

The aggregate amount of subsidies between 2000 and 2014 is depicted below:

02-12-17-cotton-subsidies-since-2000

The following county maps were produced by the United States Department of Agriculture.  We can see the decrease in both the total counties and the amount of cotton (pima and upland varieties) produced by each county between 2010 and 2015.  During these five years, the direct payments, production flexibility contracts, and counter-cyclical programs are completely phased out.  “Other cotton programs” emerge in 2013 and to a great extent in 2014 (EWG Farm Subsidy Database).

2010-pima-cotton-production

Pima cotton moves out of western Texas and emerges in Arizona.  Kern County in California almost completely stops producing pima cotton.

2015-pima-cotton-production-per-county

 


 

2010-upland-cotton-production

There are major decreases in the amount of upland cotton produced by counties between 2010 and 2015.  In particular, we should note the color shade changes in Arkansas, Louisiana, North Carolina, and South Carolina.

2015-upland-cotton-2015

False Allegations of Voter Fraud


Allegations of voter fraud make great headlines, but are generally false.  These allegations are based on “feelings” instead of data.  The claims are ultimately stoked by the interests of incumbent politicians and political parties seeking to suppress the vote when greater voter turnout does not favor their odds.

The maximum occurrence of voter fraud in the United States between 2000 and 2014 has been calculated at 0.0000031%. 

We typically see the mobilization efforts of political parties.  What we often do not see are their demobilization efforts.  According to Groarke, both methods are equally important during an election cycle (Groarke 2016).  We see an example of a demobilization effort when voter requirements become more stringent in response to allegations of “voter fraud”.

Groarke studied three campaigns to improve voter turnout and the allegations waged against these efforts in the name of “voter fraud”.  Groarke found that a representative’s tenure, and their propensity to want more “unreliable voters” in their district, influenced their political calculus.

There was an observed difference between northern and southern Democrats in response to each effort.  Some Republicans even initially supported these efforts, but eventually retreated from these positions.

Postcard Registration Bills (1971 – 1976) introduced by Senator Gale McGee (D-WY)

The arguments against the legislation:

  • Registration of unqualified persons
  • Registration of nonexistent persons
  • Postcard registration cost
  • Danger of federal intrusion into election process
  • Nonvoters are naturally uninterested in politics

Minnesota and Maryland served as case examples of how postcard registration improved voter turnout.  Between the two, a single case of fraud was found during the registration leading up to the 1974 election.  In exchange for that one case of fraud, registration increased by 1.5% over the previous voting period and was a cheap way to increase participation (Ford Foundation 715).

Election Day Registration (~1976) introduced by President Jimmy Carter and VP Mondale

The arguments against the legislation:

  • Endangers the integrity of franchise
  • Serious threats of fraud even if voters showed identification while registering
  • Increased federal regulation and would discourage participation

Minnesota had allowed same-day voter registration since 1973.  Between 1972 and 1976, the percentage of the population actively voting increased from 68.4% to 71.4%, and 22.9% of these voters took advantage of the ability to register on election day.  In 1976, Minnesota had the highest voter turnout in the nation (Smolka 26).

Groarke once again noticed that the largest opponents of same-day election registration were Republicans or southern Democrats who were incumbent members of Congress in “safe” districts (578-580).

The beginnings of the Motor Voter Law (mid to late 1970s, 1980s)

The arguments against the legislation:

  • Fraud
  • People who really want to vote will find a way (comparing voters in El Salvador to those in the U.S.)

If enacted as first envisioned, the Motor Voter Law would have required states to offer Election Day registration and mail registration, and to ensure that government agencies offered voter registration and the ability to update existing registrations.  The initial movement also did not include provisions for purging non-voters.

Although Election Day registration failed, some states opted to use mail registration and to have government agencies assist in registering voters and updating existing voter information.  The Reagan administration fought Motor Voter laws with the Hatch Act, and litigation limited the ability of government agencies to consistently offer voter registration and updates to existing registrations.  The law became known as the “Motor Voter Law” because essentially only motor vehicle agencies were seamlessly incorporating voter registration into their processes (582).

In the 1980s, certain portions of the population tended to live in cities while others lived in the suburbs, and suburban life translated into the need for a driver’s license.

Today, the reality is much different than it was in the 1980s.  Unfortunately, the corresponding data from the U.S. Census Bureau’s American FactFinder is not readily available for the 1980s.  It would require more time to juxtapose snapshots of 1980 and 2015, and to see whether the changes are statistically significant.

02-02-17-workers-and-drivers

Proponents of the law had to negotiate to win over opponents.  Penalties were included for alleged fraudulent registration.  Same-day registrants were also segregated and further scrutinized before their votes were counted (Groarke 585-586).

The mail delivery service is not consistent in all communities.  This became a problem when a mailing purge was suggested every two years; voters who did not respond to the mailer would be removed from the voting list.

National Voter Registration Act – (1993) President Clinton and Rep. Al Swift (D-WA)

The Motor Voter Law reemerged and underwent significant changes.  Election Day registration was dropped and voter list maintenance requirements were added.  It was also no longer mandatory for unemployment agencies to offer voter registration.

Some states threw up roadblocks by requiring two separate registration processes for voting in state versus national elections.

Groarke reminds us that political parties exert equal efforts to mobilize and demobilize potential voters.  Her Table 3 shows annual voter removals as a percentage of voting applications.  The purging process has been implemented nationally thanks to the National Voter Registration Act: during the first year, a name is identified “to be purged”; if the potential voter has not communicated with voter registration by year two, they are purged from the voting rolls.

02-02-17-the-impact-of-voter-fraud-claims

With the emphasis on “fraud”, we must remind ourselves that those who might want to commit fraud probably do a simple cost-benefit analysis.  What is the incremental benefit of one vote against the fines and prison sentences if the one fraudulent vote is discovered?  Levitt argues that the incremental vote is not worth the fines and penalties for a rational agent (Levitt 2007).

Instead, we should consider other rationale for mistakes made at the polls.  Some of these possibilities are:

  • Clerical or typographical errors (e.g., signing the wrong line or choosing an identical or almost identical name)
  • First and last names, or parts of street numbers, are inverted
  • Incomplete written data matches another person’s (e.g., consider middle initials)
  • Common names are prone to being flagged and purged
  • Certain birthdates are more common than others (e.g., check out this article about the probability of being born on a given day)
  • Voters move and can be registered at two addresses, but only vote once.
  • Voters can begin filling out a form on election day, make a mistake, be given another form, and an election official can accidentally count the discarded form again.
  • Voters can vote before an election and die by the time the vote is confirmed.
  • The right to vote for felons is not consistent across all states. In some states your right to vote is restored once you are released; in others, you must go through a process to regain it.  In addition, misdemeanor offenders retain the right to vote.
  • “Caging” efforts to purge voters are not always effective. This tactic checks which postcards are returned by the USPS; sometimes the potential voter is out of the country or their area is poorly serviced by the USPS.
  • Just because an address is unusual does not mean it is illegitimate. Homeless persons can register their address as the local shelter, and business owners can live in the same building as their business.

Levitt calculates overall documented fraud rates in the following states:

  • Missouri: 0.0003%
  • New Hampshire: 0.0000%
  • New Jersey: 0.0002%
  • New York: 0.000009%
  • Wisconsin: 0.0000%

In a Washington Post article, Levitt reports 31 incidents of voter fraud across all general, primary, special, and municipal elections from 2000 through 2014 (Levitt 2014).  These 31 incidents occurred out of more than 1 billion ballots cast over those years.  Some of the fraud allegations have not been fully investigated, which may indicate that they are falsely flagged as “fraud”.

U.S. state fertilizer indices and growth of factor productivity levels


I use USDA data from 1960 and 2004 to create a brief exploratory analysis of what makes some states more agriculturally productive than others.

Is a higher fertilizer index associated with higher factor productivity levels?

H0: There is no correlation between fertilizer indices and the growth of factor productivity.

Ha: There is a correlation between fertilizer indices and the growth of factor productivity.

 

Both growth of factor productivity levels and fertilizer consumption indices are relative to Alabama in 1998.  Alaska and Hawaii are the only states excluded.

I ran the regression analysis for the 1960 and 2004 data.  A small p-value for the 1960 data has us reject the null hypothesis and conclude that a correlation between the two variables exists.  A larger p-value for the 2004 data has us fail to reject the null hypothesis.

We should note that the fertilizer index explains only a small percentage of the variability in the response variable; the data points are scattered far from the regression line, as the R-sq value shows.  Over time, the fertilizer index predictor variable explains even less of the variability in the response variable.
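For anyone reproducing this outside Minitab, a hedged R sketch of the same two regressions; the data frame states and its column names are hypothetical stand-ins for the USDA file:

fit_1960 <- lm(fp_1960 ~ fert_1960, data = states)   # Factor Productivity (1960) vs index
fit_2004 <- lm(fp_2004 ~ fert_2004, data = states)
summary(fit_1960)   # p = 0.008, R-sq = 14.41% in the Minitab run below
summary(fit_2004)   # p = 0.152, R-sq = 4.40%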

Regression Analysis: Factor Productivity (1960) versus Fertilizer Indices in 1960

 Analysis of Variance

Source                        DF   Adj SS    Adj MS  F-Value  P-Value

Regression                     1  0.07532  0.075324     7.75    0.008

  Fertilizer Indices in 1960   1  0.07532  0.075324     7.75    0.008

Error                         46  0.44736  0.009725

Total                         47  0.52269

 

Model Summary 

        S    R-sq  R-sq(adj)  R-sq(pred)

0.0986168  14.41%     12.55%       1.83%

  Coefficients

 

Term                          Coef  SE Coef  T-Value  P-Value   VIF

Constant                    0.4969   0.0222    22.40    0.000

Fertilizer Indices in 1960  0.0499   0.0179     2.78    0.008  1.00

  

Regression Equation

 Factor Productivity (1960) = 0.4969 + 0.0499(Fertilizer Indices in 1960)

 

 Fits and Diagnostics for Unusual Observations

            Factor

     Productivity

Obs        (1960)     Fit    Resid  Std Resid

  2        0.7057  0.5104   0.1953       2.02  R         [Arizona]

  4        0.8643  0.6561   0.2082       2.34  R  X      [California]

  8        0.8649  0.5997   0.2652       2.78  R          [Florida]

 33        0.4673  0.6438  -0.1765      -1.94     X        [Ohio]

 

R  Large residual

X  Unusual X

 

 

Regression Analysis: Factor Productivity (2004) versus Fertilizer Indices in 2004

 Analysis of Variance

Source                        DF  Adj SS   Adj MS  F-Value  P-Value

Regression                     1  0.1356  0.13556     2.12    0.152

  Fertilizer Indices in 2004   1  0.1356  0.13556     2.12    0.152

Error                         46  2.9435  0.06399

Total                         47  3.0791

 

Model Summary

       S   R-sq  R-sq(adj)  R-sq(pred)

0.252961  4.40%      2.32%       0.00%

 

Coefficients

Term                          Coef  SE Coef  T-Value  P-Value   VIF

Constant                    1.1049   0.0493    22.39    0.000

Fertilizer Indices in 2004  0.0184   0.0127     1.46    0.152  1.00

 Regression Equation

 Factor Productivity (2004) = 1.1049 + 0.0184(Fertilizer Indices in 2004)

 Fits and Diagnostics for Unusual Observations

     Factor

     Productivity

Obs        (2004)     Fit    Resid  Std Resid

  1        1.7979  1.1305   0.6674       2.67  R     [Alabama]

  2        1.6304  1.1162   0.5142       2.07  R     [Arizona]

  4        1.5297  1.2817   0.2480       1.06     X  [California]

 13        1.3554  1.3211   0.0343       0.15     X  [Iowa]

 47        0.5777  1.1679  -0.5902      -2.36  R     [Wisconsin]

 48        0.5712  1.1103  -0.5391      -2.17  R     [Wyoming]

 

R  Large residual

X  Unusual X

 09-06-16-fitted-line-plot-1

09.06.16 fitted line plot #2.png

 

We could create a prediction interval for the 1960 data, but the low R-sq value indicates that this interval will be wider than desired.
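A sketch of such an interval, reusing the hypothetical 1960 fit from above at a fertilizer index of 1 (the Alabama-1998 baseline):

predict(fit_1960,
        newdata = data.frame(fert_1960 = 1),
        interval = "prediction", level = 0.95)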

We may want to include other variables in the regression to see if we can better capture the variability of the response variable.

 Data for this brief exploratory analysis was gathered from the USDA website, specifically this page: http://www.ers.usda.gov/data-products/agricultural-productivity-in-the-us.aspx#28268.


Sugarcane cane smut (“carvão da cana-de-açúcar”) susceptibility


I haven’t posted for a long time.  A lot of new things have happened within the last four months, forcing me to stop writing temporarily.

I began a graduate program in applied statistics two months ago.  Below is a very basic example of some experimental learning with R.

07.00.16 Sugarcane smut R exploration

 

How is the internet population distributed?


I would like to share something I have read on this Mexican blog.

See: ¿Cómo se distribuye la población de internet?

Brazilian universities, ranked


I now have eighty incomplete drafts sitting here with dust accumulating on top of them.  This draft will not be just another forgotten one!

Here I just want to leave a link to a site that ranks Brazilian universities by their specializations.  This will serve mostly as a resource for my future self.

The ranking of the universities: here is the site.

Meanwhile, I am still experimenting with free online courses.  Does someone need to pay in order to really take a course seriously?  Last time I failed because I attempted to go it alone.  I did not consider the importance of belonging to a community that supports one another.

Equally important, given that time is limited and I do paid work around ten hours a day (including weekends), I will have to reduce distractions at work and in my personal life.  I will also have to make more changes to advance even further.

The art of translation teaches us to be more humble


I have previously written something about an essay by Cláudio Almir Dalbosco of UNICAMP.  Here I want to continue exploring the themes introduced by the author.

I am expanding a bit more on the second pillar, called “universal citizenship”. Specifically, I want to write more about the exercise of translation and how it affects us as people.

Dalbosco proposes the idea that while we are learning another language, we come to understand our own characteristic flaws. In the beginning we will always produce an imperfect interpretation, which in turn teaches us a lesson about being humble.

We become humble by being aware that, as human beings, each of us faces our own limitations. Without running the risk of making mistakes, we can never learn another language. The whole process is full of failures and misunderstandings. As we learn from our mistakes, we also have to train the act of listening as much as the act of speaking.

In Campinas, I bought a book, Quase A Mesma Coisa, which discusses the act of translation. One part of the text describes the need to stay disciplined while translating, lest you “betray the intentions of the source text” (127). It is wrong to make any attempt to enrich the text, even if the text might become more interesting or the expressed sentiment might be deepened in the second language.

Finally, you have to stay focused on what you are translating to avoid any distractions. Learning another language, and then translating between two, requires constant focus. The process is full of doubts until a person becomes confident in learning through the process of making mistakes.