Voting Characteristics in Philadelphia’s Second Ward

The following analysis draws on registration and demographic data for the 27 divisions in Philadelphia’s 2nd ward.  As we seek to understand the results of the 2016 presidential general election, it helps to look at the data and trends at the local level.  Whoever best understands what happens locally is better placed to sway the overall results at the state level.

This data exploration can be enhanced by looking at information from the other wards.  This is a map of all the wards in Philadelphia.  This is the map of the 2nd ward, which is the subject of this analysis.

[Image: map of Philadelphia’s 2nd ward]

Below is a summary of the following information from the 27 divisions. 

The original variables in this dataset are:

  • Democrats
  • Republicans
  • Independents
  • Other Party
  • Total Population
  • White
  • Black
  • Hispanic
  • Other Race
  • Male
  • Female
  • Gender Unreported

Other proportion variables will be created when appropriate.

> summary(second[,2:13])
      Dem             Rep              Ind          Other Party      Total Pop.         White      
 Min.   :293.0   Min.   : 41.00   Min.   : 3.000   Min.   : 42.0   Min.   : 379.0   Min.   : 54.0  
 1st Qu.:475.5   1st Qu.: 67.50   1st Qu.: 4.000   1st Qu.: 81.0   1st Qu.: 648.5   1st Qu.:166.0  
 Median :547.0   Median : 80.00   Median : 9.000   Median : 93.0   Median : 711.0   Median :225.0  
 Mean   :552.3   Mean   : 88.96   Mean   : 9.444   Mean   :102.0   Mean   : 752.7   Mean   :211.7  
 3rd Qu.:630.0   3rd Qu.:100.50   3rd Qu.:13.000   3rd Qu.:118.5   3rd Qu.: 846.0   3rd Qu.:246.0  
 Max.   :852.0   Max.   :203.00   Max.   :20.000   Max.   :212.0   Max.   :1252.0   Max.   :405.0  
     Black           Hispanic       Other Race         Male           Female      Gender Unreported
 Min.   :  7.00   Min.   : 6.00   Min.   :10.00   Min.   :122.0   Min.   :148.0   Min.   :112.0    
 1st Qu.: 15.00   1st Qu.:10.00   1st Qu.:20.50   1st Qu.:218.5   1st Qu.:246.5   1st Qu.:163.0    
 Median : 38.00   Median :13.00   Median :24.00   Median :256.0   Median :284.0   Median :193.0    
 Mean   : 58.11   Mean   :13.52   Mean   :27.11   Mean   :261.9   Mean   :288.1   Mean   :205.4    
 3rd Qu.: 66.00   3rd Qu.:16.50   3rd Qu.:30.00   3rd Qu.:311.5   3rd Qu.:317.0   3rd Qu.:241.0    
 Max.   :209.00   Max.   :26.00   Max.   :56.00   Max.   :413.0   Max.   :487.0   Max.   :365.0 

 

Are there associations between some of these variables?


1.)    Does the proportion of female voters in a division help explain the variation in the number of Democratic voters?

A proportion variable is created for the female population in each division.

> second$FemaleProp <- second$Female/second$`Total Pop.`

> summary(lm(second$Dem ~ second$FemaleProp))

Call:
lm(formula = second$Dem ~ second$FemaleProp)

Residuals:
     Min       1Q   Median       3Q      Max
-256.340  -66.267   -1.328   75.866  302.028

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)          711.4      418.8   1.699    0.102
second$FemaleProp   -415.0     1090.1  -0.381    0.707

Residual standard error: 136.8 on 25 degrees of freedom
Multiple R-squared:  0.005765, Adjusted R-squared:  -0.034
F-statistic: 0.1449 on 1 and 25 DF,  p-value: 0.7066

The female share of each division’s total population does a poor job of explaining the variability in the number of Democratic voters in each division.  The coefficient of determination is almost zero (0.0058) and the p-value is very large (0.707).
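As a quick consistency check on the output above: in a simple regression with one predictor, the R-squared can be recovered from the F-statistic and the residual degrees of freedom, and the squared t value of the slope should match F.  The sketch below uses only numbers printed in the output; the variable names are my own.

```r
# Values copied from the regression output above
f_stat  <- 0.1449   # F-statistic
df_res  <- 25       # residual degrees of freedom (27 divisions - 2 coefficients)
t_slope <- -0.381   # t value of the FemaleProp slope

# With a single predictor, R^2 = F / (F + residual df)
r_squared <- f_stat / (f_stat + df_res)
print(round(r_squared, 4))

# With a single predictor, the squared slope t value equals F (up to rounding)
print(round(t_slope^2, 3))
```

Both values agree with the reported R-squared and F-statistic up to the rounding in the printed output.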

It might be more appropriate to look at the number of Democrats in each division as a proportion of the total population rather than as a raw count.

I create another variable to represent this new proportion.


> second$DemProp <- second$Dem/second$`Total Pop.`

> summary(lm(second$DemProp ~ second$FemaleProp))

Call:
lm(formula = second$DemProp ~ second$FemaleProp)

Residuals:
      Min        1Q    Median        3Q       Max
-0.075952 -0.019556  0.002954  0.029052  0.058959

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.6205     0.1146   5.415 1.28e-05 ***
second$FemaleProp   0.3045     0.2983   1.021    0.317
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03744 on 25 degrees of freedom
Multiple R-squared:  0.04003, Adjusted R-squared:  0.001632
F-statistic: 1.043 on 1 and 25 DF,  p-value: 0.317

The p-value roughly halves (from 0.707 to 0.317), but the coefficient of determination is still very low at 0.040.  The association is not close to being statistically significant.
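The same proportion-on-proportion regression can also be written with the formula interface and a `data` argument, which avoids repeating `second$` and the backticked column names.  This is only a sketch on made-up data: the data frame `toy`, its column names, and the ranges used to generate it are assumptions, not the ward data.

```r
# Hypothetical stand-in for the ward data: 27 divisions with made-up counts
set.seed(1)
n <- 27
toy <- data.frame(TotalPop = round(runif(n, 380, 1250)))
toy$Female <- round(toy$TotalPop * runif(n, 0.33, 0.45))
toy$Dem    <- round(toy$TotalPop * runif(n, 0.65, 0.80))

# Proportion variables, built the same way as FemaleProp and DemProp
toy$FemaleProp <- toy$Female / toy$TotalPop
toy$DemProp    <- toy$Dem / toy$TotalPop

# Simple linear regression using column names and data =
fit <- lm(DemProp ~ FemaleProp, data = toy)
s   <- summary(fit)

print(coef(s))        # estimates, standard errors, t values, p-values
print(s$r.squared)    # coefficient of determination
```

Because the toy data are random, the fitted numbers will not match the output above; the point is only the shape of the call.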


 


 

 

2.)    Does the male proportion of a division’s population help explain the number of Republicans in the same division?

A new variable is created to represent the male proportion of the population of each division.

> second$MaleProp <- second$Male/second$`Total Pop.`

> summary(lm(second$Rep ~ second$MaleProp))

Call:
lm(formula = second$Rep ~ second$MaleProp)

Residuals:
    Min      1Q  Median      3Q     Max
-54.447 -19.993  -9.245  14.653 108.770

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       173.71      89.37   1.944   0.0633 .
second$MaleProp  -243.12     255.59  -0.951   0.3506
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 36.46 on 25 degrees of freedom
Multiple R-squared:  0.03493, Adjusted R-squared:  -0.003676
F-statistic: 0.9048 on 1 and 25 DF,  p-value: 0.3506

 

There is no statistically significant association between the proportion of males and the number of Republicans in a division, for the same reasons stated in the first example: the coefficient of determination is very low (0.035) and the p-value is large (0.351).

Another new variable is created to represent the Republican proportion of the population of each division.

> second$RepProp <- second$Rep/second$`Total Pop.`

> summary(lm(second$RepProp ~ second$MaleProp))

Call:
lm(formula = second$RepProp ~ second$MaleProp)

Residuals:
      Min        1Q    Median        3Q       Max
-0.049654 -0.013996 -0.004151  0.016322  0.057392

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.17521    0.06203   2.825  0.00916 **
second$MaleProp -0.16729    0.17740  -0.943  0.35472
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02531 on 25 degrees of freedom
Multiple R-squared:  0.03435, Adjusted R-squared:  -0.00428
F-statistic: 0.8892 on 1 and 25 DF,  p-value: 0.3547

 

This time, using a proportion instead of a raw count does almost nothing to change the coefficient of determination or the p-value of the regression.  There is no statistically significant association between the two variables at any reasonable level of significance.

However, we must also consider the quality of the data we have received.

 

Gender Unreported
Min.   :112.0
1st Qu.:163.0
Median :193.0
Mean   :205.4
3rd Qu.:241.0
Max.   :365.0

 

To put this in percentage terms of the total population in each division:

[Image: histogram of the percentage of each division’s population with no reported gender]

We do not know the gender of a sizeable percentage of voters in each division!  Between 22% and 32% of each division’s population has no identified gender.  We might have found associations between the variables in the first two examples if we had more complete data.
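The percentage behind that histogram can be computed in the same way as the other proportion variables.  The sketch below uses made-up data; `toy`, `GenderUnreported`, and `NoGenderPct` are hypothetical names, and the 22% to 32% range is built into the fake data rather than derived from it.

```r
# Hypothetical stand-in for the ward data: 27 divisions with made-up counts
set.seed(2)
n <- 27
toy <- data.frame(TotalPop = round(runif(n, 380, 1250)))
toy$GenderUnreported <- round(toy$TotalPop * runif(n, 0.22, 0.32))

# Share of each division with no reported gender, in percent
toy$NoGenderPct <- 100 * toy$GenderUnreported / toy$TotalPop
print(summary(toy$NoGenderPct))

# Histogram bins and counts; set plot = TRUE to draw the figure instead
h <- hist(toy$NoGenderPct, plot = FALSE)
print(data.frame(bin_start = head(h$breaks, -1), count = h$counts))
```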


 

3.)    Does the proportion of white voters in a division help explain the variation in the number of Independent party registrants?

A proportion is created for the white voters over the total population in the division.

> second$WhiteProp <- second$White/second$`Total Pop.`

> summary(lm(second$Ind ~ second$WhiteProp))

Call:
lm(formula = second$Ind ~ second$WhiteProp)

Residuals:
   Min     1Q Median     3Q    Max
-7.465 -3.085  0.759  2.726  9.918

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)         2.385      3.917   0.609   0.5480
second$WhiteProp   25.531     13.746   1.857   0.0751 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.92 on 25 degrees of freedom
Multiple R-squared:  0.1213,  Adjusted R-squared:  0.08612
F-statistic:  3.45 on 1 and 25 DF,  p-value: 0.07507

We are much closer to finding an association between the proportion of white voters and the number of Independent registrants than we were in our attempts to link party registration and gender.  However, the white proportion is still not a statistically significant predictor of Independent party registrants if we set alpha at 0.05.  The p-value is 0.07507 and the coefficient of determination is low at 12.13%.

As we did before, we can now create a new variable for Independent party registrants as a proportion of the total population of each division.

> second$IndProp <- second$Ind/second$`Total Pop.`

> summary(lm(second$IndProp ~ second$WhiteProp))

Call:
lm(formula = second$IndProp ~ second$WhiteProp)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0081644 -0.0033878 -0.0001117  0.0032048  0.0082518

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.004396   0.003759   1.170   0.2532
second$WhiteProp 0.027473   0.013191   2.083   0.0477 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.004721 on 25 degrees of freedom
Multiple R-squared:  0.1479,  Adjusted R-squared:  0.1138
F-statistic: 4.338 on 1 and 25 DF,  p-value: 0.04766

 

As a proportion, we find the association to be significant, even though the R-squared value is only 14.79%.  We should now check the residual plots and consider the possibility of adding other variables to the model to improve the coefficient of determination.

> op = par(mfrow = c(2,2))

> plot(lm(IndProp ~ WhiteProp, data = second))

[Image: diagnostic plots for the third example]

The residuals versus fits plot and the normal probability plot look good.  The errors appear to be approximately normally distributed, with a mean near zero and roughly constant variance.
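A numerical check can complement the visual reading of the plots.  The sketch below runs a Shapiro-Wilk test on the residuals of a model fitted to made-up data; `toy` and the coefficients used to simulate it are assumptions, and this test is my addition, not part of the original analysis.

```r
# Hypothetical stand-in for the ward data: 27 divisions with made-up proportions
set.seed(3)
n <- 27
toy <- data.frame(WhiteProp = runif(n, 0.10, 0.45))
toy$IndProp <- 0.004 + 0.027 * toy$WhiteProp + rnorm(n, sd = 0.005)

fit <- lm(IndProp ~ WhiteProp, data = toy)
res <- residuals(fit)

# With an intercept in the model, least-squares residuals average to zero
# by construction, so the mean alone says little about model adequacy
print(mean(res))

# Shapiro-Wilk test: a small p-value is evidence against normal residuals
sw <- shapiro.test(res)
print(sw)
```

With only 27 divisions the test has limited power, so it is best read alongside the plots rather than in place of them.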


 

4.)    Does the total population of a division help explain the variation in the number of residents who register with a party other than Republican, Democrat, or “Independent”?

> summary(lm(second$`Other Party`~second$`Total Pop.`))

Call:
lm(formula = second$`Other Party` ~ second$`Total Pop.`)

Residuals:
    Min      1Q  Median      3Q     Max
-30.163 -10.212  -0.958   5.314  39.011

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         -26.74068   12.18433  -2.195   0.0377 *
second$`Total Pop.`   0.17109    0.01567  10.918 5.29e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.87 on 25 degrees of freedom
Multiple R-squared:  0.8266,  Adjusted R-squared:  0.8197
F-statistic: 119.2 on 1 and 25 DF,  p-value: 5.295e-11

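The total population does a good job of explaining the variability in the number of individuals who register as “other party”.  The coefficient of determination is much larger at 82.66% (81.97% adjusted), and the predictor variable is significant at any conventional level of significance.  Some of this association is to be expected, since larger divisions tend to have more registrants of every kind.  Still, this is our best result!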

> op = par(mfrow = c(2,2))

> plot(lm(second$`Other Party`~second$`Total Pop.`))

[Image: diagnostic plots for the fourth example]

The residuals seem to bounce randomly about the residual = 0 line, but R labels three points as the most extreme: divisions 15, 20, and 25.  (By default, plot.lm labels the three most extreme points in each plot, so a label alone does not establish that a point is an outlier.)  On the normal probability plot, the errors follow a normal distribution for middle values but not for lower and higher values.  We may need to transform the variable and/or consider a regression type other than linear.
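To judge whether labelled points are really unusual, one option is to look at standardized residuals and Cook's distance directly.  The sketch below does this on made-up data; `toy` and the simulation coefficients are assumptions, and the division numbers it prints are unrelated to the real divisions 15, 20, and 25.

```r
# Hypothetical stand-in for the ward data: 27 divisions with made-up counts
set.seed(4)
n <- 27
toy <- data.frame(TotalPop = round(runif(n, 380, 1250)))
toy$OtherParty <- round(-27 + 0.17 * toy$TotalPop + rnorm(n, sd = 16))

fit <- lm(OtherParty ~ TotalPop, data = toy)

std_res <- rstandard(fit)        # standardized residuals
cooks   <- cooks.distance(fit)   # influence of each division on the fit

# The three largest absolute standardized residuals; this mirrors the
# default of three labelled points per plot (id.n = 3) in plot.lm
flagged <- order(abs(std_res), decreasing = TRUE)[1:3]
print(data.frame(division = flagged,
                 std_res  = round(std_res[flagged], 2),
                 cooks    = round(cooks[flagged], 3)))
```

A common rule of thumb treats absolute standardized residuals above about 2 or 3 as worth a closer look; points below that are simply the largest of an ordinary set.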


5.)    Is there an association between the number of individuals who register with another party and the number of individuals in a division who identify as white?

> summary(lm(second$`Other Party`~second$White))

Call:
lm(formula = second$`Other Party` ~ second$White)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-45.492 -13.703  -3.486  14.109  60.357 
 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  28.81628   13.79156   2.089    0.047 *  
second$White  0.34592    0.06099   5.672 6.63e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 25.21 on 25 degrees of freedom
Multiple R-squared:  0.5627,  Adjusted R-squared:  0.5452 
F-statistic: 32.17 on 1 and 25 DF,  p-value: 6.631e-06

The number of those who identify as white does a reasonably good job of explaining the variability in the number of individuals who register as “other party”.  The coefficient of determination is 56.27% and we reject the null hypothesis of no association at any conventional level of significance.

This is good news.  Below are the residuals versus fits plot and the normal probability plot.

[Image: diagnostic plots for the fifth example]

Once again R labels three points.  Division 25 appears again, but now we should also examine the data for divisions 6 and 26.  The plot suggests a pattern in the residuals; that is, they do not bounce randomly around the residual = 0 line.  However, we should also consider the possibility that more data would eliminate this slight pattern or appearance of “non-randomness”.

The normal probability plot is once again good for middle values, but the points depart from the line at lower and higher values for divisions 6, 25, and 26.  We could try a transformation of this variable, possibly a squared version of the White variable.

Squared and cubed versions of WhiteProp, White, and Total Pop. do not improve the models, judging by the residuals versus fits and normal probability plots.
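The post does not show how the squared and cubed terms were fitted.  One common way, sketched here on made-up data, is to add the term with `I()` and compare the nested models with `anova()`; `toy` and the simulation coefficients are assumptions.

```r
# Hypothetical stand-in for the ward data: 27 divisions with made-up counts
set.seed(5)
n <- 27
toy <- data.frame(White = round(runif(n, 54, 405)))
toy$OtherParty <- round(29 + 0.35 * toy$White + rnorm(n, sd = 25))

# Linear model and a version with an added squared term
fit_lin  <- lm(OtherParty ~ White, data = toy)
fit_quad <- lm(OtherParty ~ White + I(White^2), data = toy)

# F-test for whether the squared term improves on the linear fit
cmp <- anova(fit_lin, fit_quad)
print(cmp)

# Adjusted R-squared penalizes the extra term, so it can go down
print(c(linear    = summary(fit_lin)$adj.r.squared,
        quadratic = summary(fit_quad)$adj.r.squared))
```

A large p-value in the `anova()` table would support the conclusion that the polynomial term does not enhance the model.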

We can attempt to regress the response variable Other Party on two predictor variables.  Since the total population and white variables were both found to be significant individually, we can see whether together they help to explain the variability in the number of other party registrants.

> summary(lm(second$`Other Party`~second$`Total Pop.` + second$White))
Call:
lm(formula = second$`Other Party` ~ second$`Total Pop.` + second$White)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-30.108 -10.240  -0.993   5.313  39.035 
 
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -2.671e+01  1.276e+01  -2.092   0.0472 *  
second$`Total Pop.`  1.708e-01  2.826e-02   6.045 3.05e-06 ***
second$White         8.462e-04  6.925e-02   0.012   0.9904    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 16.2 on 24 degrees of freedom
Multiple R-squared:  0.8266,  Adjusted R-squared:  0.8122 
F-statistic: 57.22 on 2 and 24 DF,  p-value: 7.374e-10

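When both predictor variables are included in the same model, only the total population is found to be statistically significant.  The coefficient on White is essentially zero (p-value 0.9904), and the R-squared is unchanged at 0.8266 while the adjusted R-squared falls from 0.8197 to 0.8122, so White adds nothing once total population is accounted for.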
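One likely explanation is that the two predictors overlap heavily, since white residents are counted within the total population.  The sketch below shows how that overlap could be measured on made-up data; `toy` and the simulation settings are assumptions, and the variance inflation factor is computed by hand from the correlation, which is valid only for a model with exactly two predictors.

```r
# Hypothetical stand-in for the ward data: 27 divisions with made-up counts
set.seed(6)
n <- 27
toy <- data.frame(TotalPop = round(runif(n, 380, 1250)))
toy$White      <- round(toy$TotalPop * runif(n, 0.22, 0.34))
toy$OtherParty <- round(-27 + 0.17 * toy$TotalPop + rnorm(n, sd = 16))

# Correlation between the two predictors
r <- cor(toy$TotalPop, toy$White)
print(r)

# Variance inflation factor for a two-predictor model: 1 / (1 - r^2)
vif <- 1 / (1 - r^2)
print(vif)

# Nested-model F-test: does White add anything beyond TotalPop?
fit_small <- lm(OtherParty ~ TotalPop, data = toy)
fit_full  <- lm(OtherParty ~ TotalPop + White, data = toy)
print(anova(fit_small, fit_full))
```

A high correlation inflates the standard errors of both coefficients, which is consistent with the standard error on Total Pop. rising from 0.01567 to 0.02826 in the output above.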

 
