Introduction

According to the 2016 Annual Division of Criminal Justice Services Performance Report, for the year 2016, crime in New York State reached its lowest point with 375,962 index crimes reported, the fewest since reporting began in 1975. New York is the safest large state among those in the nation with a population of more than 10 million and had the lowest incarceration rate among those states. Based on such a good news, this project conduct a simple analysis to figure out the following questions,

What is the distribution of crime rates of New York State at county level?

What are the contributing factors to crimes in New York State among 2016?

There are many factors that can influence the crime rates, and economy and education may be the significant factors among them. In terms of economy, this project used both income inequality (Gini index, developed by the Italian statistician and sociologist Corrado Gini and published in his 1912 paper Variability and Mutability) and unemployment rate as independent variables that indicate the economic condition. While for education condition, the proportion of adults without degrees higher than high school is taken as the independent variables. Therefore, the project starts with the following hypothesis,

The crime rates are high in those counties with high population density, such as Bronx, New York and Manhattan.

All of the income inequality (Gini index), education attainment, and unemployment rate have significant contributions to crime rates.

This project provides maps for distributions of crime rates, Gini index, education attainment and unemployment rates in New York State and apply generalized additive models to study the significant contributing factor of crime rates.

Materials and methods

R Packages

Loading required packages:

library(dplyr)
library(sp)
library(maptools)
library(gstat)
library(rgdal)
library(raster)
library(tigris)
library(RColorBrewer)

Data

Data Source:

The UCR reporting system collects information on seven crimes classified as Index offenses which are most commonly used to gauge overall crime volume. These include the violent crimes of murder/non-negligent manslaughter, forcible rape, robbery, and aggravated assault; and the property crimes of burglary, larceny, and motor vehicle theft. Firearm counts are derived from taking the number of violent crimes which involve a firearm. Population data are provided every year by the FBI, based on US Census information.

DownLoad Data:

#Download Spatial Data using raster package.
us=getData('GADM', country='USA', level=2) #Download US Boundaries
ny<-subset(us,NAME_1=='New York')          #Subset New York state
ny<-subset(ny,NAME_2!='Lake Ontario')      #Remove Ontario Lake

#Reading Census in the data folder
CensusData <- read.csv("https://raw.githubusercontent.com/penghangliu/RDataScience_Project/master/data/NYCensus2016.csv",stringsAsFactors = FALSE)
#Download Crime data from New York State Gov
CrimeData <- read.csv("https://data.ny.gov/resource/vi5m-jckw.csv",stringsAsFactors = FALSE)

Methods

Data Processing

In this section, the crime data are subsetted to all crime rates, violent crime rates, firearm crime rates and property crime rates, and filtered into year 2016. Spatial data (shapefile of New York State), census data and crime data are merged into

library(DT)
library(leaflet)
library(widgetframe)
library(ggmap)
library(xtable)
library(mgcv)
library(sf)
library(car)

CrimeData <- filter(CrimeData, year==2016)
CrimeData$county[CrimeData$county=='St Lawrence']<-'Saint Lawrence'

data<-left_join(CrimeData,CensusData,by="county")
data_merge <- geo_join(ny, data, "NAME_2", "county") %>% st_as_sf()

Table 1. Crimes, Income Inequality, Education Attainment, Unemployment in New York State, 2016

data1 <- subset(data, select = c(county,index_rate,violent_rate,firearm_rate,property_rate,Gini_Index,under_highschool,unemployment))
datatable(data1)

Data Visualizaion

This section Visualized the distribution of dependent variables(Crime Rates, including all crimes, violent crimes, firearm crimes, and property crimes) and independent variables(Gini Index, Education attainment, Unemployment Rate) over 62 counties in NY State.

Generalized Additive Models

Four Generalized Additive Models are built by setting 4 differnet Crime rates as dependent variables and Gini Index, education attainment, unemployment rate as independent variables. The output will provide coefficients of the independent varables, which provide knowledge for evluating the influence of these fators on Crimes.

Scaling the independent variables

data_merge=mutate(data_merge,Gini_index=as.numeric(scale(Gini_Index)))
data_merge=mutate(data_merge,Under_highschool=as.numeric(scale(under_highschool)))
data_merge=mutate(data_merge,Unemployment=as.numeric(scale(unemployment)))

Building generalized additive models for all crimes, Violent Crimes, Firearm Crimes, and Property Crimes

All <- gam(index_rate~s(Gini_index) + s(Under_highschool) + s(Unemployment), data = data_merge, family = "gaussian")
Violent <- gam(violent_rate~s(Gini_index) + s(Under_highschool) + s(Unemployment), data = data_merge, family = "gaussian")
Firearm <- gam(firearm_rate~s(Gini_index) + s(Under_highschool) + s(Unemployment), data = data_merge, family = "gaussian")
Property <- gam(property_rate~s(Gini_index) + s(Under_highschool) + s(Unemployment), data = data_merge, family ="gaussian")

Results

Distribution of Crime Rates in New York State, 2016

st_transform(data_merge,"+proj=longlat +datum=WGS84")%>%
  leaflet() %>% addTiles() %>%
  addPolygons(label=paste(data_merge$county," (Crime Rates=",data_merge$index_rate,")"),
              group = "All Crimes",
              color = "white", 
              dashArray = "3",
              weight = 1, 
              smoothFactor = 0.5,
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", index_rate)(index_rate),
              highlightOptions = highlightOptions(color = "white", weight = 3,
                                                  bringToFront = TRUE))%>%
  addPolygons(label=paste(data_merge$county," (Crime Rates=",data_merge$violent_rate,")"),
              group = "Violent Crimes",
              color = "white", 
              dashArray = "3",
              weight = 1, 
              smoothFactor = 0.5,
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", violent_rate)(violent_rate),
              highlightOptions = highlightOptions(color = "white", weight = 3,
                                                  bringToFront = TRUE))%>%
  addPolygons(label=paste(data_merge$county," (Crime Rates=",data_merge$firearm_rate,")"),
              group = "Firearm Crimes",
              color = "white", 
              dashArray = "3",
              weight = 1, 
              smoothFactor = 0.5,
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", firearm_rate)(firearm_rate),
              highlightOptions = highlightOptions(color = "white", weight = 3,
                                                  bringToFront = TRUE))%>%
  addPolygons(label=paste(data_merge$county," (Crime Rates=",data_merge$property_rate,")"),
              group = "Property Crimes",
              color = "white", 
              dashArray = "3",
              weight = 1, 
              smoothFactor = 0.5,
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", property_rate)(property_rate),
              highlightOptions = highlightOptions(color = "white", weight = 3,
                                                  bringToFront = TRUE))%>%
  addPolygons(label=paste(data_merge$county," (Gini Index=",data_merge$Gini_Index,")"),
              group = "Income Inequality",
              color = "white", 
              dashArray = "3",
              weight = 1, 
              smoothFactor = 0.5,
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", Gini_Index)(Gini_Index),
              highlightOptions = highlightOptions(color = "white", weight = 3,
                                                  bringToFront = TRUE))%>%
  addPolygons(label=paste(data_merge$county," (Under Highschool=",data_merge$under_highschool,"%)"),
              group = "Education Attainment",
              color = "white", 
              dashArray = "3",
              weight = 1, 
              smoothFactor = 0.5,
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", under_highschool)(under_highschool),
              highlightOptions = highlightOptions(color = "white", weight = 3,
                                                  bringToFront = TRUE))%>%
  addPolygons(label=paste(data_merge$county," (Unemployment Rates=",100*data_merge$unemployment,"%)"),
              group = "Unemployment Rates",
              color = "white", 
              dashArray = "3",
              weight = 1, 
              smoothFactor = 0.5,
              opacity = 1.0, 
              fillOpacity = 0.5,
              fillColor = ~colorQuantile("YlOrRd", unemployment)(unemployment),
              highlightOptions = highlightOptions(color = "white", weight = 3,
                                                  bringToFront = TRUE))%>%
  addLayersControl(
    baseGroups = c("All Crimes", "Violent Crimes", "Firearm Crimes", "Property Crimes", "Income Inequality","Education Attainment","Unemployment Rates"),
    options = layersControlOptions(collapsed = FALSE)
  )%>%
  addMiniMap()%>%
  frameWidget(height = 600)

All crimes model

print(xtable(summary(All)$s.table), type = "html")
edf Ref.df F p-value
s(Gini_index) 2.76 3.36 5.32 0.00
s(Under_highschool) 1.92 2.39 1.31 0.24
s(Unemployment) 3.74 4.59 2.36 0.06

Violent Crimes model

print(xtable(summary(Violent)$s.table), type = "html")
edf Ref.df F p-value
s(Gini_index) 1.00 1.00 25.12 0.00
s(Under_highschool) 3.33 4.07 13.16 0.00
s(Unemployment) 2.16 2.70 2.45 0.08

Firearm Crimes model

print(xtable(summary(Firearm)$s.table), type = "html")
edf Ref.df F p-value
s(Gini_index) 8.26 8.83 5.47 0.00
s(Under_highschool) 1.00 1.00 2.60 0.11
s(Unemployment) 1.00 1.00 0.31 0.58

Property Crimes model

print(xtable(summary(Property)$s.table), type = "html")
edf Ref.df F p-value
s(Gini_index) 2.48 3.03 4.83 0.00
s(Under_highschool) 1.27 1.47 2.51 0.07
s(Unemployment) 3.82 4.69 2.18 0.08

Comparing model

c1 <- c("All Crimes", "Violent Crimes", "Firearm Crimes", "Property Crimes")
c2 <- c(summary(All)$r.sq, summary(Violent)$r.sq, summary(Firearm)$r.sq, summary(Property)$r.sq)
rsq <- as.data.frame(c1,c2)

x <-c("All Crimes", "Violent Crimes", "Firearm Crimes", "Property Crimes")
y <-c(summary(All)$r.sq, summary(Violent)$r.sq, summary(Firearm)$r.sq, summary(Property)$r.sq)
x_name <- "Model"
y_name <- "R-squared"

df <- data.frame(x,y)
names(df) <- c(x_name,y_name)

print(xtable(df), type = "html")
Model R-squared
1 All Crimes 0.26
2 Violent Crimes 0.68
3 Firearm Crimes 0.44
4 Property Crimes 0.21

Conclusions

The crimes rates distribution maps indicate that there are four spatial cluster with high value of crime rates, which are the western part, central part, Eastern part of New York State and New York City.

Taking Gini index, under high school education attainment rate, and unemployment rate as independent variables, the generalized additive model of this project good confidence to fit the violent crimes, with adjusted R squared 0.5077 and relatively low p-value for the independent variables. However, this regression model does not fit with all crimes, firearm crime, as well as property crimes.

According to the generalized additive model, both income inequality (Gini Index) and poor education attainment (under high school rate) have significant positive contribution to the violent crime, while unemployment rate has slight positive contribution to crime rates.

While the generalized additive model does possess the power to explain the violent crime, it has poor ability to estimate other type of crimes. Therefore, this regression model still need to be improve by taking other independent variable into the model.

References

[1]Hsieh, Ching-Chi, and Meredith D. Pugh. “Poverty, income inequality, and violent crime: a meta-analysis of recent aggregate data studies.” Criminal Justice Review 18.2 (1993): 182-202.

[2]Rogerson, Peter. Statistical methods for geography. Sage, 2001.

[3]Morenoff, Jeffrey D., and Robert J. Sampson. “Violent crime and the spatial dynamics of neighborhood transition: Chicago, 1970-1990.” Social forces (1997): 31-64.

[4]Gini, C. W. “Variability and mutability, contribution to the study of statistical distribution and relaitons.” Studi Economico-Giuricici della R (1912).