Exploration of DUI Rates Relative to Transit Options

Does the presence of a Metro line impact rates of DUI?

Joseph Regan Hutson





Hypothesis: Segments of Freeway with a Metro line that could be used as an alternative to driving will have lower rates of DUI incidents.

Process:

  1. Identify Sources of DUI Arrest Data
  2. Connect DUI Arrests to Geographic Location
  3. Select Freeway Segments to Compare
  4. Get Data for Freeways and Ramps
  5. Create buffer zones around selected freeway segments
  6. Select DUI Arrests that fall within each buffer
  7. Calculate total arrests within each buffer
  8. Normalize total as DUI's/1000 Cars

City of Los Angeles and LA County both post arrest data on their open data portals.

Availability of data from other cities is inconsistent.

Targeting freeway segments within the City of Los Angeles will allow both available datasets to be used since both agencies have jurisdiction there.

Data from California Highway Patroll was mixed with data from other agencies and therefore omitted

Freeway Segments to Consider:

With Metro Line:

  1. 110 between Downtown LA and Pasadena - Gold Line
  2. 10 between Downtown LA and Santa Monica - Expo Line
  3. 101 between Downtown LA and North Hollywood - Red Line #### Without Metro Line:
  4. 405 Sepulveda Pass
  5. 118 between 405 and Ventura County Line
  6. 5 between Downtown LA and the 134
In [1]:
# Import Python Libraries that will be used.
import pandas as pd
import geopandas
import numpy as np
import matplotlib as plt
import geocoder
import re
import datetime
%matplotlib inline

How to filter DUI incidents out of general arrest data.

DUI Code list is compiled by searching for DMV violation codes (https://www.dmv.ca.gov/portal/dmv/detail/dl/vioptct) as well as references to "Alcohol", "DUI" and "Influence" in charge descriptions.

In [2]:
dui_code_list = ['23152(G)VC', '191.5(B)PC', '23152(C)', '23550.5AVC', '23152(B)', '23152(D)', '23152(F)', '23224(B)VC', '23550(A)VC', '23152(F)VC', '655(B)HN', '23105(A)VC', '23175(A)VC', '23136(A)VC', '23152(A)', '14601.2B.VC', '23152(B)VC', '23153(E)VC', '23153(A)', '21200.5VC', '23221VC', '23152(E)VC', '23153(B)', '191.5(A)PC', '23550VC', '23153(G)VC', '23153(B)VC', '23152(A)VC', '655(B)HNC', '192(C)(3)PC', '14601.2A.VC', '23153(A)VC', '23140(A)VC', '23153(F)VC', '23152(G)']

LA County Sheriff's Data

In [3]:
# Datasets are ommitted from the Repo due to size.
# They are available at:
# https://data.lacounty.gov/Public-Safety/LASD-Officer-Contacts-Incident-Details/mwug-nk2r
# https://data.lacounty.gov/Public-Safety/LASD-Officer-Contacts-Person-Details/wraf-ix3i
# Import County Data
person_details = pd.read_csv("./datasets/CountyOfLA/LASD_Officer_Contacts_-_Person_Details.csv")
# Import incident details with parse_dates enabled.
incident_details = pd.read_csv("./datasets/CountyOfLA/LASD_Officer_Contacts_-_Incident_Details.csv", parse_dates = [2], infer_datetime_format = True)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3058: DtypeWarning: Columns (6,53,54) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
In [4]:
# Person Details contains the offence code
person_details.head()
Out[4]:
Contact ID Person ID Street # Street Direction Street Name Street Type Suite Cross Street Landmark Full Address ... Result of Contact: Custodial Arrest Pursuant to Outstanding Warrant ? Result of Contact: Custodial Arrest Without Warrant? Result of Contact: Custodial Arrest Without Warrant Offense Codes? Result of Contact: Field Interview Card Completed? Result of Contact: Noncriminal Transport or Care Taking Transport? Result of Contact: Contacted Parent/Legal Guardian or other Person Responsible for Minor? Result of Contact: Psychiatric Hold? Result of Contact: Contacted U.S. Department of Homeland Security? Result of Contact: Referral to School Administrator? Result of Contact: Referral to School Counselor or Other Support Staff?
0 102854 114758 NaN NaN NaN NaN NaN PIONEER/ROSECRANS NaN PIONEER/ROSECRANS ... False False NaN False False False False False False False
1 21783 24698 NaN NaN NaN NaN NaN HACIENDA RD/REPOSADO RD NaN HACIENDA RD/REPOSADO RD ... False False NaN False False False False False False False
2 84984 95342 2100.0 NaN HUNTINGTON DR NaN NaN NaN 2100 HUNTINGTON DR ... False False NaN False False False False False False False
3 120881 134469 NaN NaN NaN NaN NaN WOODBURY RD/SANTA ROSA AV NaN WOODBURY RD/SANTA ROSA AV ... False False NaN False False False False False False False
4 57654 65269 NaN NaN NaN NaN NaN 30TH STREET WEST/AVENUE H NaN 30TH STREET WEST/AVENUE H ... False False NaN False False False False False False False

5 rows × 140 columns

In [5]:
# Incident Details contains Date and Time
incident_details.head()
Out[5]:
CONTACT_ID PATROL_STATION DATE_TIME NUMBER_OF_MINUTES STREET_NUMBER DIRECTION STREET TYPE SUITE CROSS_STREET LANDMARK FULL_STREET CITY STATE ZIP_CODE K_12_SCHOOL SCHOOL_NAME CALL_FOR_SERVICE CIVILIANS_CONTACTED
0 12875 WHD 2018-07-01 00:34:00 6 NaN NaN NaN NaN NaN ALMONT DR/MELROSE AV NaN ALMONT DR/MELROSE AV WEST HOLLYWOOD CA 90069.0 False NaN False 1
1 12879 WHD 2018-07-01 00:18:00 6 100.0 NaN ROBERTSON BL NaN NaN NaN 100 ROBERTSON BL WEST HOLLYWOOD CA NaN False NaN False 1
2 12881 WHD 2018-07-01 00:49:00 5 NaN NaN NaN NaN NaN SANTA MONICA BL/DETRIOT ST NaN SANTA MONICA BL/DETRIOT ST WEST HOLLYWOOD CA NaN False NaN False 1
3 12889 SLA 2018-07-01 00:58:00 110 1200.0 W CENTURY BL NaN NaN RED'S LIQUOR 1200 W CENTURY BL WESTMONT / WEST ATHENS CA 90044.0 False NaN False 3
4 12890 ELA 2018-07-01 01:13:00 2 4700.0 NaN 58TH ST 1/2 58TH ST/HELIOTROPE AV NaN 4700 58TH 1/2 ST MAYWOOD CA 90270.0 False NaN True 1
In [6]:
# Join based on Contact ID
sheriff_data = person_details.merge(incident_details,  left_on = 'Contact ID', right_on='CONTACT_ID')
In [7]:
sheriff_data.shape
Out[7]:
(136691, 159)
In [9]:
# The Sheriff's Data contains many more columns than the ones we are interested in.
print(list(sheriff_data.columns))
['Contact ID', 'Person ID', 'Street #', 'Street Direction', 'Street Name', 'Street Type', 'Suite', 'Cross Street', 'Landmark', 'Full Address', 'City', 'State', 'Zip Code', 'Age?', 'K12 Student?', 'Limited or No English Fluency?', 'Perception Made', 'Homeless?', 'Male?', 'Female?', 'Transgender Man?', 'Transgender Woman?', 'Gender Non-Conforming?', 'LGBT?', 'Asian?', 'Black/African American?', 'Hispanic/Latino/Latina?', 'Middle Eastern/South Asian?', 'Native American?', 'Pacific Islander?', 'White?', 'Deaf or Difficulty Hearing?', 'Speech Impaired or Limited Use of Language?', 'Blind or Limited Vision?', 'Mental Health Condition?', 'Intellectually or Developmental Disability?', 'Other Disabilities?', 'Hyperactive or Impulsive Behavior?', 'No Disabilities?', 'Reason for Contact', 'Reason for Contact Narrative', 'Traffic Violation Type', 'Traffic Violation Offense Code', 'Reasonable Suspicion that the Officer Witnessed Commission of a Crime?', 'Reasonable Suspicion that the Person Matched Suspect Description?', 'Reasonable Suspicion that the Person was a Witness or Victim ID of Suspect at the Scene?', 'Reasonable Suspicion that the Person may be Carrying Suspicious Object?', "Reasonable Suspicion that the Person's Actions was Indicative of Casing a Victim or Location?", 'Reasonable Suspicion that the Person was Suspected of Acting as a Lookout?', "Reasonable Suspicion that the Person's Actions was Indicative of a Drug Transaction?", "Reasonable Suspicion that the Person's Actions was Indicative of Engaging in a Violent Crime?", 'Officer had Other Reasonable Suspicions of a Crime?', 'Offense Code of the Reasonable Suspicion', 'Education Code Section', 'Education Code Subdivision', 'Person Removed from Vehicle by Order?', 'Person Removed from Vehicle by Physical Contact?', 'Field Sobriety Test Conducted?', 'Curbside Detention?', 'Person Handcuffed or Flex Cuffed?', 'Patrol Car Detention?', 'Canine Removed from Vehicle or Used to Search?', 'Firearm Pointed at Person?', 'Firearm Discharged or Used?', 'Electronic Control Device Used?', 'Impact Projectile Discharged or Used?', 'Canine Bit or Held Person?', 'Baton or Other Impact Weapon Used?', 'Chemical Spray Used?', 'Other Physical or Vehicle Contact?', 'Person Photographed?', 'Search of Person Conducted?', 'Asked Consent to Search Person?', 'Asked Consent to Search Person Given?', 'Search of Property Conducted?', 'Asked Consent to Search Property?', 'Asked Consent to Search Property Given?', 'Property was Seized?', 'Vehicle Impounded?', 'Admission or Written Statement Obtained from Student?', 'No Action Taken?', 'Search Consent Given?', 'Search Basis: Officer Safety/Safety of Others?', 'Search Basis: Search Warrant?', 'Search Basis: Condition of Parole/Probation/PRCS/Mandatory Supervision?', 'Search Basis: Suspected Weapons?', 'Search Basis: Visible Contraband?', 'Search Basis: Odor of Contraband?', 'Search Basis: Canine Detection?', 'Search Basis: Evidence of Crime?', 'Search Basis: Incident to Arrest?', 'Search Basis: Exigent Circumstances/Emergency?', 'Search Basis: Vehicle inventory?', 'Search Basis: Suspected of Violating of School Policy?', 'Search Basis: Reason', 'Property Seizure Basis: Safekeeping as Allowed by Law/Statute?', 'Property Seizure Basis: Contraband?', 'Property Seizure Basis: Evidence?', 'Property Seizure Basis: Impound of Vehicle?', 'Property Seizure Basis: Abandoned Property?', 'Property Seizure Basis: Suspected of Violating of School Policy?', 'Property Seized: Firearm(s)?', 'Property Seized: Ammunition?', 'Property Seized: Weapon(s) Other than a Firearm?', 'Property Seized: Drug/Narcotics?', 'Property Seized: Alcohol?', 'Property Seized: Money?', 'Property Seized: Drug Paraphernalia?', 'Property Seized: Suspected Stolen Property?', 'Property Seized: Cell Phone(s) or Electronic Device(s)?', 'Property Seized: Vehicle?', 'Property Seized: Other Contraband or Evidence?', 'Contraband Evidence Discovered: None?', 'Contraband Evidence Discovered: Firearm(s)?', 'Contraband Evidence Discovered: Ammunition?', 'Contraband Evidence Discovered: Weapon(s) Other than a Firearm?', 'Contraband Evidence Discovered: Drugs/Narcotics?', 'Contraband Evidence Discovered: Alcohol?', 'Contraband Evidence Discovered: Money?', 'Contraband Evidence Discovered: Drug Paraphernalia?', 'Contraband Evidence Discovered: Suspected Stolen Property?', 'Contraband Evidence Discovered: Cell Phone(s) or Electronic Device(s)?', 'Contraband Evidence Discovered: Other?', 'Result of Contact: No Action?', 'Result of Contact: Warning?', 'Result of Contact: Warning Offense Codes?', 'Result of Contact: Citation for infraction?', 'Result of Contact:Citation for Infraction Offense Codes?', 'Result of Contact:  In-field Cite and Release?', 'Result of Contact:  In-field Cite and Release Offense Codes?', 'Result of Contact: Custodial Arrest Pursuant to Outstanding Warrant ?', 'Result of Contact: Custodial Arrest Without Warrant?', 'Result of Contact: Custodial Arrest Without Warrant Offense Codes?', 'Result of Contact: Field Interview Card Completed?', 'Result of Contact: Noncriminal Transport or Care Taking Transport?', 'Result of Contact: Contacted Parent/Legal Guardian or other Person Responsible for Minor?', 'Result of Contact: Psychiatric Hold?', 'Result of Contact: Contacted U.S. Department of Homeland Security?', 'Result of Contact: Referral to School Administrator?', 'Result of Contact: Referral to School Counselor or Other Support Staff?', 'CONTACT_ID', 'PATROL_STATION', 'DATE_TIME', 'NUMBER_OF_MINUTES', 'STREET_NUMBER', 'DIRECTION', 'STREET', 'TYPE', 'SUITE', 'CROSS_STREET', 'LANDMARK', 'FULL_STREET', 'CITY', 'STATE', 'ZIP_CODE', 'K_12_SCHOOL', 'SCHOOL_NAME', 'CALL_FOR_SERVICE', 'CIVILIANS_CONTACTED']
In [10]:
# Limit Columns to relevant ones
sheriff_data_focused = sheriff_data[['Contact ID','Person ID', 'DATE_TIME','Street #', 'Street Direction', 'Street Name',\
                                     'Street Type', 'Suite', 'Cross Street', 'Landmark', 'Full Address', 'CITY', 'STATE', 'ZIP_CODE',\
                                    'Reason for Contact', 'Reason for Contact Narrative', 'Traffic Violation Type',\
                                    'Traffic Violation Offense Code','Result of Contact:  In-field Cite and Release?',\
                                    'Result of Contact:  In-field Cite and Release Offense Codes?',\
                                    'Result of Contact: Custodial Arrest Pursuant to Outstanding Warrant ?',\
                                    'Result of Contact: Custodial Arrest Without Warrant?', \
                                    'Result of Contact: Custodial Arrest Without Warrant Offense Codes?']]
In [12]:
# Creating empty columns to store date and time as datetime objects.
sheriff_data_focused["Date"] = np.nan
sheriff_data_focused["Time"] = np.nan
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
In [13]:
# Functions to separate date and time from timestamp
def populate_date(row):
    timestamp = row["DATE_TIME"]
    row["Date"] = pd.to_datetime([str(timestamp.month) + "/" + str(timestamp.day) + "/" + str(timestamp.year)])
    return row

def populate_time(row):
    timestamp = row["DATE_TIME"]
    row["Time"] = ((timestamp.hour * 100) + timestamp.minute)
    return row
In [14]:
sheriff_data_focused = sheriff_data_focused.apply(populate_date, axis = 1)
In [15]:
sheriff_data_focused = sheriff_data_focused.apply(populate_time, axis = 1)
In [18]:
# What is the distribution of all arrests through the day?
sheriff_data_focused.hist(column = "Time", bins = 24)
Out[18]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x125c28ad0>]],
      dtype=object)

How Much Data Are We Dealing With Here?

In [19]:
sheriff_data_focused.shape
Out[19]:
(136691, 25)
In [20]:
sheriff_data_focused["DATE_TIME"].max()
Out[20]:
Timestamp('2018-12-31 23:35:00')
In [21]:
sheriff_data_focused["DATE_TIME"].min()
Out[21]:
Timestamp('2018-07-01 00:00:00')

How Much Of That Is Actually Relevant?

In [22]:
# Create empty column to store DUI offence code
sheriff_data_focused["Relevant_Offence_Code"] = np.nan

A new challenge:

Some codes are in pairs

In [23]:
sheriff_data_focused["Result of Contact: Custodial Arrest Without Warrant Offense Codes?"].unique()
Out[23]:
array([nan, '11350(A)', '459', ..., '11378,11357(A)', '459,503',
       '466,496(A)'], dtype=object)
In [24]:
# Store copy of first row for function testing
sheriff_data_test = sheriff_data_focused.iloc[0]
In [25]:
# Function to split combined codes and return in list
def split_combined_codes(string):
    code_list = []
    string = str(string)
    if "," in string:
        split_string = string.split(",")
        code_list = code_list + split_string
    else:
        code_list.append(string)
    return code_list

# Function to check all relevant columns for offence codes
# If one of them is a DUI code it populates the new "Relevant Offense Code" column
def find_dui_offence_codes(row):
    code_list = []
    traffic_code = row["Traffic Violation Offense Code"]
    code_list= code_list + split_combined_codes(traffic_code)
    cite_and_release_code = row["Result of Contact:  In-field Cite and Release Offense Codes?"]
    code_list= code_list + split_combined_codes(cite_and_release_code)
    custodial_arrest_code = row["Result of Contact: Custodial Arrest Without Warrant Offense Codes?"]
    code_list = code_list + split_combined_codes(custodial_arrest_code)
    for code in code_list:
        if code in dui_code_list:
            row["Relevant_Offence_Code"] = code
            return row
        else:
            continue
    return row

Testing looks good.

Nothing in "Relevant Offence Code" but incident was not a DUI

In [26]:
find_dui_offence_codes(sheriff_data_test)
Out[26]:
Contact ID                                                                                                          102854
Person ID                                                                                                           114758
DATE_TIME                                                                                              2018-10-26 10:45:00
Street #                                                                                                               NaN
Street Direction                                                                                                       NaN
Street Name                                                                                                            NaN
Street Type                                                                                                            NaN
Suite                                                                                                                  NaN
Cross Street                                                                                             PIONEER/ROSECRANS
Landmark                                                                                                               NaN
Full Address                                                                                             PIONEER/ROSECRANS
CITY                                                                                                               NORWALK
STATE                                                                                                                   CA
ZIP_CODE                                                                                                               NaN
Reason for Contact                                                                                       Traffic violation
Reason for Contact Narrative                                                                                         tstop
Traffic Violation Type                                                                                    Moving violation
Traffic Violation Offense Code                                                                                    21461(A)
Result of Contact:  In-field Cite and Release?                                                                        True
Result of Contact:  In-field Cite and Release Offense Codes?                                                      21461(A)
Result of Contact: Custodial Arrest Pursuant to Outstanding Warrant ?                                                False
Result of Contact: Custodial Arrest Without Warrant?                                                                 False
Result of Contact: Custodial Arrest Without Warrant Offense Codes?                                                     NaN
Date                                                                     DatetimeIndex(['2018-10-26'], dtype='datetime6...
Time                                                                                                                  1045
Relevant_Offence_Code                                                                                                  NaN
Name: 0, dtype: object
In [27]:
# Apply funtions to the full dataset.
sheriff_data_with_dui_codes = sheriff_data_focused.apply(find_dui_offence_codes, axis = 1)
In [28]:
sheriff_data_with_dui_codes.count()
Out[28]:
Contact ID                                                               136691
Person ID                                                                136691
DATE_TIME                                                                136691
Street #                                                                  40562
Street Direction                                                          10187
Street Name                                                               40562
Street Type                                                               32981
Suite                                                                       879
Cross Street                                                              95512
Landmark                                                                   6305
Full Address                                                             136691
CITY                                                                     136691
STATE                                                                    136691
ZIP_CODE                                                                  24456
Reason for Contact                                                       136691
Reason for Contact Narrative                                             136691
Traffic Violation Type                                                    95470
Traffic Violation Offense Code                                            95470
Result of Contact:  In-field Cite and Release?                           136691
Result of Contact:  In-field Cite and Release Offense Codes?              63258
Result of Contact: Custodial Arrest Pursuant to Outstanding Warrant ?    136691
Result of Contact: Custodial Arrest Without Warrant?                     136691
Result of Contact: Custodial Arrest Without Warrant Offense Codes?        11410
Date                                                                     136691
Time                                                                     136691
Relevant_Offence_Code                                                       841
dtype: int64
In [29]:
# Filter to rows that have data in the "Relevant Offence Code" Column
sheriff_data_with_dui_codes = sheriff_data_with_dui_codes[sheriff_data_with_dui_codes.Relevant_Offence_Code.notnull()]
In [30]:
sheriff_data_with_dui_codes.shape
Out[30]:
(841, 26)

A specific incident might have multiple records.

Ex. A Drunk Driver with Passengers.

In [31]:
# Drop duplicate entries from the same incident
sheriff_data_with_dui_codes.drop_duplicates(subset="Contact ID", keep='first', inplace=True)
sheriff_data_with_dui_codes.shape
Out[31]:
(825, 26)
In [32]:
sheriff_data_with_dui_codes["Date"] = np.nan
sheriff_data_with_dui_codes["Time"] = np.nan
In [33]:
# Functions to separate date and time from timestamp
def populate_date(row):
    timestamp = row["DATE_TIME"]
    row["Date"] = (str(timestamp.month) + "/" + str(timestamp.day) + "/" + str(timestamp.year))
    return row

def populate_time(row):
    timestamp = row["DATE_TIME"]
    row["Time"] = ((timestamp.hour * 100) + timestamp.minute)
    return row
In [34]:
sheriff_data_with_dui_codes = sheriff_data_with_dui_codes.apply(populate_date, axis = 1)
In [35]:
sheriff_data_with_dui_codes = sheriff_data_with_dui_codes.apply(populate_time, axis = 1)
In [36]:
sheriff_data_with_dui_codes.head()
Out[36]:
Contact ID Person ID DATE_TIME Street # Street Direction Street Name Street Type Suite Cross Street Landmark ... Traffic Violation Type Traffic Violation Offense Code Result of Contact: In-field Cite and Release? Result of Contact: In-field Cite and Release Offense Codes? Result of Contact: Custodial Arrest Pursuant to Outstanding Warrant ? Result of Contact: Custodial Arrest Without Warrant? Result of Contact: Custodial Arrest Without Warrant Offense Codes? Date Time Relevant_Offence_Code
342 36786 41829 2018-07-28 19:00:00 1100.0 W I AV NaN NaN NaN ... NaN NaN True 23152(A) False False NaN 7/28/2018 1900 23152(A)
531 106823 119046 2018-11-01 16:05:00 38900.0 NaN YUCCA DR NaN NaN NaN ... NaN NaN True 23152(A) True False NaN 11/1/2018 1605 23152(A)
594 33005 38365 2018-07-24 01:59:00 4900.0 NaN CLARA ST NaN NaN NaN ... Moving violation 23152(A) True 23223(B) False False NaN 7/24/2018 159 23152(A)
610 126017 140095 2018-12-09 15:40:00 NaN NaN NaN NaN NaN CENTRAL AV/GLENN CURTIS AV NaN ... Moving violation 2818 False NaN False True 23152(B) 12/9/2018 1540 23152(B)
906 23122 26212 2018-07-12 10:29:00 NaN NaN NaN NaN NaN 107TH ST/WEIGAND AV NaN ... NaN NaN True 23152(A) False False NaN 7/12/2018 1029 23152(A)

5 rows × 26 columns

In [37]:
#sheriff_data_with_dui_codes_test = sheriff_data_with_dui_codes.iloc[[0]]
print(sheriff_data_with_dui_codes["DATE_TIME"].dtype)
datetime64[ns]

Time distribution of Sherriff's DUI Arrests

In [38]:
sheriff_data_with_dui_codes.hist(column = "Time", bins = 24)
Out[38]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x143ca4c10>]],
      dtype=object)

Time distribution of all Sherriff's Arrests

In [39]:
sheriff_data_focused.hist(column = "Time", bins = 24)
Out[39]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x143b0dfd0>]],
      dtype=object)

There is clearly a different time distriution for the DUI subset compared to all arrests.

With only 6mo of data we need to determine if the County Data is useful.

Need to geocode to see which fall within the LA City boundary.

In [42]:
# Create columns to store Latitude and Longitude values.
sheriff_data_with_dui_codes["Lat"] = np.nan
sheriff_data_with_dui_codes["Long"] = np.nan
In [43]:
# Latitude and Longitude range for the boundaries of LA County.
# Latitude 33.6 - 34.8
# Longitude -117.6 - -118.9
# Geocoder was returning clearly wrong values for some data, using ranges to confirm that results are reasonable.

def get_lat_long(row, address = "Full Address", city = "CITY", state = "STATE"):
    string_list = []
    if row[address]:
        string_list.append(row[address])
    if row[city]:
        string_list.append(row[city])
    # Specifying LA County resulted in much better results from the Geocoder
    string_list.append("Los Angeles County")
    if row[state]:
        string_list.append(row[state])
    separator = ","
    full_string = separator.join(string_list)
    
    g = geocoder.arcgis(full_string)
    # Get latlng attribute from g
    # latlng is a list with two elements
    lat_lng_list = g.latlng
    #print(lat_lng_list)
    # Create default values for latitude and longitude
    latitude = 0.0
    longitude = 0.0
    # see if value returned by geocoder is within LA County
    # If so, replace default value with it
    if 33.6 <= lat_lng_list[0] <= 35.0:
        latitude = lat_lng_list[0]
    if -119.0 <= lat_lng_list[1] <= -117.6:
        longitude = lat_lng_list[1]
    # Populate the Lat and Long columns in the row
    #latitude = lat_lng_list[0]
    #longitude = lat_lng_list[1]
    row["Lat"] = latitude
    row["Long"] = longitude
    return row
In [44]:
sheriff_data_with_dui_codes = sheriff_data_with_dui_codes.apply(get_lat_long, axis = 1)

The get_lat_long function checked to see if the returned value was within LA County

If an incident wasn't geocoded, the value would still be 0.0

In [46]:
# Look at data using Describe
# Check to be sure the min in Lat and Long show no 0.0 values
sheriff_data_with_dui_codes.describe()
Out[46]:
Contact ID Person ID Street # ZIP_CODE Time Lat Long
count 825.000000 825.000000 266.000000 160.00000 825.000000 825.000000 825.000000
mean 75143.187879 84135.978182 12401.879699 91678.55000 1205.872727 34.235220 -118.193696
std 38143.688626 42129.684839 12673.051048 1265.95645 845.576436 0.320411 0.196705
min 12966.000000 14729.000000 100.000000 90012.00000 0.000000 33.735209 -118.822244
25% 41714.000000 47383.000000 1725.000000 90650.00000 236.000000 33.931938 -118.299351
50% 74216.000000 83514.000000 7650.000000 91372.50000 1458.000000 34.103443 -118.160385
75% 108591.000000 120987.000000 20125.000000 93534.00000 2010.000000 34.577074 -118.080736
max 137356.000000 152504.000000 45800.000000 93552.00000 2359.000000 34.811019 -117.752502
In [47]:
# Pull DUI data into a GeoDataFrame
dui_gdf = geopandas.GeoDataFrame(
    sheriff_data_with_dui_codes, geometry=geopandas.points_from_xy(sheriff_data_with_dui_codes.Long, sheriff_data_with_dui_codes.Lat))
# Convert from Geographic Coordinates to projected
dui_gdf.crs = {'init' :'epsg:4326'}
dui_gdf = dui_gdf.to_crs({'init': 'epsg:2229'})
print(dui_gdf.crs)
# Pull county outline from http://boundaries.latimes.com/sets/
county = geopandas.read_file("http://s3-us-west-2.amazonaws.com/boundaries.latimes.com/archive/1.0/boundary-set/counties-2012.geojson")
county = county.to_crs({'init': 'epsg:2229'})
print(county.crs)
# Load city data from local file
city = geopandas.read_file("./datasets/GISData/City Boundaries.geojson")
city = city.to_crs({'init': 'epsg:2229'})
print(city.crs)
{'init': 'epsg:2229'}
{'init': 'epsg:2229'}
{'init': 'epsg:2229'}
In [48]:
dui_gdf.head()
Out[48]:
Contact ID Person ID DATE_TIME Street # Street Direction Street Name Street Type Suite Cross Street Landmark ... Result of Contact: In-field Cite and Release Offense Codes? Result of Contact: Custodial Arrest Pursuant to Outstanding Warrant ? Result of Contact: Custodial Arrest Without Warrant? Result of Contact: Custodial Arrest Without Warrant Offense Codes? Date Time Relevant_Offence_Code Lat Long geometry
342 36786 41829 2018-07-28 19:00:00 1100.0 W I AV NaN NaN NaN ... 23152(A) False False NaN 7/28/2018 1900 23152(A) 34.703895 -118.149510 POINT (6516731.164 2078577.947)
531 106823 119046 2018-11-01 16:05:00 38900.0 NaN YUCCA DR NaN NaN NaN ... 23152(A) True False NaN 11/1/2018 1605 23152(A) 34.589635 -118.151703 POINT (6516009.062 2036996.650)
594 33005 38365 2018-07-24 01:59:00 4900.0 NaN CLARA ST NaN NaN NaN ... 23223(B) False False NaN 7/24/2018 159 23152(A) 33.964860 -118.180384 POINT (6506969.080 1809642.935)
610 126017 140095 2018-12-09 15:40:00 NaN NaN NaN NaN NaN CENTRAL AV/GLENN CURTIS AV NaN ... NaN False True 23152(B) 12/9/2018 1540 23152(B) 33.849721 -118.248602 POINT (6486180.039 1767785.634)
906 23122 26212 2018-07-12 10:29:00 NaN NaN NaN NaN NaN 107TH ST/WEIGAND AV NaN ... 23152(A) False False NaN 7/12/2018 1029 23152(A) 33.936778 -118.230285 POINT (6491814.279 1799454.160)

5 rows × 29 columns

In [51]:
# City data includes non-Land features.
city['feat_type'].unique()
Out[51]:
array(['Breakwater', 'Pier', 'Water', 'Land', '3 NM Buffer'], dtype=object)
In [52]:
# Limit County Outline to just LA
la_county = county[county["name"] == "Los Angeles"]
# Limit city data to Los Angeles and Land
city_of_los_angeles = city[(city["city_name"] == "Los Angeles") & (city["feat_type"] == "Land")]
In [53]:
#city_of_los_angeles.shape
# Combine multipoligon features into one for comparison
la_boundary = city_of_los_angeles.geometry.unary_union
In [54]:
print(dui_gdf.crs)
{'init': 'epsg:2229'}

Finally we're ready to map something!

In [55]:
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

#la_county.geometry.boundary.plot(color=None,edgecolor='k',linewidth = 2,ax=ax)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

dui_gdf.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[55]:
<matplotlib.axes._subplots.AxesSubplot at 0x1436c4d50>
In [56]:
dui_gdf.shape
Out[56]:
(825, 29)

How many of the Sheriff's incidents are actually within the City of LA boundary?

In [57]:
city_dui_gdf = dui_gdf[dui_gdf["geometry"].within(la_boundary)]

#from geopandas.tools import sjoin
#city_dui_gdf = sjoin(dui_gdf, la_boundary, how='left', op='within')

city_dui_gdf.shape
Out[57]:
(39, 29)
In [58]:
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

#la_county.geometry.boundary.plot(color=None,edgecolor='k',linewidth = 2,ax=ax)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

city_dui_gdf.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[58]:
<matplotlib.axes._subplots.AxesSubplot at 0x1435a1710>
In [59]:
# Index Cleanup to Avoid issues when comparing to other datasets
city_dui_gdf.reset_index(inplace=True)
# Drop duplicates from Contact ID field since there might be multiple people involved in an incident
city_dui_gdf.drop_duplicates(subset="Contact ID", keep='first', inplace=True)
#print(list(city_dui_gdf["Contact ID"]))
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
In [60]:
sheriff_dui_data_for_merge = city_dui_gdf[['Date', 'Time', 'Relevant_Offence_Code', 'Lat', 'Long', 'geometry']]
In [61]:
# fill np array with desired string. Use length of df to define array length
#len(list(sheriff_dui_data_for_merge['geometry']))
#sheriff_dui_data_for_merge['Source'] = np.full(len(list(sheriff_dui_data_for_merge['geometry'])), "LA_County_Sheriff").tolist()
#source_column = np.full(len(list(sheriff_dui_data_for_merge['geometry'])), "LA_County_Sheriff")
sheriff_dui_data_for_merge = sheriff_dui_data_for_merge.assign(Source="LA_County_Sheriff")
In [62]:
sheriff_dui_data_for_merge.head()
Out[62]:
Date Time Relevant_Offence_Code Lat Long geometry Source
0 7/12/2018 1029 23152(A) 33.936778 -118.230285 POINT (6491814.279 1799454.160) LA_County_Sheriff
1 8/14/2018 1624 23152(F) 34.255891 -118.615155 POINT (6375783.315 1916074.570) LA_County_Sheriff
2 9/20/2018 205 23152(A) 34.099866 -118.362887 POINT (6451806.723 1858923.584) LA_County_Sheriff
3 9/22/2018 8 23152(A) 34.293358 -118.454233 POINT (6424470.812 1929450.523) LA_County_Sheriff
4 10/31/2018 200 23152(A) 34.102609 -118.369373 POINT (6449846.833 1859929.175) LA_County_Sheriff

Time to look at the LAPD data.

Here we go again!

In [63]:
# Dataset is omitted from the Repository due to size.
# Public download available at:
# https://data.lacity.org/resource/yru6-6re4.json
# Import LAPD data into Data Frame
# Convert text dates to datetime
lapd_raw_data = pd.read_csv("./datasets/CityOfLA/Arrest_Data_from_2010_to_Present.csv",parse_dates = [1], infer_datetime_format = True)
lapd_raw_data.shape
Out[63]:
(1310127, 17)
In [64]:
lapd_raw_data.head()
Out[64]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code Charge Group Description Arrest Type Code Charge Charge Description Address Cross Street Location
0 4248313 2015-02-24 1310.0 20 Olympic 2022 37 M H 5.0 Burglary F 459PC BURGLARY 5TH WILTON (34.0653, -118.314)
1 191811472 2019-05-03 1700.0 18 Southeast 1802 23 F B NaN NaN M 653.22 PC NaN 91ST FIGUEROA (33.9543, -118.2827)
2 4254777 2015-02-26 2010.0 19 Mission 1985 22 M H 6.0 Larceny M 459.5PC SHOPLIFTING 8300 VAN NUYS BL NaN (34.2216, -118.4488)
3 5614161 2019-04-29 1040.0 8 West LA 842 41 M H 3.0 Robbery F 211PC ROBBERY 11600 WILSHIRE BL NaN (34.0508, -118.4592)
4 5615197 2019-04-30 615.0 6 Hollywood 663 27 M O 5.0 Burglary F 459PC BURGLARY LA BREA LEXINGTON (34.0907, -118.3384)

What timeframe are we working with?

In [65]:
lapd_raw_data["Arrest Date"].max()
Out[65]:
Timestamp('2019-10-26 00:00:00')
In [66]:
lapd_raw_data["Arrest Date"].min()
Out[66]:
Timestamp('2010-01-01 00:00:00')

How many DUI's in that time?

In [67]:
# Filter data to Charge Codes in dui_code_list
lapd_dui_data = lapd_raw_data[lapd_raw_data["Charge"].isin(dui_code_list)]
In [68]:
lapd_dui_data.shape
Out[68]:
(120251, 17)

What if we limited it to the timeframe of the Sheriff's data?

In [69]:
#Filter Data to same timeframe as Sheriff's data.
lapd_dui_data_during_sheriff = lapd_dui_data[(lapd_dui_data["Arrest Date"]>=datetime.date(2018,7,1))&(lapd_dui_data["Arrest Date"]<= datetime.date(2018,12,31))]
#df[(df['date']>datetime.date(2016,1,1)) & (df['date']<datetime.date(2016,3,1))]
lapd_dui_data_during_sheriff.shape
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:2: FutureWarning: Comparing Series of datetimes with 'datetime.date'.  Currently, the
'datetime.date' is coerced to a datetime. In the future pandas will
not coerce, and a TypeError will be raised. To retain the current
behavior, convert the 'datetime.date' to a datetime with
'pd.Timestamp'.
  
Out[69]:
(4255, 17)

How much of an impact would the Sheriff's data have?

39/4255 = 0.00916

Less than 1% of DUI Arrests in the City of LA were from LA County Sheriff's during that 6mo period

Let's focus on the LAPD data

There's already a field with Latitude and Longitude so this should be easier.

In [70]:
# Location Column is storing data as text.
# We need to separate the Latitude and Longitude to work with it.
type(lapd_dui_data["Location"].iloc[0])
Out[70]:
str

That was a lovely thought.

The Latitude and Longitude are stored in a single string. We're going to have to separated them and convert them to float values before we can use the data.

In [71]:
# Create new columns to store the values
lapd_dui_data['Lat'] = np.nan
lapd_dui_data['Long'] = np.nan
lapd_dui_data = lapd_dui_data.assign(STATE="California")
lapd_dui_data = lapd_dui_data.assign(Source="Los_Angeles_Police_Department")
lapd_dui_data.head()
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
Out[71]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code ... Arrest Type Code Charge Charge Description Address Cross Street Location Lat Long STATE Source
104 191008773 2019-04-20 1830.0 10 West Valley 1049 28 M H 22.0 ... M 23152(A)VC DRUNK DRIVING ALCOHOL/DRUGS BURBANK BL WOODLEY AV (34.1717, -118.4757) NaN NaN California Los_Angeles_Police_Department
166 5257944 2018-03-15 310.0 7 Wilshire 743 33 F O 1.0 ... F 191.5(B)PC VEH MANSLAUGHTR/OPER VESSEL DUI & DUR FEL OLYMPIC MASSELIN (34.0574, -118.3547) NaN NaN California Los_Angeles_Police_Department
173 4957757 2017-04-10 2320.0 14 Pacific 1441 23 M W 1.0 ... F 191.5(A)PC VEH MANSLAUGHTER DUI W/GROSS NEGLIGENCE WASHINGTON BL OCEAN AV (33.9826, -118.4611) NaN NaN California Los_Angeles_Police_Department
190 5478957 2018-11-22 940.0 8 West LA 827 21 F W 22.0 ... F 23153(A)VC DUI ALCOHOL CAUSING INJURY SAN VICENTE BARRINGTON (34.0536, -118.4655) NaN NaN California Los_Angeles_Police_Department
210 5511360 2019-01-03 116.0 7 Wilshire 717 39 M B 22.0 ... F 23153(A)VC DUI ALCOHOL CAUSING INJURY 400 N LA BREA AV NaN (34.0783, -118.3441) NaN NaN California Los_Angeles_Police_Department

5 rows × 21 columns

In [72]:
# Regex to distinguish Latitude from Longitude in the string.
# This Regex will only work for our dataset since it uses the negative Longitude to help distinguish.
lat = re.compile('\d{2}.\d*')
long = re.compile('-\d*.\d*')

# Checks if Cross Street field is populated
# If so, combines with Address field and returns
# Otherwise returns original row
def combine_address_with_cross(row):
    if pd.notnull(row["Cross Street"]):
        combined_text = str(row["Address"]) + " and " + str(row["Cross Street"])
        row["Address"] = combined_text
        return row
    else:
        return row

# Checks if Location field has a null Lat/Long
# If so, runs combine_address_with_cross then geocodes result
# Returns geocoded row
def geocode_only_nulls(row):
    if row["Location"] == "(0.0, 0.0)":
        updated_row = combine_address_with_cross(row)
        geocoded_row = get_lat_long(updated_row, address = "Address", city = "Area Name", state = "STATE")
        return geocoded_row
    else:
        return row
        
# Takes string location and searches for Lat Long using regex
# Checks to be sure result is within LA County
def split_lat_long(row):
    lat_long_str = row["Location"]
    latitude = lat.search(lat_long_str)
    if latitude:
        if 33.6 <= float(latitude.group()) <= 35.0:
            row["Lat"] = latitude.group()
    else:
        row["Lat"] = "No Match"
    longitude = long.search(lat_long_str)
    if longitude:
        if -119.0 <= float(longitude.group()) <= -117.6:
            row["Long"] = longitude.group()
    else:
        row["Long"] = "No Match"
    return row
In [73]:
lapd_dui_data_with_lat_long = lapd_dui_data.apply(split_lat_long, axis=1)
lapd_dui_data_with_lat_long.head()

#lapd_dui_data_test = lapd_dui_data.iloc[:4]
#print(lapd_dui_data_test)
#lapd_dui_data_with_lat_long = lapd_dui_data_test.apply(split_lat_long, axis=1)
#lapd_dui_data_with_lat_long.head()
Out[73]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code ... Arrest Type Code Charge Charge Description Address Cross Street Location Lat Long STATE Source
104 191008773 2019-04-20 1830.0 10 West Valley 1049 28 M H 22.0 ... M 23152(A)VC DRUNK DRIVING ALCOHOL/DRUGS BURBANK BL WOODLEY AV (34.1717, -118.4757) 34.1717 -118.4757 California Los_Angeles_Police_Department
166 5257944 2018-03-15 310.0 7 Wilshire 743 33 F O 1.0 ... F 191.5(B)PC VEH MANSLAUGHTR/OPER VESSEL DUI & DUR FEL OLYMPIC MASSELIN (34.0574, -118.3547) 34.0574 -118.3547 California Los_Angeles_Police_Department
173 4957757 2017-04-10 2320.0 14 Pacific 1441 23 M W 1.0 ... F 191.5(A)PC VEH MANSLAUGHTER DUI W/GROSS NEGLIGENCE WASHINGTON BL OCEAN AV (33.9826, -118.4611) 33.9826 -118.4611 California Los_Angeles_Police_Department
190 5478957 2018-11-22 940.0 8 West LA 827 21 F W 22.0 ... F 23153(A)VC DUI ALCOHOL CAUSING INJURY SAN VICENTE BARRINGTON (34.0536, -118.4655) 34.0536 -118.4655 California Los_Angeles_Police_Department
210 5511360 2019-01-03 116.0 7 Wilshire 717 39 M B 22.0 ... F 23153(A)VC DUI ALCOHOL CAUSING INJURY 400 N LA BREA AV NaN (34.0783, -118.3441) 34.0783 -118.3441 California Los_Angeles_Police_Department

5 rows × 21 columns

Some of the incidents don't have a value in the Location field.

In [16]:
lapd_dui_data_with_null_location = lapd_dui_data_with_lat_long[lapd_dui_data_with_lat_long["Location"] == "(0.0, 0.0)"]
lapd_dui_data_with_null_location.shape
Out[16]:
(187, 21)
In [74]:
# Use the function created above for this situation.
lapd_dui_data_with_lat_long = lapd_dui_data_with_lat_long.apply(geocode_only_nulls, axis = 1)

Final check for any that haven't been geocoded.

In [75]:
# Filter Data Frame to any rows that still have "No Match"
check_for_missing = lapd_dui_data_with_lat_long[(lapd_dui_data_with_lat_long["Lat"] == "No Match") | (lapd_dui_data_with_lat_long["Long"] == "No Match")]
check_for_missing.head()
Out[75]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code ... Arrest Type Code Charge Charge Description Address Cross Street Location Lat Long STATE Source

0 rows × 21 columns

In [76]:
check_for_missing = lapd_dui_data_with_lat_long[lapd_dui_data_with_lat_long["Lat"].isnull()]
check_for_missing.head()
Out[76]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code ... Arrest Type Code Charge Charge Description Address Cross Street Location Lat Long STATE Source

0 rows × 21 columns

Lat/Long values are still stored as text.

Need to convert them to numeric values

In [77]:
def convert_lat_long_to_float(row):
    row["Lat"] = float(row["Lat"])
    row["Long"] = float(row["Long"])
    return row
In [78]:
# Make sure Latitude and Longitude values are float(numeric) rather than string(text)
lapd_dui_data_with_lat_long = lapd_dui_data_with_lat_long.apply(convert_lat_long_to_float, axis = 1)
lapd_dui_data_with_lat_long.head()
Out[78]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code ... Arrest Type Code Charge Charge Description Address Cross Street Location Lat Long STATE Source
104 191008773 2019-04-20 1830.0 10 West Valley 1049 28 M H 22.0 ... M 23152(A)VC DRUNK DRIVING ALCOHOL/DRUGS BURBANK BL WOODLEY AV (34.1717, -118.4757) 34.1717 -118.4757 California Los_Angeles_Police_Department
166 5257944 2018-03-15 310.0 7 Wilshire 743 33 F O 1.0 ... F 191.5(B)PC VEH MANSLAUGHTR/OPER VESSEL DUI & DUR FEL OLYMPIC MASSELIN (34.0574, -118.3547) 34.0574 -118.3547 California Los_Angeles_Police_Department
173 4957757 2017-04-10 2320.0 14 Pacific 1441 23 M W 1.0 ... F 191.5(A)PC VEH MANSLAUGHTER DUI W/GROSS NEGLIGENCE WASHINGTON BL OCEAN AV (33.9826, -118.4611) 33.9826 -118.4611 California Los_Angeles_Police_Department
190 5478957 2018-11-22 940.0 8 West LA 827 21 F W 22.0 ... F 23153(A)VC DUI ALCOHOL CAUSING INJURY SAN VICENTE BARRINGTON (34.0536, -118.4655) 34.0536 -118.4655 California Los_Angeles_Police_Department
210 5511360 2019-01-03 116.0 7 Wilshire 717 39 M B 22.0 ... F 23153(A)VC DUI ALCOHOL CAUSING INJURY 400 N LA BREA AV NaN (34.0783, -118.3441) 34.0783 -118.3441 California Los_Angeles_Police_Department

5 rows × 21 columns

In [79]:
# Convert dataframe to geodataframe
lapd_dui_gdf = geopandas.GeoDataFrame(
    lapd_dui_data_with_lat_long, geometry=geopandas.points_from_xy(lapd_dui_data_with_lat_long.Long, lapd_dui_data_with_lat_long.Lat))

# Convert Lat/Long data to projected coordinates
lapd_dui_gdf.crs = {'init' :'epsg:4326'}
lapd_dui_gdf = lapd_dui_gdf.to_crs({'init': 'epsg:2229'})
In [80]:
# Plot points relative to LA County and City
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

lapd_dui_gdf.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[80]:
<matplotlib.axes._subplots.AxesSubplot at 0x12e0a1950>
In [81]:
# Filter points to just those in the City
lapd_dui_gdf_in_city = lapd_dui_gdf[lapd_dui_gdf["geometry"].within(la_boundary)]
In [82]:
lapd_dui_gdf_in_city.shape
Out[82]:
(118841, 22)
In [83]:
# Plot points relative to LA County and City
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

lapd_dui_gdf_in_city.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[83]:
<matplotlib.axes._subplots.AxesSubplot at 0x130a2abd0>
In [84]:
la_freeways_and_ramps = geopandas.read_file("./datasets/GISData/LosAngelesFreewaysAndRamps.geojson")
In [85]:
la_freeways_and_ramps["geometry"] = la_freeways_and_ramps.simplify(50)
la_freeways_and_ramps.shape
Out[85]:
(10861, 33)
In [86]:
la_freeways_and_ramps.head()
Out[86]:
FullName Type Elevation Surface Status DrivingDir From_L From_R To_L To_R ... LCity_R NameCat_L NameCat_R Accuracy Jurisdicti Source SourceID UpdateDate Shape_len geometry
0 State Route 60 Freeway Surface Paved Unrestricted Two Way 0 0 0 0 ... Los Angeles Secondary Secondary 50.0 UNKNOWN TIGER Ramps And Free 141621109 2015-04-19 476.814434 LINESTRING (6493830.812 1833201.340, 6494307.3...
1 Pomona Frwy Freeway Surface Paved Unrestricted Two Way 0 0 0 0 ... Los Angeles Primary Primary 50.0 UNKNOWN TIGER Ramps And Free 141621109 2015-04-19 476.814434 LINESTRING (6493830.812 1833201.340, 6494307.3...
2 Harbor Frwy Freeway Surface Paved Unrestricted Two Way 0 0 0 0 ... Los Angeles Primary Primary 50.0 UNKNOWN TIGER Ramps And Free 141608003 2015-04-15 555.989656 LINESTRING (6477380.699 1830070.484, 6477119.0...
3 Interstate 110 Freeway Surface Paved Unrestricted Two Way 0 0 0 0 ... Los Angeles Secondary Secondary 50.0 UNKNOWN TIGER Ramps And Free 141608003 2015-04-15 555.989656 LINESTRING (6477380.699 1830070.484, 6477119.0...
4 None Ramp Surface Paved Unrestricted Two Way 0 0 0 0 ... Los Angeles None None 50.0 UNKNOWN TIGER Ramps And Free 638620480 2015-04-19 2031.992434 LINESTRING (6447078.769 1812492.566, 6446052.8...

5 rows × 33 columns

Raw freeway data didn't have a way to connect ramp segments to the freeway.

In [87]:
la_freeways_and_ramps["Type"].unique()
Out[87]:
array(['Freeway', 'Ramp'], dtype=object)
In [88]:
la_freeways_and_ramps.crs = {'init' :'epsg:2229'}
#{'init' :'epsg:4326'}
In [89]:
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

la_freeways_and_ramps.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[89]:
<matplotlib.axes._subplots.AxesSubplot at 0x130e48b90>

Selected segments and created buffers in QGIS.

Buffer based on both freeway and ramps.

Segments distinguished with layer field.

In [90]:
freeway_segments_buffers = geopandas.read_file("./datasets/GISData/150_ft_buffers.geojson")
#freeway_segments_for_analysis.crs = {'init' :'epsg:4326'}
#freeway_segments_for_analysis = freeway_segments_for_analysis.to_crs({'init' :'epsg:2229'})
freeway_segments_buffers.head(6)
Out[90]:
FullName Type Elevation Surface Status DrivingDir From_L From_R To_L To_R ... NameCat_R Accuracy Jurisdicti Source SourceID UpdateDate Shape_len layer path geometry
0 None Ramp Surface Paved Unrestricted Two Way 0 0 0 0 ... None 50.0 UNKNOWN TIGER Ramps And Free 141601629 2015-05-07 571.813145 10_downtown_to_405 /Users/Regan/Documents/GAPython/FinalProject/d... MULTIPOLYGON (((6455658.121 1834521.555, 64556...
1 None Ramp Surface Paved Unrestricted Two Way 0 0 0 0 ... None 50.0 UNKNOWN TIGER Ramps And Free 142694603 2015-04-17 1104.040217 110_downtown_to_pasadena /Users/Regan/Documents/GAPython/FinalProject/d... MULTIPOLYGON (((6495262.178 1852772.515, 64952...
2 None Ramp Surface Paved Unrestricted Two Way 0 0 0 0 ... None 50.0 UNKNOWN TIGER Ramps And Free 142726410 2015-04-18 727.019086 101_downtown_to_north_hollywood /Users/Regan/Documents/GAPython/FinalProject/d... MULTIPOLYGON (((6452152.692 1873306.144, 64521...
3 None Ramp Surface Paved Unrestricted Two Way 0 0 0 0 ... None 50.0 UNKNOWN TIGER Ramps And Free 241175520 2015-04-18 425.358531 118_from_405_to_ventura /Users/Regan/Documents/GAPython/FinalProject/d... MULTIPOLYGON (((6370138.740 1921035.985, 63701...
4 San Diego Frwy Freeway Surface Paved Unrestricted Two Way 0 0 0 0 ... Primary 50.0 UNKNOWN TIGER Ramps And Free 638360597 2015-05-11 1084.054662 405_Sepulvida_Pass /Users/Regan/Documents/GAPython/FinalProject/d... MULTIPOLYGON (((6423415.590 1844911.708, 64233...
5 None Ramp Surface Paved Unrestricted Two Way 0 0 0 0 ... None 50.0 UNKNOWN TIGER Ramps And Free 641451626 2015-04-17 811.138914 5_downtown_to_134 /Users/Regan/Documents/GAPython/FinalProject/d... MULTIPOLYGON (((6492387.909 1852646.864, 64923...

6 rows × 35 columns

In [91]:
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

freeway_segments_buffers.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[91]:
<matplotlib.axes._subplots.AxesSubplot at 0x1307678d0>

Use join so that the segment description is connected to the DUI data.

In [92]:
lapd_duis_in_buffers = geopandas.sjoin(lapd_dui_gdf_in_city, freeway_segments_buffers, how='left', op='within', lsuffix='left', rsuffix='right')
#lapd_duis_in_buffers = lapd_dui_gdf_in_city.sjoin()
#geopandas.sjoin(left_df, right_df, how='inner', op='intersects', lsuffix='left', rsuffix='right')
In [93]:
lapd_duis_in_buffers.shape
Out[93]:
(118841, 57)

Source ID came from the Buffers, so it is null for points that were not within one.

In [94]:
lapd_duis_in_buffers = lapd_duis_in_buffers[lapd_duis_in_buffers["SourceID"].notnull()]
print(lapd_duis_in_buffers.shape)
lapd_duis_in_buffers.head()
(5146, 57)
Out[94]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code ... NameCat_L NameCat_R Accuracy Jurisdicti Source_right SourceID UpdateDate Shape_len layer path
1751 5618413 2019-05-03 238.0 8 West LA 805 29 F W 22.0 ... Primary Primary 50.0 UNKNOWN TIGER Ramps And Free 638360597 2015-05-11 1084.054662 405_Sepulvida_Pass /Users/Regan/Documents/GAPython/FinalProject/d...
1782 5618418 2019-05-03 226.0 6 Hollywood 635 28 M H 22.0 ... None None 50.0 UNKNOWN TIGER Ramps And Free 142726410 2015-04-18 727.019086 101_downtown_to_north_hollywood /Users/Regan/Documents/GAPython/FinalProject/d...
3452 5622595 2019-05-07 2255.0 2 Rampart 221 49 M H 22.0 ... None None 50.0 UNKNOWN TIGER Ramps And Free 142726410 2015-04-18 727.019086 101_downtown_to_north_hollywood /Users/Regan/Documents/GAPython/FinalProject/d...
3887 5620486 2019-05-05 510.0 7 Wilshire 782 30 M H 22.0 ... None None 50.0 UNKNOWN TIGER Ramps And Free 141601629 2015-05-07 571.813145 10_downtown_to_405 /Users/Regan/Documents/GAPython/FinalProject/d...
4414 5623897 2019-05-09 351.0 10 West Valley 1079 45 M H 22.0 ... Primary Primary 50.0 UNKNOWN TIGER Ramps And Free 638360597 2015-05-11 1084.054662 405_Sepulvida_Pass /Users/Regan/Documents/GAPython/FinalProject/d...

5 rows × 57 columns

In [95]:
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

lapd_duis_in_buffers.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[95]:
<matplotlib.axes._subplots.AxesSubplot at 0x130744bd0>
In [96]:
lapd_duis_in_buffers.columns
Out[96]:
Index(['Report ID', 'Arrest Date', 'Time', 'Area ID', 'Area Name',
       'Reporting District', 'Age', 'Sex Code', 'Descent Code',
       'Charge Group Code', 'Charge Group Description', 'Arrest Type Code',
       'Charge', 'Charge Description', 'Address', 'Cross Street', 'Location',
       'Lat', 'Long', 'STATE', 'Source_left', 'geometry', 'index_right',
       'FullName', 'Type', 'Elevation', 'Surface', 'Status', 'DrivingDir',
       'From_L', 'From_R', 'To_L', 'To_R', 'Parity_L', 'Parity_R', 'StPreDir',
       'StPreMod', 'StPreType', 'StArticle', 'StName', 'StPostType',
       'StPostDir', 'StPostMod', 'Zip_L', 'Zip_R', 'LCity_L', 'LCity_R',
       'NameCat_L', 'NameCat_R', 'Accuracy', 'Jurisdicti', 'Source_right',
       'SourceID', 'UpdateDate', 'Shape_len', 'layer', 'path'],
      dtype='object')
In [97]:
# Convert back to regular DataFrame
lapd_duis_in_buffers_df = pd.DataFrame(lapd_duis_in_buffers.drop(columns='geometry'))
# Use layer field to group points by freeway segment
lapd_counts_by_segment = lapd_duis_in_buffers_df.groupby(['layer']).count()
# Get count of reports for each segment
lapd_counts_by_segment = lapd_counts_by_segment.sort_values(by=['Report ID'])
lapd_counts_by_segment.head(6)
Out[97]:
Report ID Arrest Date Time Area ID Area Name Reporting District Age Sex Code Descent Code Charge Group Code ... LCity_R NameCat_L NameCat_R Accuracy Jurisdicti Source_right SourceID UpdateDate Shape_len path
layer
118_from_405_to_ventura 166 166 166 166 166 166 166 166 166 166 ... 166 0 0 166 166 166 166 166 166 166
5_downtown_to_134 189 189 189 189 189 189 189 189 189 189 ... 189 0 0 189 189 189 189 189 189 189
110_downtown_to_pasadena 214 214 214 214 214 214 214 214 214 214 ... 214 0 0 214 214 214 214 214 214 214
405_Sepulvida_Pass 494 494 494 494 494 494 494 494 494 494 ... 494 494 494 494 494 494 494 494 494 494
10_downtown_to_405 1180 1180 1180 1180 1180 1180 1180 1180 1180 1180 ... 1180 0 0 1180 1180 1180 1180 1180 1180 1180
101_downtown_to_north_hollywood 2903 2903 2903 2903 2903 2903 2903 2903 2903 2903 ... 2903 0 0 2903 2903 2903 2903 2903 2903 2903

6 rows × 55 columns

Well... That's not what we were expecting.

The sements with a Metro Rail alternative are tending to have the most DUI's

In [98]:
lapd_counts_by_segment["Report ID"].plot(kind='bar')
Out[98]:
<matplotlib.axes._subplots.AxesSubplot at 0x130733c10>

Could something be skewing the data?

Let's look at the relative levels of traffic on each segment.

In [99]:
# Pull traffic count points into GeoDataFrame
traffic_count_points = geopandas.read_file("./datasets/GISData/LosAngelesTrafficCountPoints2017.geojson")
traffic_count_points.crs = {'init' :'epsg:2229'}
In [100]:
traffic_count_points.head()
Out[100]:
OBJECTID District Route County Postmile Description Back_Peak_Hour Back_Peak_Month Back_AADT Ahead_Peak_Hour Ahead_Peak_Month Ahead_AADT geometry
0 28 7 1 LA 9.253 LOS ANGELES, ALAMEDA STREET 3550 39000 32000 3550 40000 32500 MULTIPOINT (6488497.038 1746478.622, 6488508.6...
1 29 7 1 LA 10.532 LOS ANGELES, AVALON BOULEVARD 3550 40000 33000 4350 50000 41000 MULTIPOINT (6481669.343 1746462.826, 6481665.8...
2 30 7 1 LA 11.61 LOS ANGELES, JCT. RTE. 110 4400 51000 41500 4400 59000 56000 MULTIPOINT (6475991.323 1746252.286, 6475983.9...
3 31 7 1 LA 12.171 LOS ANGELES, VERMONT AVENUE 4400 59000 56000 4200 56000 53000 MULTIPOINT (6473032.997 1746429.835, 6473042.7...
4 32 7 1 LA 12.523 LOS ANGELES, NORMANDIE AVENUE 4200 56000 53000 3950 51000 48500 MULTIPOINT (6471237.329 1746032.567, 6471237.8...

AADT = Average Annual Daily Total

Values are separated by direction of traffic

We just want a total for the segment

In [101]:
# Combine traffic counts for each direction into a total count.
traffic_count_points["Total_AADT"] = traffic_count_points["Back_AADT"] + traffic_count_points["Ahead_AADT"]
In [102]:
traffic_count_points.head()
Out[102]:
OBJECTID District Route County Postmile Description Back_Peak_Hour Back_Peak_Month Back_AADT Ahead_Peak_Hour Ahead_Peak_Month Ahead_AADT geometry Total_AADT
0 28 7 1 LA 9.253 LOS ANGELES, ALAMEDA STREET 3550 39000 32000 3550 40000 32500 MULTIPOINT (6488497.038 1746478.622, 6488508.6... 64500
1 29 7 1 LA 10.532 LOS ANGELES, AVALON BOULEVARD 3550 40000 33000 4350 50000 41000 MULTIPOINT (6481669.343 1746462.826, 6481665.8... 74000
2 30 7 1 LA 11.61 LOS ANGELES, JCT. RTE. 110 4400 51000 41500 4400 59000 56000 MULTIPOINT (6475991.323 1746252.286, 6475983.9... 97500
3 31 7 1 LA 12.171 LOS ANGELES, VERMONT AVENUE 4400 59000 56000 4200 56000 53000 MULTIPOINT (6473032.997 1746429.835, 6473042.7... 109000
4 32 7 1 LA 12.523 LOS ANGELES, NORMANDIE AVENUE 4200 56000 53000 3950 51000 48500 MULTIPOINT (6471237.329 1746032.567, 6471237.8... 101500
In [103]:
ax = la_county.geometry.boundary.plot(figsize=(20, 20),color=None,edgecolor='k',linewidth = 2)

city_of_los_angeles.geometry.boundary.plot(color=None,edgecolor='lightblue',linewidth = 2,ax=ax)

traffic_count_points.plot(figsize=(20, 20), alpha=0.5, edgecolor='green', ax = ax)
Out[103]:
<matplotlib.axes._subplots.AxesSubplot at 0x12d490a50>

Select Count Points in each of the buffers

In [104]:
# Join Traffic Count Points to buffers
traffic_counts_in_buffers = geopandas.sjoin(traffic_count_points, freeway_segments_buffers, how='left', op='within', lsuffix='left', rsuffix='right')
traffic_counts_in_buffers = traffic_counts_in_buffers[traffic_counts_in_buffers["layer"].notnull()]
traffic_counts_in_buffers.head()
Out[104]:
OBJECTID District Route County Postmile Description Back_Peak_Hour Back_Peak_Month Back_AADT Ahead_Peak_Hour ... NameCat_L NameCat_R Accuracy Jurisdicti Source SourceID UpdateDate Shape_len layer path
24 282 7 2 LA 12.75 LOS ANGELES, JCT. RTE. 101 3850 58000 56000 3200 ... None None 50.0 UNKNOWN TIGER Ramps And Free 142726410 2015-04-18 727.019086 101_downtown_to_north_hollywood /Users/Regan/Documents/GAPython/FinalProject/d...
28 286 7 2 LA 15.143 LOS ANGELES, JCT. RTE. 5 5400 60000 58000 14600 ... None None 50.0 UNKNOWN TIGER Ramps And Free 641451626 2015-04-17 811.138914 5_downtown_to_134 /Users/Regan/Documents/GAPython/FinalProject/d...
40 565 7 5 LA 21.83 LOS ANGELES, STADIUM WAY 18500 299000 283000 18200 ... None None 50.0 UNKNOWN TIGER Ramps And Free 641451626 2015-04-17 811.138914 5_downtown_to_134 /Users/Regan/Documents/GAPython/FinalProject/d...
41 566 7 5 LA 22.548 LOS ANGELES, JCT. RTE. 2 18200 296000 278000 16000 ... None None 50.0 UNKNOWN TIGER Ramps And Free 641451626 2015-04-17 811.138914 5_downtown_to_134 /Users/Regan/Documents/GAPython/FinalProject/d...
42 567 7 5 LA 23.655 LOS ANGELES, GLENDALE BOULEVARD 16000 239000 233000 14900 ... None None 50.0 UNKNOWN TIGER Ramps And Free 641451626 2015-04-17 811.138914 5_downtown_to_134 /Users/Regan/Documents/GAPython/FinalProject/d...

5 rows × 49 columns

In [105]:
# Convert back to regular dataframe
# Calculate mean values for traffic count points in each buffer
traffic_counts_in_buffers_df = pd.DataFrame(traffic_counts_in_buffers.drop(columns='geometry'))
traffic_count_by_segment = traffic_counts_in_buffers_df.groupby(['layer']).mean()
traffic_count_by_segment.head()
Out[105]:
OBJECTID District Back_Peak_Hour Back_Peak_Month Back_AADT Ahead_Peak_Hour Ahead_Peak_Month Ahead_AADT Total_AADT index_right From_L From_R To_L To_R Accuracy Shape_len
layer
101_downtown_to_north_hollywood 3858.562500 7.0 13734.375000 231687.500000 226312.500000 13650.000000 229250.000000 223656.250000 449968.750000 2.0 0.0 0.0 0.0 0.0 50.0 727.019086
10_downtown_to_405 1380.000000 7.0 19633.333333 288125.000000 279458.333333 20258.333333 298666.666667 288583.333333 568041.666667 0.0 0.0 0.0 0.0 0.0 50.0 571.813145
110_downtown_to_pasadena 4726.000000 7.0 7728.571429 103857.142857 101285.714286 7314.285714 97571.428571 95000.000000 196285.714286 1.0 0.0 0.0 0.0 0.0 50.0 1104.040217
118_from_405_to_ventura 4532.111111 7.0 16355.555556 186333.333333 182888.888889 16533.333333 193777.777778 189888.888889 372777.777778 3.0 0.0 0.0 0.0 0.0 50.0 425.358531
405_Sepulvida_Pass 6359.000000 7.0 17757.142857 275142.857143 263428.571429 17600.000000 266428.571429 256000.000000 519428.571429 4.0 0.0 0.0 0.0 0.0 50.0 1084.054662

Merge DUI Totals per segment with Traffic Average per segment

In [106]:
dui_with_traffic_counts = traffic_count_by_segment.merge(lapd_counts_by_segment, left_on = 'layer', right_on = 'layer')
In [107]:
dui_with_traffic_counts["dui_per_thousand_cars"] = dui_with_traffic_counts["Report ID"]/(dui_with_traffic_counts["Total_AADT"] / 1000)
dui_with_traffic_counts = dui_with_traffic_counts.sort_values(by=['dui_per_thousand_cars'])
dui_with_traffic_counts["dui_per_thousand_cars"]
Out[107]:
layer
5_downtown_to_134                  0.416300
118_from_405_to_ventura            0.445306
405_Sepulvida_Pass                 0.951045
110_downtown_to_pasadena           1.090247
10_downtown_to_405                 2.077312
101_downtown_to_north_hollywood    6.451559
Name: dui_per_thousand_cars, dtype: float64
In [108]:
dui_with_traffic_counts["dui_per_thousand_cars"].plot(kind='bar')
Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0x12d487d50>

Based on the normalized data we must reject our hypothesis.

It seems more likely that there is a set of common factors that would positively influence both the rate of DUI arrests and the likelyhood for funding to be allocated to transit projects.