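I am working with a CSV file that contains five years of traffic crash dates. I want to estimate the average number of crashes for each month. Here is my code: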
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('other.csv')
df['Time'] = df['Crash_Date'].str[:-8] + ' ' + df['Crash_Time']
df['Time'] = pd.to_datetime(df['Time'])
df['Crash_Date'] = pd.to_datetime(df['Crash_Date'])
df = df[df.Crash_Date < '2018-01-01 00:00:00']
# Day_Number : Monday=0, Saturday=5, Sunday=6
df['Day_Number'] = df['Crash_Date'].dt.dayofweek
df = df[df.Sig_ID != 0]
# Function to estimate the average number of crashes for a given month
def month_crash(x):
    t = 0
    for date in df['Crash_Date']:
        if date.month == x:
            t = t + 1
            y = t/5
    return y
# Create a DataFrame to save the result
month = []
newcrash = []
for i in range(1, 13):
    month.append(i)
    newcrash.append(month_crash(i))
month_crash = pd.DataFrame(
    {'Month': month,
     'Crash': newcrash
     })
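Here is my data: [image]

However, every time I run this code I get the error "local variable 'y' referenced before assignment". I tried this code on other crash datasets and it worked fine, so I don't know where the problem is. Can anyone help? Thank you very much!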
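
One possible cause, sketched under assumptions because the data itself is not shown: if `y` is only assigned inside the `if` branch of `month_crash`, then a month with no matching crashes leaves `y` unset, and `return y` raises `UnboundLocalError`. That would also explain why other datasets work, since they may have at least one crash in every month. The sample dates and the names `month_crash_unsafe` and `month_crash_safe` below are hypothetical, and plain `datetime.date` values stand in for the `Crash_Date` column.

```python
from datetime import date

# Hypothetical sample: crashes only in January and March, none in other months.
crash_dates = [date(2013, 1, 5), date(2014, 1, 20), date(2015, 3, 11)]

def month_crash_unsafe(dates, month, years=5):
    """Assigns y only when a date matches, so y may never be bound."""
    t = 0
    for d in dates:
        if d.month == month:
            t = t + 1
            y = t / years
    return y  # raises UnboundLocalError when no date matched

def month_crash_safe(dates, month, years=5):
    """Counts matching dates first, then divides once; always returns a number."""
    count = sum(1 for d in dates if d.month == month)
    return count / years

# February has no crashes in the sample, so the unsafe version fails.
try:
    month_crash_unsafe(crash_dates, 2)
    unsafe_failed = False
except UnboundLocalError:
    unsafe_failed = True

averages = {m: month_crash_safe(crash_dates, m) for m in range(1, 13)}
print(unsafe_failed, averages[1], averages[2])
```

If this assumption holds, computing the average after the loop (or from a count, as in `month_crash_safe`) avoids the error and returns 0.0 for months with no crashes.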