Dataset
I am trying to preprocess the data from https://data.world/covid-19-data-resource-hub/covid-19-case-counts/workspace/file?filename=COVID-19+Cases.csv. I convert the Date column to a datetime type with the following code:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
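What I want to achieve
I want to group/aggregate the data by day at the Country_Region level while keeping the case type. The result should look like this output.
I tried the code below. It works, but it is too slow.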
dfCountry = pd.DataFrame(columns=['Date', 'Country_Region','Case_Type', 'Cases'])
for date in (df['Date'].unique()):
    for country in (df['Country_Region'].unique()):
        num_r = df[(df['Date']==str(date)) & (df['Country_Region']==str(country)) & (df['Case_Type']=='Recovered')]['Cases'].sum()
        num_d = df[(df['Date']==str(date)) & (df['Country_Region']==str(country)) & (df['Case_Type']=='Deaths')]['Cases'].sum()
        num_c = df[(df['Date']==str(date)) & (df['Country_Region']==str(country)) & (df['Case_Type']=='Confirmed')]['Cases'].sum()
        num_a = df[(df['Date']==str(date)) & (df['Country_Region']==str(country)) & (df['Case_Type']=='Active')]['Cases'].sum()
        dfCountry = dfCountry.append({'Date' : str(date), 'Country_Region' : str(country),'Case_Type':'Recovered', 'Cases': num_r}, ignore_index=True)
        dfCountry = dfCountry.append({'Date' : str(date), 'Country_Region' : str(country),'Case_Type':'Deaths', 'Cases': num_d}, ignore_index=True)
        dfCountry = dfCountry.append({'Date' : str(date), 'Country_Region' : str(country),'Case_Type':'Confirmed', 'Cases': num_c}, ignore_index=True)
        dfCountry = dfCountry.append({'Date' : str(date), 'Country_Region' : str(country),'Case_Type':'Active', 'Cases': num_a}, ignore_index=True)
How can I produce the same result with a faster run time?
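For illustration, the aggregation described here can probably be done with a single groupby instead of nested loops. This is a minimal sketch, not a tested solution for the real file: the small sample frame is made up (only the column names come from the code in this post), and `dfCountryFull` and `full_index` are hypothetical names introduced here.

```python
import pandas as pd

# Made-up sample data; only the column names match the dataset in the question.
df = pd.DataFrame({
    'Date': ['2020-01-22', '2020-01-22', '2020-01-22', '2020-01-23'],
    'Country_Region': ['US', 'US', 'China', 'US'],
    'Case_Type': ['Confirmed', 'Confirmed', 'Deaths', 'Confirmed'],
    'Cases': [1, 2, 3, 4],
})
df['Date'] = pd.to_datetime(df['Date'])

# One pass over the data: sum Cases per (Date, Country_Region, Case_Type).
dfCountry = (
    df.groupby(['Date', 'Country_Region', 'Case_Type'], as_index=False)['Cases']
      .sum()
)

# Optional: the loop version also writes a row with 0 for combinations that
# never occur in the data. Reindexing over the full product reproduces that.
full_index = pd.MultiIndex.from_product(
    [df['Date'].unique(),
     df['Country_Region'].unique(),
     ['Recovered', 'Deaths', 'Confirmed', 'Active']],
    names=['Date', 'Country_Region', 'Case_Type'],
)
dfCountryFull = (
    df.groupby(['Date', 'Country_Region', 'Case_Type'])['Cases']
      .sum()
      .reindex(full_index, fill_value=0)
      .reset_index()
)

print(dfCountry)
print(dfCountryFull.head())
```

The plain groupby result only contains combinations that actually appear in the data, which is usually what is wanted. Note that in this sketch Date stays a datetime column, whereas the loop version stores it as a string.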