Pandas data cleaning of the Tableau Covid-19 dataset (Datetime)

Time: 2020-03-23 05:31:56

Tags: python pandas dataframe datetime aggregate

Dataset

I'm trying to preprocess the data from https://data.world/covid-19-data-resource-hub/covid-19-case-counts/workspace/file?filename=COVID-19+Cases.csv. I converted the Date column to the datetime type with code like this:

df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
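A minimal sketch of the same conversion with an explicit format string, which avoids per-value guessing and raises on dates that don't match. The `%m/%d/%Y` format and the sample values are assumptions about the CSV, not confirmed from the file. Note that `infer_datetime_format` was deprecated in pandas 2.0, where a strict form of format inference is already the default.

```python
import pandas as pd

# Hypothetical sample values; the real file's date format is an assumption here
df = pd.DataFrame({'Date': ['3/21/2020', '3/22/2020', '3/22/2020']})

# Explicit format: parsing is consistent across rows and fails loudly on surprises
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

print(df['Date'])
```

If the real column uses another layout, only the `format` string needs to change.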

(Image: original data)

What I want to achieve

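For each day, I want to group/aggregate the data by Country_Region while keeping the case type (Case_Type), producing output like this: (Image: output)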
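A minimal sketch of that aggregation with `groupby`, on hypothetical toy rows. The `Province_State` column is an assumption about the file, included only to show several rows per country and day being summed together. One grouped pass replaces the repeated boolean filtering, but it only returns combinations that actually occur in the data.

```python
import pandas as pd

# Hypothetical toy rows in the long layout described in the question
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-22'] * 4 + ['2020-03-23'] * 2),
    'Country_Region': ['Canada', 'Canada', 'Canada', 'France', 'Canada', 'France'],
    'Province_State': ['Ontario', 'Quebec', 'Ontario', None, 'Ontario', None],  # assumed column
    'Case_Type': ['Confirmed', 'Confirmed', 'Deaths', 'Confirmed', 'Confirmed', 'Confirmed'],
    'Cases': [100, 50, 2, 300, 120, 350],
})

# Sum Cases once per (Date, Country_Region, Case_Type) group, collapsing provinces
dfCountry = (df.groupby(['Date', 'Country_Region', 'Case_Type'], as_index=False)['Cases']
               .sum())
print(dfCountry)
```

Here the two Canadian provinces' confirmed cases on 2020-03-22 collapse into a single row with 150 cases.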

I tried the code below. It works, but it is far too slow.

import pandas as pd

# Slow version: every (date, country, case type) triple rescans the whole of df
rows = []
for date in df['Date'].unique():
    for country in df['Country_Region'].unique():
        for case_type in ['Recovered', 'Deaths', 'Confirmed', 'Active']:
            mask = ((df['Date'] == date)
                    & (df['Country_Region'] == country)
                    & (df['Case_Type'] == case_type))
            rows.append({'Date': date, 'Country_Region': country,
                         'Case_Type': case_type, 'Cases': df.loc[mask, 'Cases'].sum()})

# DataFrame.append was removed in pandas 2.0, so build the result once at the end
dfCountry = pd.DataFrame(rows, columns=['Date', 'Country_Region', 'Case_Type', 'Cases'])

How can I produce the same result with a faster runtime?
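A vectorized sketch that aims to give the same result as the loop, assuming the column names from the question; `aggregate_by_country` and `CASE_TYPES` are hypothetical names. The loop also emits a row with `Cases` = 0 for every date/country/case-type combination missing from the data, so the grouped sums are reindexed onto the full grid built with `pd.MultiIndex.from_product`, in the same order the nested loop visits it. Unlike the original loop, `Date` stays a datetime instead of being converted with `str()`.

```python
import pandas as pd

CASE_TYPES = ['Recovered', 'Deaths', 'Confirmed', 'Active']  # order used by the loop


def aggregate_by_country(df: pd.DataFrame) -> pd.DataFrame:
    # One grouped pass: sum Cases for each combination present in the data
    sums = df.groupby(['Date', 'Country_Region', 'Case_Type'])['Cases'].sum()
    # Full grid the nested loop walks over, in first-appearance order
    full_index = pd.MultiIndex.from_product(
        [df['Date'].unique(), df['Country_Region'].unique(), CASE_TYPES],
        names=['Date', 'Country_Region', 'Case_Type'])
    # Missing combinations become 0, matching .sum() over an empty selection
    return sums.reindex(full_index, fill_value=0).reset_index()


# Hypothetical toy data for illustration
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-22', '2020-03-22', '2020-03-23']),
    'Country_Region': ['Canada', 'Canada', 'France'],
    'Case_Type': ['Confirmed', 'Confirmed', 'Deaths'],
    'Cases': [100, 50, 3],
})
out = aggregate_by_country(toy)
print(out)
```

The output still has one row per date, country and case type, so its size grows with the product of those counts, but the data is scanned once by `groupby` instead of once per row of the result.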

0 answers:

No answers yet.