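I have been at this for an hour and have tried many different approaches, but this CSV file is too complicated for me (I am a beginner) and I cannot get it to work. I need a multi-dimensional array of total cases per month for each country. Once that is solved, I will use the values in those arrays to create a heat map.
To be clearer: for France, for example, I need something like totalcases_France = (total cases in December, total cases in January, ..., total cases in November). I need to do this for every country and end up with a multi-dimensional array.
ACCESS CSV FILE FROM HERE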
Answer 0 (score: 0)
I don't know whether you mean the sum for each month separately or the cumulative sum.
For both cases I would use pandas, with the new_cases column and groupby(['location', 'year-month']).
First I need to create a year-month column from the date column, which holds year-month-day strings:
df['year-month'] = df['date'].str[:7]
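Slicing the first seven characters works because the dates are ISO-formatted strings. As a hedged alternative, the same key can be derived by parsing the dates, which also fails loudly on malformed values. The toy frame below is made up for illustration and is not taken from the CSV:

```python
import pandas as pd

# Hypothetical toy data; the real file has many more rows and columns.
toy = pd.DataFrame({"date": ["2020-03-01", "2020-03-15", "2020-04-02"]})

# String slicing: relies on the "YYYY-MM-DD" layout.
by_slice = toy["date"].str[:7]

# Datetime parsing: validates each date, then formats year and month.
by_parse = pd.to_datetime(toy["date"], format="%Y-%m-%d").dt.strftime("%Y-%m")

print(by_slice.tolist())
print(by_parse.tolist())
```

Both approaches give the same keys here; parsing is slower but safer if the date format is not guaranteed.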
Next I can group the rows:
groups = df.groupby(['location', 'year-month'])
and sum new_cases to get the total of new cases only (selecting the column before summing avoids summing the non-numeric columns):
df_sum = groups['new_cases'].sum().reset_index()
Result
         location  year-month  new_cases
0     Afghanistan     2019-12        0.0
1     Afghanistan     2020-01        0.0
2     Afghanistan     2020-02        1.0
3     Afghanistan     2020-03      140.0
4     Afghanistan     2020-04     1808.0
...           ...         ...        ...
2112     Zimbabwe     2020-07     2518.0
2113     Zimbabwe     2020-08     3320.0
2114     Zimbabwe     2020-09     1425.0
2115     Zimbabwe     2020-10      525.0
2116     Zimbabwe     2020-11      858.0
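To make the grouping step easier to check by hand, here is a minimal sketch on a tiny made-up frame (the values are assumptions, not data from the CSV). It groups by country and month and sums only the new_cases column:

```python
import pandas as pd

# Hypothetical daily rows for two countries.
toy = pd.DataFrame({
    "location": ["France", "France", "France", "Germany", "Germany"],
    "date": ["2020-03-01", "2020-03-02", "2020-04-01", "2020-03-01", "2020-04-01"],
    "new_cases": [10.0, 5.0, 7.0, 3.0, 4.0],
})
toy["year-month"] = toy["date"].str[:7]

# One row per (country, month) with the summed daily cases.
monthly = (
    toy.groupby(["location", "year-month"])["new_cases"]
    .sum()
    .reset_index()
)
print(monthly)
```

The two March rows for France collapse into a single row with 15.0 new cases.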
I can then use cumsum() on the monthly sums to get the running total (cumulative sum) per country:
df_sum['total_cases'] = df_sum.groupby('location')['new_cases'].cumsum()
Result
         location  year-month  new_cases  total_cases
0     Afghanistan     2019-12        0.0          0.0
1     Afghanistan     2020-01        0.0          0.0
2     Afghanistan     2020-02        1.0          1.0
3     Afghanistan     2020-03      140.0        141.0
4     Afghanistan     2020-04     1808.0       1949.0
...           ...         ...        ...          ...
2112     Zimbabwe     2020-07     2518.0       3092.0
2113     Zimbabwe     2020-08     3320.0       6412.0
2114     Zimbabwe     2020-09     1425.0       7837.0
2115     Zimbabwe     2020-10      525.0       8362.0
2116     Zimbabwe     2020-11      858.0       9220.0
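One detail worth spelling out: cumsum() accumulates in row order, so the rows should be in chronological order within each country. A small sketch with deliberately shuffled, made-up rows (assumed values, not from the CSV):

```python
import pandas as pd

# Hypothetical monthly sums, intentionally out of order.
monthly = pd.DataFrame({
    "location": ["Germany", "France", "France", "Germany"],
    "year-month": ["2020-04", "2020-04", "2020-03", "2020-03"],
    "new_cases": [4.0, 7.0, 15.0, 3.0],
})

# Sort chronologically within each country before accumulating.
monthly = monthly.sort_values(["location", "year-month"]).reset_index(drop=True)

# The running total restarts for every country.
monthly["total_cases"] = monthly.groupby("location")["new_cases"].cumsum()
print(monthly)
```

The output of groupby(...).sum() is already sorted by its keys, so the explicit sort only matters when the monthly rows come from somewhere else.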
After that I can select a single country:
df_sum[ df_sum['location'] == 'France' ]
df_sum[ df_sum['location'] == 'Germany' ]
Result
     location  year-month  new_cases  total_cases
671    France     2019-12        0.0          0.0
672    France     2020-01        6.0          6.0
673    France     2020-02       51.0         57.0
674    France     2020-03    44493.0      44550.0
675    France     2020-04    83892.0     128442.0
676    France     2020-05    23054.0     151496.0
677    France     2020-06    12764.0     164260.0
678    France     2020-07    22313.0     186573.0
679    France     2020-08    91370.0     277943.0
680    France     2020-09   272747.0     550690.0
681    France     2020-10   781294.0    1331984.0
682    France     2020-11   808224.0    2140208.0

     location  year-month  new_cases  total_cases
722   Germany     2019-12        0.0          0.0
723   Germany     2020-01        5.0          5.0
724   Germany     2020-02       52.0         57.0
725   Germany     2020-03    61856.0      61913.0
726   Germany     2020-04    97206.0     159119.0
727   Germany     2020-05    22363.0     181482.0
728   Germany     2020-06    12777.0     194259.0
729   Germany     2020-07    14439.0     208698.0
730   Germany     2020-08    33683.0     242381.0
731   Germany     2020-09    46838.0     289219.0
732   Germany     2020-10   229534.0     518753.0
733   Germany     2020-11   410380.0     929133.0
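Since the question asks for a multi-dimensional array to feed a heat map, one possible way to get there from a long table like the one above is to pivot it so that rows are countries and columns are months. This is only a sketch on made-up numbers; the toy frame and the variable names table, matrix and france are assumptions for illustration:

```python
import pandas as pd

# Hypothetical long-format table: one row per (country, month).
toy = pd.DataFrame({
    "location": ["France", "France", "Germany", "Germany"],
    "year-month": ["2020-03", "2020-04", "2020-03", "2020-04"],
    "total_cases": [15.0, 22.0, 3.0, 7.0],
})

# Rows become countries, columns become months.
# A month missing for some country would show up as NaN.
table = toy.pivot(index="location", columns="year-month", values="total_cases")

matrix = table.to_numpy()                # 2-D array, shape (countries, months)
france = table.loc["France"].to_numpy()  # one row, like totalcases_France

print(table)
print(matrix.shape)
```

The row and column labels stay available as table.index and table.columns, which is convenient for labelling the heat map axes.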
Or I can use groupby('location') to split the data into a separate DataFrame per country and build a dictionary {"France": df_france, "Germany": df_germany, ...}:
data = {}
for country, values in df_sum.groupby('location'):
    data[country] = values
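When iterating over a groupby object, the key of each group mirrors the grouping columns, so grouping by two columns yields tuple keys rather than country names. A short sketch on made-up data (assumed values) that shows both cases:

```python
import pandas as pd

# Hypothetical monthly table for two countries.
toy = pd.DataFrame({
    "location": ["France", "France", "Germany", "Germany"],
    "year-month": ["2020-03", "2020-04", "2020-03", "2020-04"],
    "total_cases": [15.0, 22.0, 3.0, 7.0],
})

# Two grouping columns: every key is a (location, year-month) tuple.
pair_keys = [key for key, _ in toy.groupby(["location", "year-month"])]

# One grouping column given as a string: every key is a plain country name.
frames = {country: part for country, part in toy.groupby("location")}

print(pair_keys[0])
print(list(frames))
```

So a dictionary keyed by country needs a grouping on the location column alone.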
Full code
import pandas as pd

df = pd.read_csv('ex1.csv')
print(df.columns)

# year-month key ("YYYY-MM") taken from the ISO date string
df['year-month'] = df['date'].str[:7]
#print(df['year-month'].head())

# monthly sum of new cases per country
groups = df.groupby(['location', 'year-month'])
df_sum = groups['new_cases'].sum().reset_index()
#print(df_sum)

# running total per country
df_sum['total_cases'] = df_sum.groupby('location')['new_cases'].cumsum()
print(df_sum)

print(df_sum[ df_sum['location'] == 'France' ])
print(df_sum[ df_sum['location'] == 'Germany' ])

# one DataFrame per country, keyed by country name
data = {}
for country, values in df_sum.groupby('location'):
    data[country] = values
print(data)