How do I get a list of total cases per country, by month?

Posted: 2020-12-29 00:11:46

Tags: python csv data-science heatmap

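I have been thinking about this for an hour and tried many different approaches, but this CSV file is too complex for me (I'm a beginner), so I couldn't get it to work. I need a multidimensional array with the total cases per month for each country. Once this is solved, I will use these arrays instead of single values to create a heatmap.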

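To be clearer: for France, for example, I need something like totalcases_France = (total cases in December, total cases in January, ..., total cases in November). I need to do this for every country and get a multidimensional array. ACCESS CSV FILE FROM HERE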

1 answer:

Answer 0 (score: 0)

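I don't know whether by "total per month" you mean a per-month sum

  • the sum of all cases in December only,
  • the sum of all cases in January only,
  • the sum of all cases in February only, etc.

or a cumulative sum

  • the sum of all cases in December,
  • the sum of all cases in December + January,
  • the sum of all cases in December + January + February, etc.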
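To make the two readings concrete, here is a minimal sketch on a hypothetical four-row frame (the numbers are invented, not taken from the CSV). The per-month sum starts over every month, while the cumulative sum keeps adding the previous months.

```python
import pandas as pd

# Hypothetical toy data: daily new cases for one country (not from the real CSV)
df = pd.DataFrame({
    "location": ["France"] * 4,
    "date": ["2020-01-30", "2020-01-31", "2020-02-01", "2020-02-02"],
    "new_cases": [1.0, 2.0, 3.0, 4.0],
})

df["year-month"] = df["date"].str[:7]

# Reading 1: cases within each month only
monthly = df.groupby(["location", "year-month"])["new_cases"].sum()

# Reading 2: running total across months, per country
cumulative = monthly.groupby(level="location").cumsum()

print(monthly.tolist())     # [3.0, 7.0]
print(cumulative.tolist())  # [3.0, 10.0]
```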

For both cases I would use the new_cases column together with pandas groupby(['location', 'year-month']).

First I create a year-month column from the date column (which is year-month-day):

df['year-month'] = df['date'].str[:7]
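Slicing with str[:7] only works if every date is a string in "YYYY-MM-DD" form, which is what the output below suggests. As a hedged alternative, parsing the column with pd.to_datetime and converting to monthly periods gives the same labels while validating the dates; the sample dates here are hypothetical.

```python
import pandas as pd

# Hypothetical dates in the ISO "YYYY-MM-DD" format the slicing assumes
dates = pd.Series(["2019-12-31", "2020-01-01", "2020-11-30"])

# Slicing keeps the first 7 characters, i.e. "YYYY-MM"
by_slice = dates.str[:7]

# Alternative: parse real datetimes, then convert to monthly periods
by_period = pd.to_datetime(dates).dt.to_period("M").astype(str)

print(by_slice.tolist())               # ['2019-12', '2020-01', '2020-11']
print((by_slice == by_period).all())   # True
```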

Next I group the rows:

groups = df.groupby(['location', 'year-month'])

and sum new_cases to get the number of new cases within each month only:

df_sum = groups['new_cases'].sum().reset_index()
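Selecting new_cases before calling sum() matters because the real CSV also contains text columns; with recent pandas versions, summing the whole group can try to aggregate those as well. A minimal sketch on a hypothetical frame with one extra text column:

```python
import pandas as pd

# Hypothetical toy frame with an extra text column, like the real CSV has
df = pd.DataFrame({
    "location": ["France", "France", "Germany", "Germany"],
    "continent": ["Europe"] * 4,
    "date": ["2020-01-30", "2020-02-01", "2020-01-31", "2020-02-02"],
    "new_cases": [6.0, 51.0, 5.0, 52.0],
})
df["year-month"] = df["date"].str[:7]

# Only new_cases is aggregated; text columns such as continent are ignored
df_sum = (
    df.groupby(["location", "year-month"])["new_cases"]
    .sum()
    .reset_index()
)
print(df_sum)
```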

Result:

         location year-month  new_cases
0     Afghanistan    2019-12        0.0
1     Afghanistan    2020-01        0.0
2     Afghanistan    2020-02        1.0
3     Afghanistan    2020-03      140.0
4     Afghanistan    2020-04     1808.0
...           ...        ...        ...
2112     Zimbabwe    2020-07     2518.0
2113     Zimbabwe    2020-08     3320.0
2114     Zimbabwe    2020-09     1425.0
2115     Zimbabwe    2020-10      525.0
2116     Zimbabwe    2020-11      858.0

Then I apply cumsum() per country to get the running (cumulative) total:

df_sum['total_cases'] = df_sum.groupby('location')['new_cases'].cumsum()
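Because cumsum() runs inside each location group, the running total restarts for every country. It also relies on the rows being in chronological order; groupby sorts the keys, and "YYYY-MM" strings sort chronologically. A sketch on hypothetical monthly sums:

```python
import pandas as pd

# Hypothetical monthly sums for two countries (not real data)
df_sum = pd.DataFrame({
    "location": ["France", "France", "Germany", "Germany"],
    "year-month": ["2020-01", "2020-02", "2020-01", "2020-02"],
    "new_cases": [6.0, 51.0, 5.0, 52.0],
})

# Running total within each country; it does not carry over between countries
df_sum["total_cases"] = df_sum.groupby("location")["new_cases"].cumsum()
print(df_sum["total_cases"].tolist())  # [6.0, 57.0, 5.0, 57.0]
```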

Result:

         location year-month  new_cases  total_cases
0     Afghanistan    2019-12        0.0          0.0
1     Afghanistan    2020-01        0.0          0.0
2     Afghanistan    2020-02        1.0          1.0
3     Afghanistan    2020-03      140.0        141.0
4     Afghanistan    2020-04     1808.0       1949.0
...           ...        ...        ...          ...
2112     Zimbabwe    2020-07     2518.0       3092.0
2113     Zimbabwe    2020-08     3320.0       6412.0
2114     Zimbabwe    2020-09     1425.0       7837.0
2115     Zimbabwe    2020-10      525.0       8362.0
2116     Zimbabwe    2020-11      858.0       9220.0

Afterwards I can select a single country:

df_sum[ df_sum['location'] == 'France' ]

df_sum[ df_sum['location'] == 'Germany' ]
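To get exactly the shape asked for in the question (a tuple such as totalcases_France), the selected column can be turned into plain Python values. This sketch uses only the first three months of France and Germany, copied from the result tables in this answer:

```python
import pandas as pd

# First three months of France and Germany, copied from the result tables
df_sum = pd.DataFrame({
    "location": ["France"] * 3 + ["Germany"] * 3,
    "year-month": ["2019-12", "2020-01", "2020-02"] * 2,
    "total_cases": [0.0, 6.0, 57.0, 0.0, 5.0, 57.0],
})

# Boolean mask picks one country; tolist() gives plain floats for the tuple
totalcases_France = tuple(
    df_sum.loc[df_sum["location"] == "France", "total_cases"].tolist()
)
print(totalcases_France)  # (0.0, 6.0, 57.0)
```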

Result:

    location year-month  new_cases  total_cases
671   France    2019-12        0.0          0.0
672   France    2020-01        6.0          6.0
673   France    2020-02       51.0         57.0
674   France    2020-03    44493.0      44550.0
675   France    2020-04    83892.0     128442.0
676   France    2020-05    23054.0     151496.0
677   France    2020-06    12764.0     164260.0
678   France    2020-07    22313.0     186573.0
679   France    2020-08    91370.0     277943.0
680   France    2020-09   272747.0     550690.0
681   France    2020-10   781294.0    1331984.0
682   France    2020-11   808224.0    2140208.0

    location year-month  new_cases  total_cases
722  Germany    2019-12        0.0          0.0
723  Germany    2020-01        5.0          5.0
724  Germany    2020-02       52.0         57.0
725  Germany    2020-03    61856.0      61913.0
726  Germany    2020-04    97206.0     159119.0
727  Germany    2020-05    22363.0     181482.0
728  Germany    2020-06    12777.0     194259.0
729  Germany    2020-07    14439.0     208698.0
730  Germany    2020-08    33683.0     242381.0
731  Germany    2020-09    46838.0     289219.0
732  Germany    2020-10   229534.0     518753.0
733  Germany    2020-11   410380.0     929133.0

Or I can use groupby('location') to split the data into separate DataFrames and build a dictionary {"France": df_france, "Germany": df_germany, ...}:

data = {}

# group by location only, so each key is a country name
for country, values in df_sum.groupby('location'):
    data[country] = values
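Grouping by 'location' alone (a single column name, not a list) makes each key a plain country name; grouping by ['location', 'year-month'] would instead give (country, month) tuples. A minimal sketch that builds both the dict of DataFrames and a dict of monthly-total lists, on data copied from the tables above:

```python
import pandas as pd

# First three months of France and Germany, copied from the result tables
df_sum = pd.DataFrame({
    "location": ["France"] * 3 + ["Germany"] * 3,
    "year-month": ["2019-12", "2020-01", "2020-02"] * 2,
    "total_cases": [0.0, 6.0, 57.0, 0.0, 5.0, 57.0],
})

# One sub-DataFrame per country, keyed by the country name
data = {country: values for country, values in df_sum.groupby("location")}

# Or keep only the list of monthly totals per country
totals = {
    country: values["total_cases"].tolist()
    for country, values in df_sum.groupby("location")
}
print(sorted(data))        # ['France', 'Germany']
print(totals["Germany"])   # [0.0, 5.0, 57.0]
```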

Full code:

import pandas as pd

df = pd.read_csv('ex1.csv')
print(df.columns)

df['year-month'] = df['date'].str[:7]
#print(df['year-month'].head())

groups = df.groupby(['location', 'year-month'])

df_sum = groups['new_cases'].sum().reset_index()
#print(df_sum)

df_sum['total_cases'] = df_sum.groupby('location')['new_cases'].cumsum()
print(df_sum)


print(df_sum[ df_sum['location'] == 'France' ])

print(df_sum[ df_sum['location'] == 'Germany' ])

data = {}

for country, values in df_sum.groupby('location'):
    data[country] = values

print(data)
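Since the end goal is a heatmap, the whole result can also be reshaped into a single 2D array with one row per country and one column per month. This is a sketch using DataFrame.pivot on data copied from the tables above. If some country were missing a month, the corresponding cell would be NaN and might need fillna(0). Passing the resulting array to a plotting function such as matplotlib's imshow is left as an assumption about your plotting setup.

```python
import pandas as pd

# First three months of France and Germany, copied from the result tables
df_sum = pd.DataFrame({
    "location": ["France"] * 3 + ["Germany"] * 3,
    "year-month": ["2019-12", "2020-01", "2020-02"] * 2,
    "total_cases": [0.0, 6.0, 57.0, 0.0, 5.0, 57.0],
})

# Rows = countries, columns = months, cells = cumulative totals
table = df_sum.pivot(index="location", columns="year-month", values="total_cases")
matrix = table.to_numpy()

print(table.index.tolist())    # ['France', 'Germany']
print(table.columns.tolist())  # ['2019-12', '2020-01', '2020-02']
print(matrix.shape)            # (2, 3)
```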