I'm not sure how to put this into words, so let me just (literally) draw it.
My dataframe looks like this:
df1:
+-------------------+-------------------+-------------------+--------------+----------+----------+
| Province/State    | Country/Region    | ObservationDate   | Confirmed    | Deaths   | Recovered|
+-------------------+-------------------+-------------------+--------------+----------+----------+
| Anhui             | Mainland China    | 2020-01-22        | 1            | 0        | 0        |
| Beijing           | Mainland China    | 2020-01-22        | 14           | 0        | 0        |
| Chongqing         | Mainland China    | 2020-01-22        | 6            | 0        | 0        |
| Fujian            | Mainland China    | 2020-01-22        | 1            | 0        | 0        |
| Guangdong         | Mainland China    | 2020-01-22        | 26           | 0        | 0        |
+-------------------+-------------------+-------------------+--------------+----------+----------+
I want to group by Country/Region and sum the Confirmed, Deaths and Recovered columns. My expected output is:
df_new:
+-------------------+-------------------+--------------+----------+----------+
| Country/Region    | ObservationDate   | Confirmed    | Deaths   | Recovered|
+-------------------+-------------------+--------------+----------+----------+
| Mainland China    | 2020-01-22        | 547          | 17       | 28       |
| Japan             | 2020-01-22        | 2            | 0        | 0        |
| Macau             | 2020-01-22        | 1            | 0        | 0        |
| South Korea       | 2020-01-22        | 1            | 0        | 0        |
| Taiwan            | 2020-01-22        | 1            | 0        | 0        |
+-------------------+-------------------+--------------+----------+----------+
I tried:
df_new = df1.groupby(by=['ObservationDate','Country/Region'])[['Confirmed', 'Deaths', 'Recovered']].sum()
But the output looks like this:
df_new:
+-------------------+-------------------+--------------+----------+----------+
| Country/Region    | ObservationDate   | Confirmed    | Deaths   | Recovered|
+-------------------+-------------------+--------------+----------+----------+
| Mainland China    | 2020-01-22        | 547          | 17       | 28       |
| Japan             |                   | 2            | 0        | 0        |
| Macau             |                   | 1            | 0        | 0        |
| South Korea       |                   | 1            | 0        | 0        |
| Taiwan            |                   | 1            | 0        | 0        |
| Mainland China    | 2020-01-23        | 639          | 18       | 30       |
+-------------------+-------------------+--------------+----------+----------+
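For anyone trying to reproduce this, here is a minimal sketch. The blank ObservationDate cells are most likely just how pandas displays a MultiIndex: the group keys become index levels, and repeated labels in an outer level are hidden when printed. One way to get the date repeated on every row is to pass as_index=False so the keys stay ordinary columns. The sample rows are made up for illustration and are not the real dataset.

```python
import pandas as pd

# Hypothetical sample rows shaped like df1 (values are illustrative only).
df1 = pd.DataFrame({
    "Province/State": ["Anhui", "Beijing", None, "Anhui"],
    "Country/Region": ["Mainland China", "Mainland China", "Japan", "Mainland China"],
    "ObservationDate": ["2020-01-22", "2020-01-22", "2020-01-22", "2020-01-23"],
    "Confirmed": [1, 14, 2, 9],
    "Deaths": [0, 0, 0, 0],
    "Recovered": [0, 0, 0, 0],
})

# as_index=False keeps the grouping keys as regular columns, so every row
# carries its own ObservationDate instead of an index label that is
# hidden when repeated.
df_new = df1.groupby(
    ["ObservationDate", "Country/Region"], as_index=False
)[["Confirmed", "Deaths", "Recovered"]].sum()

print(df_new)
```

Calling .reset_index() on the result of the original groupby should give the same flat layout.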