Question

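I've made some progress on this problem and I'm really close. Essentially, I want to create a time series of event counts by type from an events database. Here is what I've done so far: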

Starting with an abbreviated version of my dataframe:

   event_date  year time_precision event_type  \
0  2020-10-24  2020              1    Battles   
1  2020-10-24  2020              1      Riots   
2  2020-10-24  2020              1      Riots   
3  2020-10-24  2020              1    Battles   
4  2020-10-24  2020              2    Battles

I want the time series by month and year, so first I convert the dates to datetime:

nga_df.event_date = pd.to_datetime(nga_df.event_date)

Then, since I want a time series of events by type, I one-hot encode the event types:

nga_df = pd.get_dummies(nga_df, columns=['event_type'], prefix='', prefix_sep='')

Next, I need to extract the month so that I can create monthly counts:

nga_df['month'] = nga_df.event_date.dt.month

Finally, and this is where I get close, I group the data by year and month and transpose the result:

conflict_series = nga_df.groupby(['year','month']).sum()
conflict_series.T

This produces a lovely new dataframe:

year                       1997                       ...  2020             
month                        1   2   3    4   5   6   ...    5     6    7    
fatalities                   11  30  38  112  17  29  ...  1322  1015  619   
Battles                       4   4   5   13   2   2  ...    77    99   74   
Explosions/Remote violence    2   1   0    0   3   0  ...    38    28   17   
Protests                      1   0   0    1   0   1  ...    31    83   50   
Riots                         3   3   4    1   4   1  ...    27    14   18   
Strategic developments        1   0   0    0   0   0  ...     7     2    7   
Violence against civilians    3   5   7    3   2   1  ...   135   112   88
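
To make the problem concrete, the steps so far can be reproduced on a few toy rows. This is only a sketch: the rows below are made up, and `dtype=int` and the `drop` call are additions that keep the sum limited to integer indicator columns. It shows that the grouped frame is indexed by a two-level (year, month) MultiIndex, which is what needs to be merged.

```python
import pandas as pd

# Toy stand-in for the real event data (values are made up for illustration).
nga_df = pd.DataFrame({
    "event_date": ["2020-09-30", "2020-10-24", "2020-10-24", "2020-10-25"],
    "year": [2020, 2020, 2020, 2020],
    "event_type": ["Battles", "Battles", "Riots", "Riots"],
})

nga_df["event_date"] = pd.to_datetime(nga_df["event_date"])

# dtype=int keeps the indicator columns as integer 0/1 so they sum to counts.
nga_df = pd.get_dummies(
    nga_df, columns=["event_type"], prefix="", prefix_sep="", dtype=int
)
nga_df["month"] = nga_df["event_date"].dt.month

# Drop the datetime column before summing; only the indicators are counted.
conflict_series = (
    nga_df.drop(columns="event_date").groupby(["year", "month"]).sum()
)

# The row index is a MultiIndex with levels "year" and "month".
print(type(conflict_series.index).__name__)
print(conflict_series)
```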

So I think what I need to do is merge the two index levels (the columns, after transposing) into a single index. How do I do that?

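The end goal is to combine this data with economic indicators to see whether there are any trends, so I need both datasets in the same form, where each column holds the monthly counts of a different value.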
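
For illustration only, this is roughly what that combination could look like once both datasets share a single monthly index. The frames, the `inflation_pct` column, and all values are hypothetical placeholders, not real data.

```python
import pandas as pd

# Three consecutive months used as a shared index for both toy frames.
months = pd.period_range("1997-01", periods=3, freq="M")

# Hypothetical monthly event counts (made-up values).
events = pd.DataFrame({"Battles": [4, 4, 5], "Riots": [3, 3, 4]}, index=months)

# Hypothetical economic indicator on the same monthly index (made-up values).
indicators = pd.DataFrame({"inflation_pct": [8.1, 8.3, 8.0]}, index=months)

# With matching indexes, an index-aligned join puts both in one frame.
combined = events.join(indicators, how="inner")
print(combined)
```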

Answer 1

Here is how I did it:

Step 1: flatten the index:

# convert the multi-index to a flat set of tuples: (YYYY, MM)
index = conflict_series.index.to_flat_index().to_series()

Step 2: append an arbitrary but required day of the month (28 here) so that each tuple can be converted to a date:

index = index.apply(lambda x: x + (28,))

Step 3: convert the resulting 3-tuples to dates:

import datetime  # standard library; provides datetime.date
index = index.apply(lambda x: datetime.date(*x))

Step 4: replace the DataFrame index with the new one:

conflict_series.set_index(index, inplace=True)

Result:

            fatalities  Battles  Explosions/Remote violence  Protests  Riots  \
1997-01-28          11        4                           2         1      3   
1997-02-28          30        4                           1         0      3   
1997-03-28          38        5                           0         0      4   
1997-04-28         112       13                           0         1      1   
1997-05-28          17        2                           3         0      4   

            Strategic developments  Violence against civilians  total_events  
1997-01-28                       1                           3            14  
1997-02-28                       0                           5            13  
1997-03-28                       0                           7            16  
1997-04-28                       0                           3            18  
1997-05-28                       0                           2            11
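
As a possible alternative to the tuple-based steps above (not the method used in this answer), `pd.to_datetime` can assemble timestamps directly from year, month, and day columns. The sketch below assumes the index levels are named `year` and `month`, uses made-up counts, and picks day 1 as the placeholder day instead of 28.

```python
import pandas as pd

# Toy monthly counts indexed by (year, month); values are made up.
conflict_series = pd.DataFrame(
    {"Battles": [4, 4, 5], "Riots": [3, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [(1997, 1), (1997, 2), (1997, 3)], names=["year", "month"]
    ),
)

# Turn the index levels into columns and add a placeholder day of the month.
parts = conflict_series.index.to_frame(index=False).assign(day=1)

# pd.to_datetime assembles timestamps from year/month/day columns.
dates = pd.to_datetime(parts)

monthly = conflict_series.copy()
monthly.index = pd.DatetimeIndex(dates, name="date")

# Optional: a PeriodIndex labels each row as a whole month (YYYY-MM).
monthly_periods = monthly.copy()
monthly_periods.index = monthly.index.to_period("M")

print(monthly)
print(monthly_periods)
```

The `PeriodIndex` variant may be the closer match to a (YYYY-MM) index, since it carries no arbitrary day at all.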

This now gives me the plot I was looking for.

Pandas MultiIndex: how to merge a (YYYY, MM) MultiIndex into a single (YYYY-MM) index?
