Question

Suppose I have the following pandas DataFrame:

In [1]: df
Out[1]:
  sentiment        date
0       pos  2016-10-08
1       neu  2016-10-08
2       pos  2016-10-09
3       neg  2016-10-09
4       neg  2016-10-09

I can already create a DataFrame that counts the sentiment values for each day, like this:

gf = df.groupby(["date", "sentiment"]).size().reset_index(name='count')

which gives:

In [2]: gf
Out[2]:
         date sentiment  count
0  2016-10-08       neu      1
1  2016-10-08       pos      1
2  2016-10-09       neg      2
3  2016-10-09       pos      1
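To reproduce this, the example frame can be rebuilt as below. This is a minimal sketch that assumes the dates are stored as plain strings (they could just as well be datetime64 values); it produces the same long-format counts as gf above.

```python
import pandas as pd

# Rebuild the example frame; dates are assumed to be plain strings here.
df = pd.DataFrame({
    "sentiment": ["pos", "neu", "pos", "neg", "neg"],
    "date": ["2016-10-08", "2016-10-08", "2016-10-09", "2016-10-09", "2016-10-09"],
})

# Long format: one row per (date, sentiment) pair with its count.
gf = df.groupby(["date", "sentiment"]).size().reset_index(name="count")
print(gf)
```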

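But I need to convert this result into the following table format (or a new DataFrame) so that I can make a bar chart (for example, as in this Google bar chart).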

         date  pos  neg  neu
0  2016-10-08    1    0    1
1  2016-10-09    1    2    0

I tried to do it by creating a new DataFrame:

columns = ['date','pos', 'neg', 'neu']

clean_sheet = pd.DataFrame(columns=columns)

and then iterating over gf to find the unique dates and, for each one, using .loc to look up pos, neg or neu, but it gets very messy.

Any ideas for a simple solution?

Thanks

Answer 1

You need to add unstack:

gf = df.groupby(["date", "sentiment"]).size().unstack(fill_value=0).reset_index()
#remove column name 'sentiment'
gf.columns.name = None
print (gf)
         date  neg  neu  pos
0  2016-10-08    0    1    1
1  2016-10-09    2    0    1
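unstack sorts the sentiment columns alphabetically (neg, neu, pos). If you want the column order from the question, a reindex on the columns can follow the unstack. This is a sketch; the label list ["pos", "neg", "neu"] is an assumption about which sentiments exist, and fill_value=0 also adds a zero column for any label that never occurs in the data. rename_axis(columns=None) is used here as an alternative to setting gf.columns.name = None.

```python
import pandas as pd

df = pd.DataFrame({
    "sentiment": ["pos", "neu", "pos", "neg", "neg"],
    "date": ["2016-10-08", "2016-10-08", "2016-10-09", "2016-10-09", "2016-10-09"],
})

# Count per (date, sentiment), then move sentiment into the columns;
# date/sentiment combinations that never occur become 0 instead of NaN.
wide = df.groupby(["date", "sentiment"]).size().unstack(fill_value=0)

# Assumed label set and order, taken from the table in the question.
wide = wide.reindex(columns=["pos", "neg", "neu"], fill_value=0)

# Turn the date index back into a column and clear the leftover
# 'sentiment' name on the column axis.
wide = wide.reset_index().rename_axis(columns=None)
print(wide)
```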

Another, slower solution with pivot_table:

gf = (df.pivot_table(index="date", columns="sentiment", aggfunc=len, fill_value=0)
        .reset_index())
gf.columns.name = None
print (gf)
         date  neg  neu  pos
0  2016-10-08    0    1    1
1  2016-10-09    2    0    1
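Here pivot_table has no value column left over once date and sentiment are used as index and columns, so the result depends on how your pandas version handles aggfunc=len in that case (worth checking). A variant that makes the aggregated column explicit adds a helper column, hypothetically named n, holding 1 per row and sums it. This is a sketch, not the answer's original code.

```python
import pandas as pd

df = pd.DataFrame({
    "sentiment": ["pos", "neu", "pos", "neg", "neg"],
    "date": ["2016-10-08", "2016-10-08", "2016-10-09", "2016-10-09", "2016-10-09"],
})

# 'n' is a hypothetical helper column: summing 1 per row counts the rows
# in each (date, sentiment) cell; empty cells are filled with 0.
counts = (
    df.assign(n=1)
      .pivot_table(index="date", columns="sentiment", values="n",
                   aggfunc="sum", fill_value=0)
      .reset_index()
      .rename_axis(columns=None)
)
print(counts)
```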

A final option is crosstab, which is also slower than the unstack solution on larger DataFrames:

gf = pd.crosstab(df.date, df.sentiment).reset_index()
gf.columns.name = None
print (gf)
         date  neg  neu  pos
0  2016-10-08    0    1    1
1  2016-10-09    2    0    1
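crosstab counts how often each pair of values from the two Series occurs, so it should give the same table as groupby/size/unstack. A quick sketch to check that on the sample data; it compares labels and counts rather than exact dtypes, since those could differ between the two paths depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "sentiment": ["pos", "neu", "pos", "neg", "neg"],
    "date": ["2016-10-08", "2016-10-08", "2016-10-09", "2016-10-09", "2016-10-09"],
})

# Contingency table of date vs. sentiment.
ct = pd.crosstab(df["date"], df["sentiment"])

# The same table via groupby/size/unstack, for comparison.
gu = df.groupby(["date", "sentiment"]).size().unstack(fill_value=0)

# Compare row labels, column labels and counts.
same = (
    ct.index.tolist() == gu.index.tolist()
    and ct.columns.tolist() == gu.columns.tolist()
    and ct.to_numpy().tolist() == gu.to_numpy().tolist()
)
print(same)
```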

Timings (pandas 0.19.0):

#[50000 rows x 2 columns]
df = pd.concat([df]*10000).reset_index(drop=True)

In [197]: %timeit (df.groupby(["date", "sentiment"]).size().unstack(fill_value=0).reset_index())
100 loops, best of 3: 6.3 ms per loop

In [198]: %timeit (df.pivot_table(index="date", columns="sentiment", aggfunc=len, fill_value=0).reset_index())
100 loops, best of 3: 12.2 ms per loop

In [199]: %timeit (pd.crosstab( df.date, df.sentiment).reset_index())
100 loops, best of 3: 11.3 ms per loop
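The numbers above come from pandas 0.19.0 and will differ on other versions and machines. A sketch for rerunning a similar comparison outside IPython uses the standard timeit module on the same 50,000-row frame. Note that the pivot_table entry here is the helper-column variant (hypothetical column n), not the aggfunc=len call timed above.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "sentiment": ["pos", "neu", "pos", "neg", "neg"],
    "date": ["2016-10-08", "2016-10-08", "2016-10-09", "2016-10-09", "2016-10-09"],
})
# 50,000 rows, built the same way as in the timings above.
big = pd.concat([base] * 10000).reset_index(drop=True)

candidates = {
    "groupby/unstack": lambda: (
        big.groupby(["date", "sentiment"]).size().unstack(fill_value=0).reset_index()
    ),
    "pivot_table (helper column)": lambda: (
        big.assign(n=1)
           .pivot_table(index="date", columns="sentiment", values="n",
                        aggfunc="sum", fill_value=0)
           .reset_index()
    ),
    "crosstab": lambda: pd.crosstab(big["date"], big["sentiment"]).reset_index(),
}

# Average seconds per call over 10 runs of each approach.
results = {name: timeit.timeit(fn, number=10) / 10 for name, fn in candidates.items()}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.1f} ms per call")
```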
