Question

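I have a DataFrame that holds the metadata of one newspaper article per row. I want to group the rows into monthly chunks and then count the values of one column (called type):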

monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_articles = monthly_articles["type"].value_counts().unstack()

This works for yearly groups, but fails when I try to group by month:

ValueError: operands could not be broadcast together with shape (141,) (139,)

I think this happens because some of the monthly groups contain no articles. If I iterate over the groups and print value_counts for each one:

for name, group in monthly_articles:
    print(name, group["type"].value_counts())

I get an empty Series for the January and February 2006 groups:

2005-12-31 00:00:00 positive    1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative    6
positive    5
neutral     1
Name: type, dtype: int64
2006-04-30 00:00:00 negative    11
positive     6
neutral      3
Name: type, dtype: int64

How can I ignore the empty groups when using value_counts()?

I tried dropna=False, without success. I think this is the same problem as this question.
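
On the dropna attempt: as far as I know, dropna only controls whether NaN values are counted, so it has no effect on a group that contains no rows at all. A small sketch with made-up data (the names with_nan and empty are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical Series with one missing label
with_nan = pd.Series(["positive", None, "positive"])

# dropna=False adds an entry for the missing value
counts_with_nan = with_nan.value_counts(dropna=False)

# The default (dropna=True) leaves the missing value out
counts_default = with_nan.value_counts()

# A Series with no rows gives an empty result whatever dropna is set to
empty = pd.Series([], dtype="object")
counts_empty = empty.value_counts(dropna=False)
```

So the empty result for a month without articles is expected, and the fix has to happen at the grouping step rather than inside value_counts.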

Answer 1

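It would help if you gave us a sample of your data; without one it is hard to pinpoint the problem. Judging from your snippet, the type data is empty for some months. You can use the apply function on the grouped object and then call unstack. Here is sample code that works for me, with randomly generated data: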

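import numpy as np
import pandas as pd

# Three labels, sampled at random to fill 150 days of fake data
s = pd.Series(['positive', 'negtive', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]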

df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01',  periods=150))

gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()

In [75]: dfx
Out[75]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-02-28       11       11         6
2017-03-31       12        6        13
2017-04-30        8       12        10
2017-05-31        9       10        11

If some values are missing:

In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
    ...: gp = df.groupby(pd.Grouper(freq='M'))
    ...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
    ...: 

In [77]: dfx
Out[77]: 
            negtive  neutral  positive
2017-01-31       13        9         9
2017-04-30        8       12         9
2017-05-31        9       10        11
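
Another way to get the same kind of table, offered only as a sketch: group by the month and by the label column at the same time, so that a month without rows never forms a group. The frame articles, its dates and its labels are made-up sample data, and using to_period("M") in place of pd.Grouper is my substitution, not what the code above does.

```python
import pandas as pd

# Hypothetical sample data: no articles in January or February 2006
articles = pd.DataFrame(
    {"type": ["positive", "negative", "negative", "neutral", "positive"]},
    index=pd.to_datetime(
        ["2005-12-15", "2006-03-02", "2006-03-20", "2006-04-05", "2006-04-18"]
    ),
)

# Calendar month of each row, as a PeriodIndex
month = articles.index.to_period("M")

# Count rows per (month, type) pair; months without rows produce no group
counts = articles.groupby([month, "type"]).size().unstack(fill_value=0)

# Optional: put the missing months back as all-zero rows
all_months = pd.period_range(counts.index.min(), counts.index.max(), freq="M")
filled = counts.reindex(all_months, fill_value=0)
```

Here counts has one row per month that actually contains articles, while filled also lists the skipped months with zeros in every column.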

Thanks.

How do I ignore empty Series when using value_counts on a Pandas groupby?
