Question

I have a pandas DataFrame containing thousands of dates and IDs, like this:

2/1/18  123
2/1/18  123
2/1/18  456
2/1/18  789

I also have a short list with just a few IDs, for example:

 ids=['123','456','909']

I need to go through the list and count, per date, how often each value in it occurs. The result should look like this:

2/1/18 123 2
       456 1
       909 0

I can easily do a groupby on the DataFrame, but that returns every ID, and I only need the values in the ids list.
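
To make the problem concrete, here is a minimal sketch of the plain groupby described above. The column names Date and ID are assumptions (the question does not name its columns), and the IDs are assumed to be stored as strings.

```python
import pandas as pd

# Column names 'Date' and 'ID' are assumed; the question does not name them.
df = pd.DataFrame({'Date': ['2/1/18'] * 4,
                   'ID': ['123', '123', '456', '789']})

# A plain groupby counts every ID present in the data, including '789',
# and produces no row for '909' because that ID never occurs.
all_counts = df.groupby(['Date', 'ID']).size()
print(all_counts)
```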

Answer 1

Is this what you are after?

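import pandas as pd

# IDs are stored as strings so that they compare equal to the entries in ids
df = pd.DataFrame([['2/1/18', '123'],
                   ['2/1/18', '123'],
                   ['2/1/18', '456'],
                   ['2/1/18', '789']],
                  columns=['Date', 'ID'])

ids = ['123', '456', '909']

# Keep only rows whose ID is in the list, then count rows per (Date, ID).
# IDs with no matching rows (such as '909') do not appear in the result.
df['count'] = 1
results = df[df['ID'].isin(ids)].groupby(['Date', 'ID']).count()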
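
Filtering with isin and then grouping leaves out any ID that never occurs, so '909' gets no row. One possible way to keep it with a count of 0 is to declare the wanted IDs as categories before grouping. This is a sketch under the assumption that the IDs are strings and that the installed pandas supports the observed argument of groupby; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2/1/18'] * 4,
                   'ID': ['123', '123', '456', '789']})
ids = ['123', '456', '909']

# Work on a copy of the filtered rows so the original frame is left untouched.
filtered = df[df['ID'].isin(ids)].copy()

# Turning ID into a categorical with exactly the wanted categories lets
# groupby report categories that have no rows at all.
filtered['ID'] = pd.Categorical(filtered['ID'], categories=ids)

# observed=False keeps unobserved categories; size() gives them a count of 0.
counts = filtered.groupby(['Date', 'ID'], observed=False).size()
print(counts)
```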

Answer 2

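I would use reindex here (this assumes the columns are named Date and Id and that the IDs are stored as strings):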

(df.loc[df.Id.isin(ids)]
   .groupby('Date').Id.value_counts()
   .reindex(index=pd.MultiIndex.from_product([df.Date.unique(), ids]),
            fill_value=0))
Out[1116]: 
2/1/18  123    2
        456    1
        909    0
Name: Id, dtype: int64
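
As a self-contained illustration of the same reindex idea, the sketch below builds the full (date, id) index explicitly and applies it to a frame with more than one date. The rows for '2/2/18' are invented for illustration, and the column names Date and Id follow the answer above.

```python
import pandas as pd

# Hypothetical sample data; the '2/2/18' rows are made up for this example.
df = pd.DataFrame({'Date': ['2/1/18', '2/1/18', '2/1/18', '2/1/18',
                            '2/2/18', '2/2/18'],
                   'Id': ['123', '123', '456', '789', '909', '789']})
ids = ['123', '456', '909']

# Every (date, id) pair that should appear in the result.
full_index = pd.MultiIndex.from_product([df['Date'].unique(), ids],
                                        names=['Date', 'Id'])

# Count the matching rows, then reindex so missing pairs show up as 0.
counts = (df[df['Id'].isin(ids)]
          .groupby(['Date', 'Id'])
          .size()
          .reindex(full_index, fill_value=0))
print(counts)
```

Building the index from df['Date'].unique() means every date in the data gets one row per listed ID, even when none of the listed IDs occur on that date.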

Python pandas: aggregating values by date in a DataFrame
