Question

I have a DataFrame with a two-level MultiIndex. The first level, date, is a DatetimeIndex; the second level, name, is just some strings. The data are at 10-minute intervals.

How can I group by date on the first level of this MultiIndex and count the number of rows per day?

I suspect that having the DatetimeIndex inside a MultiIndex is what causes the problem, because doing

data.groupby(pd.TimeGrouper(freq='D')).count()

gives me

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'

I also tried

data.groupby(data.index.levels[0].date).count()

which results in

ValueError: Grouper and axis must be same length

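For example, how can I make the grouper longer (i.e. have it include the duplicate index values that it currently leaves out, which makes it shorter than the axis)?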

Thanks!

Answer 1

Suppose the DataFrame looks like this:

d=pd.DataFrame([['Mon','foo',3],['Tue','bar',6],['Wed','qux',9]],
               columns=['date','name','amount'])\
              .set_index(['date','name'])

You can drop name from the index just for this grouping operation:

d.reset_index('name', drop=True)\
 .groupby('date')\
 ['amount'].count()
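Applied to data like the question's (timestamps rather than weekday strings), the same idea might look like the sketch below. It assumes hypothetical 10-minute data with levels named date and name and a single column amount. Once name is dropped, the remaining index is a plain DatetimeIndex, so a daily pd.Grouper (used here because TimeGrouper is deprecated) should no longer raise the TypeError from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's: 10-minute timestamps in a
# 'date' level and string labels in a 'name' level.
dates = pd.date_range('2017-01-01', freq='10min', periods=300)
idx = pd.MultiIndex.from_arrays([dates, ['aa'] * 300], names=['date', 'name'])
data = pd.DataFrame({'amount': np.arange(300)}, index=idx)

# Dropping the 'name' level leaves a plain DatetimeIndex, so a daily
# pd.Grouper can be applied to the index directly.
per_day = (data.reset_index('name', drop=True)
               .groupby(pd.Grouper(freq='D'))['amount']
               .count())
print(per_day)
```

With 300 rows at 10-minute spacing this gives 144, 144 and 12 rows for the three days.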

Answer 2

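You can use the level keyword of Grouper. (Note also that TimeGrouper is deprecated; it was removed in pandas 1.0.) This parameter is described as:

The level for the target index.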

Example DataFrame:

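import numpy as np
import pandas as pd
# 1000 rows at 10-minute intervals; the second level is just a string label
dates = pd.date_range('2017-01', freq='10min', periods=1000)
strs = ['aa'] * 1000
df = pd.DataFrame(np.random.rand(1000,2), index=pd.MultiIndex.from_arrays((dates, strs)))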

Solution:

print(df.groupby(pd.Grouper(freq='D', level=0)).count())
              0    1
2017-01-01  144  144
2017-01-02  144  144
2017-01-03  144  144
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136
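A small variation, sketched under the assumption that the levels are named date and name (the example above leaves them unnamed): level also accepts a level name. Since the question asks for the number of rows, .size() may fit better than .count(), which counts non-missing values per column. The NaN below is injected only to show the difference.

```python
import numpy as np
import pandas as pd

# Same shape as the example above, but with named levels (an assumption).
dates = pd.date_range('2017-01', freq='10min', periods=1000)
idx = pd.MultiIndex.from_arrays((dates, ['aa'] * 1000), names=['date', 'name'])
df = pd.DataFrame(np.random.rand(1000, 2), index=idx)
df.iloc[0, 0] = np.nan  # one missing value, to contrast count() with size()

daily = df.groupby(pd.Grouper(freq='D', level='date'))
print(daily.count())  # non-missing values per column and day
print(daily.size())   # rows per day, regardless of NaN
```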

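Update: you mentioned in a comment that the resulting counts include zeros, which you want to remove. For example, suppose your DataFrame is actually missing some days: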

df = df.drop(df.index[140:400])
print(df.groupby(pd.Grouper(freq='D', level=0)).count())
              0    1
2017-01-01  140  140
2017-01-02    0    0
2017-01-03   32   32
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136

As far as I know, there is no way to exclude zero counts within .count itself. Instead, you can use the result above to remove the zeros.
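Another option, not part of the original answer, is to avoid creating empty bins in the first place by grouping on the calendar day of each row instead of on a fixed daily frequency. This also addresses the question's second error: index.levels[0] holds each distinct value of the level only once (and may still contain values no longer used after rows are dropped), whereas index.get_level_values(0) returns one value per row, so it has the same length as the axis. A sketch, rebuilding the gapped example from above:

```python
import numpy as np
import pandas as pd

# Rebuild the example with a gap (rows 140-399 dropped, so 2017-01-02 is empty).
dates = pd.date_range('2017-01', freq='10min', periods=1000)
df = pd.DataFrame(np.random.rand(1000, 2),
                  index=pd.MultiIndex.from_arrays((dates, ['aa'] * 1000)))
df = df.drop(df.index[140:400])

# get_level_values(0) has one timestamp per row, so the grouper matches the
# axis length; normalize() maps each timestamp to midnight of its day.
# Only days that actually occur become groups, so no zero rows appear.
days = df.index.get_level_values(0).normalize()
res = df.groupby(days).count()
print(res)
```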

The first solution (probably less preferable, because introducing np.nan converts the int results to float):

res = df.groupby(pd.Grouper(freq='D', level=0)).count()
res = res.replace(0, np.nan).dropna()
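To see the dtype change mentioned above, here is a sketch on a small hypothetical counts table. Once the NaN rows are dropped, the counts can be cast back to integers with astype, if that matters for your use.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts with one empty day, mirroring the result above.
res = pd.DataFrame({0: [140, 0, 32], 1: [140, 0, 32]},
                   index=pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03']))

as_float = res.replace(0, np.nan).dropna()
print(as_float.dtypes)  # float64: the NaN forced the integer counts to float

# No NaN remains after dropna(), so casting back to integers is safe.
restored = as_float.astype('int64')
print(restored)
```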

A second, and I think better, solution comes from here:

res = res[(res.T != 0).any()]
print(res) # notice - excludes 2017-01-02
              0    1
2017-01-01  140  140
2017-01-03   32   32
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136

.any comes from NumPy and was carried over to pandas; it returns True if any element along the requested axis is True.
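In the filter above, res.T != 0 yields a boolean table with one column per date, and .any() (default axis=0) collapses each of those columns to a single True/False, giving one boolean per date. The same mask can be written without transposing by passing axis=1. The sketch below, on a small hypothetical counts table, checks that both forms agree.

```python
import pandas as pd

# Hypothetical counts table with one all-zero day, mirroring the result above.
res = pd.DataFrame({0: [140, 0, 32], 1: [140, 0, 32]},
                   index=pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03']))

# Transposed form from the answer: reduce down each column of res.T.
mask_transposed = (res.T != 0).any()
# Equivalent row-wise form: reduce across the columns of res.
mask_rowwise = (res != 0).any(axis=1)

print(res[mask_rowwise])  # 2017-01-02 is excluded
```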

How do I count the number of rows per day in a DataFrame with a MultiIndex?
