Question

Code:

import pandas as pd
import numpy as np
dates = pd.date_range('20140301',periods=6)
id_col = np.array([[0, 1, 2, 0, 1, 2]])
data_col = np.random.randn(6,4)
data = np.concatenate((id_col.T, data_col), axis=1)
df = pd.DataFrame(data, index=dates, columns=list('IABCD'))
print df

print "before groupby:"
for index in df.index:
    if not index.freq:
        print "no freq:%s" % index  # `key` is not defined before the groupby loop

print "after groupby:"
gb = df.groupby('I')
for key, group in gb:
    #group = group.resample('1D', how='first')
    for index in group.index:
        if not index.freq:
            print "key:%f, no freq:%s" % (key, index)

Output:

            I         A         B         C         D
2014-03-01  0  0.129348  1.466361 -0.372673  0.045254
2014-03-02  1  0.395884  1.001859 -0.892950  0.480944
2014-03-03  2 -0.226405  0.663029  0.355675 -0.274865
2014-03-04  0  0.634661  0.535560  1.027162  1.637099
2014-03-05  1 -0.453149 -0.479408 -1.329372 -0.574017
2014-03-06  2  0.603972  0.754232  0.692185 -1.267217

[6 rows x 5 columns]
before groupby:
after groupby:
key:0.000000, no freq:2014-03-01 00:00:00
key:0.000000, no freq:2014-03-04 00:00:00
key:1.000000, no freq:2014-03-02 00:00:00
key:1.000000, no freq:2014-03-05 00:00:00
key:2.000000, no freq:2014-03-03 00:00:00
key:2.000000, no freq:2014-03-06 00:00:00

But after I uncomment this statement:

#group = group.resample('1D', how='first')

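it seems to work fine. The problem is that when I run this on a large dataset and perform some operations on the timestamps, I always get the error "cannot add integral value to Timestamp without offset". Is this a bug, or am I missing something?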

Answer 1

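You are treating a groupby object as if it were a DataFrame.

It is like a DataFrame, but it requires apply to generate a new structure (either a reduced result or an actual DataFrame).

The idiom is: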

df.groupby(....).apply(some_function)

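Doing something like df.groupby(...).sum() is syntactic sugar for using apply. Functions that are naturally applicable to this kind of sugar are enabled; the others will raise an error.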
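To make the sugar concrete, here is a minimal sketch with a small made-up frame (the values and the names `via_sugar` and `via_apply` are illustrative assumptions, not taken from the question). It shows that, for a simple reduction, the shortcut method and an explicit apply give the same per-group result.

```python
import pandas as pd

# Hypothetical toy data: column "I" plays the role of the grouping key.
df = pd.DataFrame({
    "I": [0, 1, 2, 0, 1, 2],
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Shortcut form: the reduction is called directly on the groupby object.
via_sugar = df.groupby("I")["A"].sum()

# Explicit form: the same reduction passed through apply.
via_apply = df.groupby("I")["A"].apply(lambda s: s.sum())

print(via_sugar)
print(via_apply)
```

Both results are Series indexed by the group key, so they can be compared directly.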

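In particular, you are accessing group.index, which can be, but is not guaranteed to be, a DatetimeIndex (when time grouping). The freq attribute of a DatetimeIndex is inferred when required (via inferred_freq).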
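A rough sketch of what this looks like in practice, assuming a recent pandas and a made-up daily frame with nine rows so that each group has enough points for inference (the names `inferred` and `restored` are illustrative). The per-group index usually carries no freq of its own, but inferred_freq can still recover the spacing, and asfreq is one way to get an index with an explicit frequency back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: nine daily rows, three groups that repeat every third day.
dates = pd.date_range("2014-03-01", periods=9)
df = pd.DataFrame({"I": [0, 1, 2] * 3, "A": np.arange(9.0)}, index=dates)

original_freq = df.index.freqstr  # the full index knows it is daily

inferred = {}
restored = {}
for key, group in df.groupby("I"):
    # group.index.freq is typically None here: the rows were picked out of
    # the original index, so the stored frequency is not carried over.
    inferred[key] = group.index.inferred_freq
    # Reindexing to the inferred spacing yields an index with freq set again.
    restored[key] = group.asfreq(inferred[key]).index.freqstr
    print(key, group.index.freq, inferred[key], restored[key])
```

Note that inferred_freq needs at least three timestamps; with the two-row groups from the question it would return None.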

Your code is very confusing: you are grouping and then resampling. resample does the grouping for you, so you don't need the former step at all.

resample is in effect equivalent to a groupby-apply (but with special handling for the time domain).
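As an illustration of that equivalence, the sketch below compares resample with a time-based groupby on a made-up daily series. It assumes a recent pandas, where pd.Grouper(freq=...) is the way to ask groupby for time bins; the two-day bin size and the variable names are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with values 0.0 .. 5.0.
dates = pd.date_range("2014-03-01", periods=6)
s = pd.Series(np.arange(6.0), index=dates)

# Time-aware form: bin into 2-day buckets and take the first value of each.
by_resample = s.resample("2D").first()

# Generic form: the same bins expressed as a groupby over a time Grouper.
by_groupby = s.groupby(pd.Grouper(freq="2D")).first()

print(by_resample)
print(by_groupby)
```

Both calls produce one row per two-day bin, labelled by the start of the bin.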

Frequency for DatetimeIndex is missing after groupby operation
