I want to count the number of unique values in column A within a moving time window (60 seconds):
import numpy as np
import pandas as pd

# Count the distinct values in each window
fn = lambda x: len(np.unique(x))
df = pd.DataFrame({'A': ['a', 'b', 'a', 'b', 'e'], 'B': [0, 1, 2, 3, 4]},
                  index=[pd.Timestamp('20130101 09:01:00'),
                         pd.Timestamp('20130101 09:01:32'),
                         pd.Timestamp('20130101 09:02:03'),
                         pd.Timestamp('20130101 09:02:25'),
                         pd.Timestamp('20130101 09:03:06')])
df[['A']].rolling('60s').apply(fn)
I expect the result to be:
2013-01-01 09:01:00 1
2013-01-01 09:01:32 2
2013-01-01 09:02:03 2
2013-01-01 09:02:25 2
2013-01-01 09:03:06 2
However, the actual result is:
2013-01-01 09:01:00 a
2013-01-01 09:01:32 b
2013-01-01 09:02:03 a
2013-01-01 09:02:25 b
2013-01-01 09:03:06 e
What is the problem?
Answer 0 (score: 1)
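Rolling functions work only on numeric data, and column A holds strings, so you can use the numeric column B instead of A: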
a = df[['B']].rolling('60s').apply(fn)
print (a)
B
2013-01-01 09:01:00 1.0
2013-01-01 09:01:32 2.0
2013-01-01 09:02:03 2.0
2013-01-01 09:02:25 3.0
2013-01-01 09:03:06 2.0
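Because every value in B is distinct, the counts above equal the number of rows that fall inside each 60-second window, which is why 09:02:25 shows 3. The following is a minimal sketch for checking this, assuming the same sample data as the question. The names `unique_b` and `rows_per_window` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    {'A': ['a', 'b', 'a', 'b', 'e'], 'B': [0, 1, 2, 3, 4]},
    index=pd.to_datetime(['2013-01-01 09:01:00',
                          '2013-01-01 09:01:32',
                          '2013-01-01 09:02:03',
                          '2013-01-01 09:02:25',
                          '2013-01-01 09:03:06']))

fn = lambda x: len(np.unique(x))

# Distinct values of B per window; raw=True passes a NumPy array to fn
unique_b = df['B'].rolling('60s').apply(fn, raw=True)

# Plain row count per window
rows_per_window = df['B'].rolling('60s').count()

# The two agree only because B has no repeated values
print((unique_b == rows_per_window).all())
```

So this approach counts observations per window. It matches the number of distinct values of A only when A has no repeats inside a window.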
If you need to convert the result to int:
a = df[['B']].rolling('60s').apply(fn).astype(int)
print (a)
B
2013-01-01 09:01:00 1
2013-01-01 09:01:32 2
2013-01-01 09:02:03 2
2013-01-01 09:02:25 3
2013-01-01 09:03:06 2
If there is no such column, you can create it:
a = df.assign(B=np.arange(len(df.index)))[['B']].rolling('60s').apply(fn).astype(int)
print (a)
B
2013-01-01 09:01:00 1
2013-01-01 09:01:32 2
2013-01-01 09:02:03 2
2013-01-01 09:02:25 3
2013-01-01 09:03:06 2
Or add the column to df in place:
df['B'] = np.arange(len(df.index))
a = df[['B']].rolling('60s').apply(fn).astype(int)
print (a)
B
2013-01-01 09:01:00 1
2013-01-01 09:01:32 2
2013-01-01 09:02:03 2
2013-01-01 09:02:25 3
2013-01-01 09:03:06 2
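If the goal is the number of distinct values of A itself, as in the question's expected output, one possible approach is to map the strings to integer codes first and roll over those codes. This is a sketch rather than part of the original answer: it assumes `pd.factorize` for the encoding, and the column name `A_code` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    {'A': ['a', 'b', 'a', 'b', 'e'], 'B': [0, 1, 2, 3, 4]},
    index=pd.to_datetime(['2013-01-01 09:01:00',
                          '2013-01-01 09:01:32',
                          '2013-01-01 09:02:03',
                          '2013-01-01 09:02:25',
                          '2013-01-01 09:03:06']))

# Encode each distinct string in A as an integer (hypothetical helper column)
codes, uniques = pd.factorize(df['A'])
df['A_code'] = codes

# Count distinct codes per 60-second window; raw=True passes a NumPy array
distinct_a = (df['A_code']
              .rolling('60s')
              .apply(lambda x: len(np.unique(x)), raw=True)
              .astype(int))
print(distinct_a)
```

Equal strings receive equal codes, so counting unique codes in a window is the same as counting unique values of A in that window.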
EDIT1:
df['B'] = np.arange(len(df.index))
a = df.groupby('A')[['B']].rolling('60s').apply(fn).astype(int)
print (a)
B
A
a 2013-01-01 09:01:00 1
2013-01-01 09:02:03 1
b 2013-01-01 09:01:32 1
2013-01-01 09:02:25 2
e 2013-01-01 09:03:06 1
Answer 1 (score: -1)
Just try it this way:
In [40]: import numpy as np
    ...: import pandas as pd
In [41]: fn = lambda x: len(np.unique(x))
...: df = pd.DataFrame({'A':['a', 'b', 'c', 'd', 'e'], 'B': [0, 1, 2, 3, 4]},
...: index = [pd.Timestamp('20130101 09:01:00'),
...: pd.Timestamp('20130101 09:01:32'),
...: pd.Timestamp('20130101 09:02:03'),
...: pd.Timestamp('20130101 09:02:25'),
...: pd.Timestamp('20130101 09:03:06')])
In [42]: df[['B']] = df[['B']].rolling('60s').apply(fn).astype(int)
In [43]: df[['']] = df[['B']]
In [44]: df[['']]
Out[44]:
2013-01-01 09:01:00 1
2013-01-01 09:01:32 2
2013-01-01 09:02:03 2
2013-01-01 09:02:25 3
2013-01-01 09:03:06 2