I would like to get
>>> from itertools import groupby
>>> import numpy as np
>>> import pandas as pd
>>> x = np.arange(0, 4*np.pi, 0.1)
>>> y = np.sin(x)
>>> df = pd.DataFrame(index=x, data=y)
>>> [(k, list(v)) for k, v in groupby(y.tolist(), lambda v: v >= 0)]
[(True,
[0.0, ..., 0.041580662433290491]),
(False,
[-0.058374143427580086, ..., -0.083089402817496397]),
(True,
[0.016813900484350601, ..., 0.024775425453357765]),
(False,
[-0.075151120461809301, ..., -0.066321897351200684])
]
but in a more "pandas way", using df. As far as I can tell, pd.groupby(.) gives me only 2 groups instead of 4.
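To make the problem concrete, here is a minimal sketch of what presumably happens when the boolean condition itself is used as the group key (the name `by_sign` is only illustrative): groupby collects all rows that share a key value, whether or not they are adjacent, so the two positive runs end up in one group and the two negative runs in the other.

```python
import numpy as np
import pandas as pd

x = np.arange(0, 4 * np.pi, 0.1)
df = pd.DataFrame(np.sin(x), index=x)

# Grouping by the boolean key merges every non-negative row into one group
# and every negative row into another, regardless of position.
by_sign = df.groupby(df[0] >= 0)
print(by_sign.ngroups)  # 2
```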
Answer 0 (score: 1)
You can use diff followed by cumsum to give each consecutive run of Trues and Falses its own group number:
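import numpy as np
import pandas as pd

x = np.arange(0, 4*np.pi, 0.1)
y = np.sin(x)
df = pd.DataFrame(y, index=x)
# df[0] already holds sin(x), so test its sign directly
mask = df[0] >= 0
# diff() is nonzero wherever the sign flips; the first row has no predecessor, so fill it with 0
# cumsum() over the flips then numbers the runs 0, 1, 2, ...
groupnum = mask.astype(int).diff().fillna(0).ne(0).cumsum()
print([(key, grp.head()) for key, grp in df.groupby(groupnum)])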
yields
[(0, 0
0.0 0.000000
0.1 0.099833
0.2 0.198669
0.3 0.295520
0.4 0.389418), (1, 0
3.2 -0.058374
3.3 -0.157746
3.4 -0.255541
3.5 -0.350783
3.6 -0.442520), (2, 0
6.3 0.016814
6.4 0.116549
6.5 0.215120
6.6 0.311541
6.7 0.404850), (3, 0
9.5 -0.075151
9.6 -0.174327
9.7 -0.271761
9.8 -0.366479
9.9 -0.457536)]
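The run-numbering idea is easier to check on a small input. The sketch below is not from the answer above: it uses hypothetical toy data (`vals`) and a closely related idiom, comparing each element with its shifted predecessor instead of calling diff, then rebuilds the list of (sign, values) pairs in the shape the question asked for.

```python
import pandas as pd

# Hypothetical toy data with four consecutive sign runs.
vals = pd.Series([0.5, 0.2, -0.1, -0.4, -0.3, 0.6, -0.7])
mask = vals >= 0

# A new run starts wherever the mask differs from the previous element.
# The first element has no predecessor, so the comparison is True there
# and the labels start at 1 rather than 0.
run_id = mask.ne(mask.shift()).cumsum()
print(run_id.tolist())  # [1, 1, 2, 2, 2, 3, 4]

# One (sign, values) pair per run, in order of appearance.
result = [(bool(grp.iloc[0] >= 0), grp.tolist())
          for _, grp in vals.groupby(run_id)]
print(result)
```

Because the labels only increase, iterating over the groups returns the runs in their original order, so the result should line up with what itertools.groupby produces on the same values.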