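I have a pandas DataFrame and I want to select the rows where a run of a particular value starts and ends. For example, in the DataFrame df below, I want the rows where the values in column state start and stop being 1, i.e. rows 2, 5, 8 and 10. The result should be two DataFrames.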
import pandas as pd
data = [['a1',0,'low'],
['a1',0,'low'],
['a1',1,'high'],
['a1',1,'low'],
['a1',1,'low'],
['a1',1,'high'],
['a1',0,'low'],
['a1',0,'low'],
['a2',1,'high'],
['a2',1,'low'],
['a2',1,'low'],
['a2',0,'low'],
['a2',0,'low']]
df = pd.DataFrame(data,columns=['id','state','type'])
df
Out:
id state type
0 a1 0 low
1 a1 0 low
2 a1 1 high
3 a1 1 low
4 a1 1 low
5 a1 1 high
6 a1 0 low
7 a1 0 low
8 a2 1 high
9 a2 1 low
10 a2 1 low
11 a2 0 low
12 a2 0 low
In the end, I want two DataFrames like these:
df1
id state type code
2 a1 1 high start
8 a2 1 high start
df2
id state type code
5 a1 1 high end
10 a2 1 low end
Answer (score: 2)
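You can select the rows you need with Boolean masks: m1 flags rows where state rises from 0 to 1 (the first row of a run of 1s), and m2 flags rows where the next row's state drops from 1 to 0 (the last row of a run).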
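# m1: state[i] - state[i-1] == 1, i.e. this row starts a run of 1s
m1 = df['state'].diff() == 1
# m2: state[i+1] - state[i] == -1, i.e. this row ends a run of 1s
m2 = df['state'].shift(-1).diff() == -1
res = df[m1 | m2]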
print(res)
id state type
2 a1 1 high
5 a1 1 high
8 a2 1 high
10 a2 1 low
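To see what each mask actually tests, here is a small sketch that puts the intermediate values next to state (the column names diff, next_diff and diff_back are only illustrative). It also checks that m2 can be written more directly as state.diff(-1) == 1, because Series.diff with periods=-1 subtracts the next value from the current one.

```python
import pandas as pd

# The 'state' column from the question
state = pd.Series([0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0], name='state')

steps = pd.DataFrame({
    'state': state,
    'diff': state.diff(),                 # state[i] - state[i-1]; 1 marks a 0 -> 1 step (start)
    'next_diff': state.shift(-1).diff(),  # state[i+1] - state[i]; -1 marks a 1 -> 0 step next (end)
    'diff_back': state.diff(-1),          # state[i] - state[i+1]; same end test, written as == 1
})
print(steps)

m1 = state.diff() == 1
m2 = state.shift(-1).diff() == -1
m2_alt = state.diff(-1) == 1  # equivalent formulation of m2
print(m2.equals(m2_alt))
```

Note the NaN at the edges: the first row has no previous value and the last row has no next value, so a run that begins on the very first row or ends on the very last row is not flagged by these masks.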
You can then split the result into two DataFrames with a list comprehension that takes every other row:
# res alternates start, end, start, end, ...: even positions are starts, odd positions are ends
df1, df2 = [res.iloc[i::2] for i in range(2)]
print(df1, df2, sep='\n\n')
id state type
2 a1 1 high
8 a2 1 high
id state type
5 a1 1 high
10 a2 1 low
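The every-other-row split relies on starts and ends strictly alternating in res. That breaks if a run is only one row long, since such a row is both a start and an end but appears in res only once. The split also does not add the code column the question asks for. A minimal sketch that instead builds each DataFrame from its own mask and adds the column with DataFrame.assign; it reuses the question's data, and the names starts and ends are illustrative. The end mask uses state.diff(-1) == 1, which selects the same rows as the answer's m2.

```python
import pandas as pd

data = [['a1', 0, 'low'], ['a1', 0, 'low'], ['a1', 1, 'high'],
        ['a1', 1, 'low'], ['a1', 1, 'low'], ['a1', 1, 'high'],
        ['a1', 0, 'low'], ['a1', 0, 'low'], ['a2', 1, 'high'],
        ['a2', 1, 'low'], ['a2', 1, 'low'], ['a2', 0, 'low'],
        ['a2', 0, 'low']]
df = pd.DataFrame(data, columns=['id', 'state', 'type'])

state = df['state']
starts = state.diff() == 1   # for 0/1 data: previous row 0, this row 1
ends = state.diff(-1) == 1   # for 0/1 data: this row 1, next row 0

# Each frame comes from its own mask and gets the requested 'code' label
df1 = df[starts].assign(code='start')
df2 = df[ends].assign(code='end')

print(df1, df2, sep='\n\n')
```

Because the frames do not depend on alternation, a one-row run would simply appear in both df1 and df2. If runs should not continue across id values (the question does not say), the same differences could be computed per group with df.groupby('id')['state'].diff(), keeping in mind that the first row of each group then gets NaN.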