I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'col1': range(20), 'col2': list(range(3)) + [5] * 3 + list(range(3)) + [3] * 3 + list(range(4)) + [2] * 3 + [4]},
                  index=pd.date_range('1/1/2000', periods=20, freq='1s'))
df
Out[115]:
                     col1  col2
2000-01-01 00:00:00     0     0
2000-01-01 00:00:01     1     1
2000-01-01 00:00:02     2     2
2000-01-01 00:00:03     3     5  *
2000-01-01 00:00:04     4     5  *
2000-01-01 00:00:05     5     5  *
2000-01-01 00:00:06     6     0
2000-01-01 00:00:07     7     1
2000-01-01 00:00:08     8     2
2000-01-01 00:00:09     9     3  *
2000-01-01 00:00:10    10     3  *
2000-01-01 00:00:11    11     3  *
2000-01-01 00:00:12    12     0
2000-01-01 00:00:13    13     1
2000-01-01 00:00:14    14     2
2000-01-01 00:00:15    15     3
2000-01-01 00:00:16    16     2  *
2000-01-01 00:00:17    17     2  *
2000-01-01 00:00:18    18     2  *
2000-01-01 00:00:19    19     4
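As you can see above, col2 contains three segments in which the same value repeats on consecutive rows (marked with *). I want to extract these three segments: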
                     col1  col2
2000-01-01 00:00:03     3     5
2000-01-01 00:00:04     4     5
2000-01-01 00:00:05     5     5

                     col1  col2
2000-01-01 00:00:09     9     3
2000-01-01 00:00:10    10     3
2000-01-01 00:00:11    11     3

                     col1  col2
2000-01-01 00:00:16    16     2
2000-01-01 00:00:17    17     2
2000-01-01 00:00:18    18     2
How can I do this?
Answer 0 (score: 2)
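Here is one way: use diff and cumsum to give each run of consecutive equal values its own group label, then use transform with count to get the size of each group and keep the rows whose group size equals 3. Finally, groupby col2 to split the remaining rows into separate DataFrames.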
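# label each run of consecutive equal values in col2
s = df.col2.diff().ne(0).cumsum()
# keep the rows whose run is exactly 3 long, then split the result by col2
l = [y for x, y in df[s.groupby(s).transform('count') == 3].groupby('col2')]
l[0]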
Out[205]:
                     col1  col2
2000-01-01 00:00:16    16     2
2000-01-01 00:00:17    17     2
2000-01-01 00:00:18    18     2
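A note on the last step: groupby('col2') sorts the groups by value, which is why l[0] above is the segment with value 2 rather than the earliest one, and two separate runs that happen to share a value would be merged into one piece. The sketch below is a variation on the answer, not its exact code: it groups by the run label itself, so segments come back in time order and stay separate. The helper name extract_runs and its min_len parameter (keeping runs of at least min_len rows instead of exactly 3) are hypothetical.

```python
import pandas as pd

# Rebuild the example frame from the question ('1s' is the lowercase
# seconds alias; pandas 2.2 deprecated the uppercase 'S').
df = pd.DataFrame(
    {'col1': range(20),
     'col2': list(range(3)) + [5] * 3 + list(range(3)) + [3] * 3
             + list(range(4)) + [2] * 3 + [4]},
    index=pd.date_range('1/1/2000', periods=20, freq='1s'),
)


def extract_runs(frame, col, min_len=3):
    """Hypothetical helper: return the runs of consecutive equal values in
    `col` that are at least `min_len` rows long, in order of appearance."""
    # A new run starts wherever the value changes (the first row's diff is
    # NaN, which also counts as a change); cumsum turns those starts into
    # one integer label per run.
    run_id = frame[col].diff().ne(0).cumsum()
    # Size of the run each row belongs to, aligned with the original rows.
    run_len = run_id.groupby(run_id).transform('count')
    mask = run_len >= min_len
    # Group by the run label (not by the value) so runs that share a value
    # stay separate; labels increase over time, so order is preserved.
    return [g for _, g in frame[mask].groupby(run_id[mask])]


segments = extract_runs(df, 'col2')
for seg in segments:
    print(seg, end='\n\n')
```

For this particular frame, calling extract_runs(df, 'col2', min_len=2) gives the same three segments, since there are no other adjacent repeats in col2.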
Answer 1 (score: 1)
Here's my take:
import pandas as pd
df = pd.DataFrame({'col1': range(20), 'col2': list(range(3)) + [5] * 3 + list(range(3)) + [3] * 3 + list(range(4)) + [2] * 3 + [4]},
                  index=pd.date_range('1/1/2000', periods=20, freq='1s'))
# mark the first row of every three equal consecutive values; cumsum turns these starts into segment labels
df['markers'] = ((df.col2 == df.col2.shift(-1)) & (df.col2 == df.col2.shift(-2))).cumsum()
# drop the rows before the first segment (marker 0):
new_df = df[df['markers'] > 0].copy()
# output: the first three rows of each marker group are the segment itself
new_df.groupby('markers')[['col1', 'col2']].apply(lambda x: x[:3])
Output:
+----------+----------------------+-------+------+
| | | col1 | col2 |
+----------+----------------------+-------+------+
| markers | | | |
+----------+----------------------+-------+------+
| 1 | 2000-01-01 00:00:03 | 3 | 5 |
| | 2000-01-01 00:00:04 | 4 | 5 |
| | 2000-01-01 00:00:05 | 5 | 5 |
| 2 | 2000-01-01 00:00:09 | 9 | 3 |
| | 2000-01-01 00:00:10 | 10 | 3 |
| | 2000-01-01 00:00:11 | 11 | 3 |
| 3 | 2000-01-01 00:00:16 | 16 | 2 |
| | 2000-01-01 00:00:17 | 17 | 2 |
| | 2000-01-01 00:00:18 | 18 | 2 |
+----------+----------------------+-------+------+
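To see why the markers trick works, it helps to print the intermediate columns. The sketch below rebuilds the frame, shows the start flags next to their cumulative sum, and then repeats the extraction with GroupBy.head(3) in place of apply(lambda x: x[:3]). Because head keeps the original DatetimeIndex, the result is one flat DataFrame instead of the markers/timestamp MultiIndex shown above. The names starts, keep and flat are illustrative only.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {'col1': range(20),
     'col2': list(range(3)) + [5] * 3 + list(range(3)) + [3] * 3
             + list(range(4)) + [2] * 3 + [4]},
    index=pd.date_range('1/1/2000', periods=20, freq='1s'),
)

# True on a row whose value equals the next two values, i.e. the first
# row of three equal consecutive values (shift(-n) yields NaN at the end,
# which compares as False).
starts = (df.col2 == df.col2.shift(-1)) & (df.col2 == df.col2.shift(-2))
# Each True bumps the counter, so every row from one start up to the next
# start shares a marker; rows before the first start get 0.
markers = starts.cumsum()

# Show the intermediate columns side by side.
print(pd.concat([df.col2, starts, markers], axis=1,
                keys=['col2', 'start', 'markers']))

# Same extraction as the answer, but with GroupBy.head instead of apply:
# keep rows with marker > 0 and take the first three rows of each marker.
keep = markers > 0
flat = df[keep].groupby(markers[keep]).head(3)
print(flat)
```

One caveat that follows from the same logic: a start is flagged on every row followed by two equal values, so a run of four or more equal values receives more than one marker and is split across groups. The run-label approach from the first answer does not have that issue.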