I have some data with the following structure. It is stored in a Python pandas DataFrame that I named df.
Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................
.....................
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1
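I want to select the rows that meet the following requirements.
Starting from the first row:
2016-04-29,00:40:15
I want the next row taken from this DataFrame to be more than 18 seconds later than the previous one.
That gives me the second row: 2016-04-29,00:40:35,2
The third row is: 2015-04-29,00:40:55,1
Applying the two requirements above, I would get the following data: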
Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2015-04-29,00:40:43,2
2015-04-29,00:40:55,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................
Answer 0 (score: 2)
I built a generator that yields the rows to keep, then combined them with pd.concat.
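# Assumes df['Data2'] has already been converted with pd.to_timedelta
def get_row(df):
    ref = None  # last row that was kept
    for i, row in df.iterrows():
        if ref is not None:
            # cond1: more than 18 seconds after the last kept row
            cond1 = (row.Data2.total_seconds() -
                     ref.Data2.total_seconds() > 18)
            # cond2: Flag differs from the last kept row
            cond2 = row.Flag != ref.Flag
        if ref is None or cond1 or cond2:
            yield row
            ref = row

# Each yielded row is a Series; concatenate them as columns, then transpose
pd.concat([r for r in get_row(df)], axis=1).T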
Because @Kartik insisted :-)
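Concatenating row Series and transposing may leave every column with object dtype, because each row mixes strings, timedeltas and integers. The following is a minimal sketch of a variant that applies the same keep rule but only collects index labels and then selects with df.loc, so the original column dtypes are retained. The names select_labels and gap_seconds are hypothetical, and the sample is a shortened copy of the question's data.

```python
import pandas as pd
from io import StringIO

# Shortened sample taken from the question's data.
csv = StringIO("""Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
""")
df = pd.read_csv(csv)
df['Data2'] = pd.to_timedelta(df['Data2'])


def select_labels(frame, gap_seconds=18):
    """Return the index labels of the rows kept by the rule in get_row.

    Hypothetical helper: a row is kept if it is the first row, if it is
    more than gap_seconds after the last kept row, or if its Flag differs
    from the last kept row.
    """
    gap = pd.Timedelta(seconds=gap_seconds)
    kept = []
    ref_time = ref_flag = None
    for label, time, flag in zip(frame.index, frame['Data2'], frame['Flag']):
        if ref_time is None or time - ref_time > gap or flag != ref_flag:
            kept.append(label)
            ref_time, ref_flag = time, flag
    return kept


labels = select_labels(df)
result = df.loc[labels]  # selection by label keeps the column dtypes
print(result)
print(result.dtypes)
```

Selecting by label avoids rebuilding the frame row by row, which is the design choice this sketch is meant to illustrate; the selection logic itself is unchanged.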
Answer 1 (score: 2)
Here, try this:
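# Convert the HH:MM:SS strings to timedeltas so they can be compared
df['Data2'] = pd.to_timedelta(df['Data2'])
tdf = df.copy()  # remaining candidate rows
sel_idx = []     # index labels of the rows to keep
while len(tdf) > 0:
    # Keep the first remaining row and use it as the new reference
    sel_idx.extend([tdf.index[0]])
    # cond1: rows more than 18 seconds after the reference
    cond1 = tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'] + pd.to_timedelta(18, 's')
    # cond2: later rows whose Flag differs from the reference
    cond2 = (tdf['Flag'] != tdf.loc[sel_idx[-1], 'Flag']) & (tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'])
    tdf = tdf[cond1 | cond2]
df.loc[sel_idx, :]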
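The comparison in cond1 depends on Data2 holding timedelta values rather than strings. As a small illustration of that arithmetic, here is a sketch using three times from the question; the variable names are only for this example.

```python
import pandas as pd

# Times of day parsed as durations since midnight.
first = pd.to_timedelta("00:40:15")
later = pd.to_timedelta("00:40:35")
sooner = pd.to_timedelta("00:40:24")

# The 18-second threshold used in cond1.
threshold = pd.to_timedelta(18, unit="s")

gap = later - first
print(gap.total_seconds())           # 20.0
print(gap > threshold)               # True: 20 s exceeds the threshold
print(sooner - first > threshold)    # False: only 9 s apart
```

Note that these values carry no date, so two rows on different dates with similar times of day compare as close together.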
Code:
import pandas as pd
from io import StringIO
data = StringIO("""Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1""")
df = pd.read_csv(data)
df['Data2'] = pd.to_timedelta(df['Data2'])
print("Input\n", df)
tdf = df.copy()
sel_idx = []
while len(tdf) > 0:
    sel_idx.extend([tdf.index[0]])
    cond1 = tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'] + pd.to_timedelta(18, 's')
    cond2 = (tdf['Flag'] != tdf.loc[sel_idx[-1], 'Flag']) & (tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'])
    tdf = tdf[cond1 | cond2]
print("Output\n", df.loc[sel_idx, :])
Output:
Input
         Data1     Data2  Flag
0   2016-04-29  00:40:15     1
1   2016-04-29  00:40:24     2
2   2016-04-29  00:40:35     2
3   2015-04-29  00:40:36     2
4   2015-04-29  00:40:43     2
5   2015-04-29  00:40:45     2
6   2015-04-29  00:40:55     1
7   2015-04-29  00:41:05     1
8   2015-04-29  00:41:16     1
9   2015-04-29  00:41:17     2
10  2016-11-29  11:52:36     2
11  2016-11-29  11:52:43     2
12  2016-11-29  11:52:45     2
13  2016-11-29  11:52:55     1
Output
         Data1     Data2  Flag
0   2016-04-29  00:40:15     1
1   2016-04-29  00:40:24     2
4   2015-04-29  00:40:43     2
6   2015-04-29  00:40:55     1
8   2015-04-29  00:41:16     1
9   2015-04-29  00:41:17     2
10  2016-11-29  11:52:36     2
13  2016-11-29  11:52:55     1
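To make it easier to see why each row appears in the output, here is a rough sketch that records the reason a row is kept. It is a single-pass restatement of the rule (first row, Flag changed, or more than 18 seconds later), not the while loop itself, and the names explain_selection and gap_seconds are hypothetical.

```python
import pandas as pd
from io import StringIO

data = StringIO("""Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1""")
df = pd.read_csv(data)
df['Data2'] = pd.to_timedelta(df['Data2'])


def explain_selection(frame, gap_seconds=18):
    """Map each kept index label to the reason it was kept.

    Hypothetical helper. When a row both changes Flag and exceeds the
    time gap, only the Flag change is reported.
    """
    gap = pd.Timedelta(seconds=gap_seconds)
    reasons = {}
    ref_time = ref_flag = None
    for label, time, flag in zip(frame.index, frame['Data2'], frame['Flag']):
        if ref_time is None:
            reason = 'first row'
        elif flag != ref_flag:
            reason = 'flag changed'
        elif time - ref_time > gap:
            reason = 'more than 18 s later'
        else:
            continue  # row is skipped; the reference stays the same
        reasons[label] = reason
        ref_time, ref_flag = time, flag
    return reasons


reasons = explain_selection(df)
for label, reason in reasons.items():
    print(label, reason)
```

On this sample the sketch keeps rows 0, 1, 4, 6, 8, 9, 10 and 13, the same rows shown in the output above. Other data, in particular data whose times are not increasing, may not give the same result as the while loop.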