I'm trying to build a Series that lets me filter a DataFrame.
Take a look at this example:
from pandas import DataFrame, Series

if __name__ == '__main__':
    groups = ['G1'] * 4 + ['G2'] * 4 + ['G3'] * 4
    d1 = ['2019-04-15', '2019-04-16', '2019-04-17', '2019-04-18']
    d2 = ['2019-04-15', '2019-04-17', '2019-04-18', '2019-04-19']
    d3 = ['2019-04-15', '2019-04-16', '2019-04-19', '2019-04-21']
    dates = d1 + d2 + d3
    data = {'group': groups, 'date': dates}
    frame = DataFrame(data=data)
    frame = frame.astype(dtype={'group': 'object', 'date': 'datetime64[ns]'})
    print(frame)
    # expected result, written out by hand for each group
    r1 = [False, True, True, True]
    r2 = [False, False, True, True]
    r3 = [False, True, False, False]
    result = r1 + r2 + r3
    # Series is imported directly, so no pd. prefix is needed here
    frame = frame.join(other=Series(data=result).rename(index='result'))
    print(frame)
The first frame, annotated with the result column I want in the second frame, looks like this:
group date
0 G1 2019-04-15 # <-- First date of G1 | False
1 G1 2019-04-16 # <-- Previous date + 1 | True
2 G1 2019-04-17 # <-- Previous date + 1 | True
3 G1 2019-04-18 # <-- Previous date + 1 | True
4 G2 2019-04-15 # <-- First date of G2 | False
5 G2 2019-04-17 # <-- Previous date + 2 | False
6 G2 2019-04-18 # <-- Previous date + 1 | True
7 G2 2019-04-19 # <-- Previous date + 1 | True
8 G3 2019-04-15 # <-- First date of G3 | False
9 G3 2019-04-16 # <-- Previous date + 1 | True
10 G3 2019-04-19 # <-- Previous date + 3 | False
11 G3 2019-04-21 # <-- Previous date + 2 | False
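Now I can filter the DataFrame with frame.loc[result, :]. I just need to work out how to compare each date with the previous date in the same group.
One more thing: I'd like to control the offset. For example, a difference of between 3 and 5 days should give True, and anything else False.
Thanks, everyone :)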
Answer 0 (score: 1)
Is this what you want?
import pandas as pd

if __name__ == '__main__':
    groups = ['G1'] * 4 + ['G2'] * 4 + ['G3'] * 4
    d1 = ['2019-04-15', '2019-04-16', '2019-04-17', '2019-04-18']
    d2 = ['2019-04-15', '2019-04-17', '2019-04-18', '2019-04-19']
    d3 = ['2019-04-15', '2019-04-16', '2019-04-19', '2019-04-21']
    dates = d1 + d2 + d3
    data = {'group': groups, 'date': dates}
    frame = pd.DataFrame(data=data)
    frame = frame.astype(dtype={'group': 'object', 'date': 'datetime64[ns]'})
    print(frame)
    r1 = [False, True, True, True]
    r2 = [False, False, True, True]
    r3 = [False, True, False, False]
    result = r1 + r2 + r3
    frame = frame.join(other=pd.Series(data=result).rename(index='result'))
    # fill with default data (a zero-day difference)
    frame["day_diff"] = pd.to_timedelta(arg=0, unit="days")
    # calculate diffs only where the previous row belongs to the same group;
    # assign through frame.loc to avoid chained assignment on a column copy
    same_group = frame["group"] == frame["group"].shift(1)
    frame.loc[same_group, "day_diff"] = frame["date"] - frame["date"].shift(1)
    # calculate high and low day values
    low = pd.to_timedelta(arg=3, unit="days")
    high = pd.to_timedelta(arg=5, unit="days")
    # check values: True where low <= day_diff <= high
    frame["good"] = (low <= frame["day_diff"]) & (frame["day_diff"] <= high)
    print(frame)
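
As an alternative to the shift-based comparison above, the per-group difference can also be sketched with groupby and diff. This is only a minimal sketch, not the answer's own method: the helper name flag_gaps and its parameters low_days and high_days are made up for illustration, and it assumes the dates are already sorted within each group.

```python
import pandas as pd


def flag_gaps(frame, low_days, high_days):
    """Return a boolean Series that is True where the gap to the previous
    date in the same group lies within [low_days, high_days] days.

    Hypothetical helper; assumes dates are sorted within each group.
    """
    # diff() inside each group leaves NaT on the first row of every group
    gaps = frame.groupby("group")["date"].diff()
    low = pd.Timedelta(days=low_days)
    high = pd.Timedelta(days=high_days)
    # comparisons against NaT are False, so first rows are never flagged
    return gaps.between(low, high)


groups = ["G1"] * 4 + ["G2"] * 4 + ["G3"] * 4
dates = [
    "2019-04-15", "2019-04-16", "2019-04-17", "2019-04-18",
    "2019-04-15", "2019-04-17", "2019-04-18", "2019-04-19",
    "2019-04-15", "2019-04-16", "2019-04-19", "2019-04-21",
]
frame = pd.DataFrame({"group": groups, "date": pd.to_datetime(dates)})

# gaps of exactly one day, matching the hand-written result column
mask = flag_gaps(frame, 1, 1)
print(frame.loc[mask, :])

# gaps of 3 to 5 days, as in the second part of the question
print(frame.loc[flag_gaps(frame, 3, 5), :])
```

With bounds (1, 1) the mask should match the result column written out by hand in the question, and the bounds can be changed to control the offset. If the dates might not be sorted within a group, sorting by group and date first would be needed.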