Filter a DataFrame by the time delta between rows

Time: 2020-10-01 15:38:05

Tags: python dataframe

I am trying to build a Series that lets me filter a DataFrame.

Consider this example:

from pandas import DataFrame, Series

if __name__ == '__main__':
    groups = ['G1'] * 4 + ['G2'] * 4 + ['G3'] * 4
    d1 = ['2019-04-15', '2019-04-16', '2019-04-17', '2019-04-18']
    d2 = ['2019-04-15', '2019-04-17', '2019-04-18', '2019-04-19']
    d3 = ['2019-04-15', '2019-04-16', '2019-04-19', '2019-04-21']
    dates = d1 + d2 + d3
    data = {'group': groups, 'date': dates}
    frame = DataFrame(data=data)
    frame = frame.astype(dtype={'group': 'object', 'date': 'datetime64[ns]'})
    print(frame)
    r1 = [False, True, True, True]
    r2 = [False, False, True, True]
    r3 = [False, True, False, False]
    result = r1 + r2 + r3
    # Series is imported directly above; `pd` is not defined in this script
    frame = frame.join(other=Series(data=result).rename(index='result'))
    print(frame)

The first and second frames look like this (the comments show how each result value is derived):

   group       date
0     G1 2019-04-15 # <-- First date of G1  | False
1     G1 2019-04-16 # <-- Previous date + 1 | True
2     G1 2019-04-17 # <-- Previous date + 1 | True
3     G1 2019-04-18 # <-- Previous date + 1 | True
4     G2 2019-04-15 # <-- First date of G2  | False
5     G2 2019-04-17 # <-- Previous date + 2 | False
6     G2 2019-04-18 # <-- Previous date + 1 | True
7     G2 2019-04-19 # <-- Previous date + 1 | True
8     G3 2019-04-15 # <-- First date of G3  | False
9     G3 2019-04-16 # <-- Previous date + 1 | True
10    G3 2019-04-19 # <-- Previous date + 3 | False
11    G3 2019-04-21 # <-- Previous date + 2 | False

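Now I can filter the DataFrame with frame.loc[result, :]; I just need to understand how to compare the dates...

One more thing: I would like to control the offset, for example: True when the difference is between 3 and 5 days, False otherwise.

Thanks, everyone :)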

1 Answer:

Answer 0: (score: 1)

Is this what you want?

import pandas as pd

if __name__ == '__main__':
    groups = ['G1'] * 4 + ['G2'] * 4 + ['G3'] * 4
    d1 = ['2019-04-15', '2019-04-16', '2019-04-17', '2019-04-18']
    d2 = ['2019-04-15', '2019-04-17', '2019-04-18', '2019-04-19']
    d3 = ['2019-04-15', '2019-04-16', '2019-04-19', '2019-04-21']
    dates = d1 + d2 + d3
    data = {'group': groups, 'date': dates}
    frame = pd.DataFrame(data=data)
    frame = frame.astype(dtype={'group': 'object', 'date': 'datetime64[ns]'})
    print(frame)
    r1 = [False, True, True, True]
    r2 = [False, False, True, True]
    r3 = [False, True, False, False]
    result = r1 + r2 + r3
    frame = frame.join(other=pd.Series(data=result).rename(index='result'))

    # fill with default data
    frame["day_diff"] = pd.to_timedelta(arg=0, unit="days")
    # calculate diffs
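    # only rows whose previous row belongs to the same group get a real diff;
    # assign through frame.loc to avoid chained assignment
    same_group = frame["group"] == frame["group"].shift(1)
    frame.loc[same_group, "day_diff"] = frame["date"] - frame["date"].shift(1)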

    # calculate high and low day values
    low = pd.to_timedelta(arg=3, unit="days")
    high = pd.to_timedelta(arg=5, unit="days")

    # check values
    frame["good"] = (low <= frame["day_diff"]) & (frame["day_diff"] <= high)
    print(frame)
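
For comparison, here is a minimal sketch of the same idea using a per-group diff. It is not part of the answer above. The helper name delta_mask and its parameters are hypothetical, and it assumes the rows are already sorted by date within each group and that the bounds are inclusive (the question's "(3,5)" could also be read as exclusive). With both bounds set to 1 day it should reproduce the result column from the question.

```python
import pandas as pd


def delta_mask(frame, low_days, high_days, group_col="group", date_col="date"):
    """Hypothetical helper: True where the gap to the previous row of the
    same group lies within [low_days, high_days] days (inclusive)."""
    # diff() inside each group; the first row of every group becomes NaT
    day_diff = frame.groupby(group_col)[date_col].diff()
    low = pd.Timedelta(days=low_days)
    high = pd.Timedelta(days=high_days)
    # comparisons against NaT are False, so first rows drop out on their own
    return day_diff.between(low, high).rename("result")


groups = ["G1"] * 4 + ["G2"] * 4 + ["G3"] * 4
dates = [
    "2019-04-15", "2019-04-16", "2019-04-17", "2019-04-18",
    "2019-04-15", "2019-04-17", "2019-04-18", "2019-04-19",
    "2019-04-15", "2019-04-16", "2019-04-19", "2019-04-21",
]
frame = pd.DataFrame({"group": groups, "date": pd.to_datetime(dates)})

one_day = delta_mask(frame, 1, 1)        # the "previous date + 1" case
three_to_five = delta_mask(frame, 3, 5)  # the 3 to 5 day window

print(frame.loc[one_day, :])
print(frame.loc[three_to_five, :])
```

Grouping before taking the diff means the group boundary never has to be checked by hand with shift(1), and the day window becomes two plain arguments.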