Question

I have a pandas DataFrame that looks like this:

Start       End
2017-12-21  2017-12-23
2018-01-01  2018-01-05
2018-01-04  2018-01-07
2018-03-05  2018-09-06

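I want to write a function that checks whether the value of Start falls between the Start and End values of the row above, and sets OverlapWithAboveRow to 1 or 0 accordingly.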

Start       End         OverlapWithAboveRow
2017-12-21  2017-12-23  0
2018-01-01  2018-01-05  0
2018-01-04  2018-01-07  1
2018-03-05  2018-09-06  0

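How can I do this? Is it possible to write a function for use with the apply method that references both the values of the current row and the values of the row above it?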

I know this can be done with a for loop, but that is quite slow, and I suspect there is a faster way.

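df['OverlapWithAboveRow'] = 0  # default: no overlap
for i in df.index[1:]:  # skip the first row, which has no row above it
    if df.loc[i-1,'Start'] <= df.loc[i,'Start'] <= df.loc[i-1,'End']:
        df.loc[i,'OverlapWithAboveRow'] = 1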

Answer 1

There is no need for a loop. You can use pd.Series.between together with shift to get a boolean Series, cast it to int, and assign the result to a new column.

df['OverlapWithAboveRow'] = df['Start'].between(df['Start'].shift(), df['End'].shift()).astype(int)

       Start        End     OverlapWithAboveRow
0   2017-12-21  2017-12-23       0
1   2018-01-01  2018-01-05       0
2   2018-01-04  2018-01-07       1
3   2018-03-05  2018-09-06       0
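To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. It assumes the Start and End columns hold datetimes (the question does not state the dtypes, so the pd.to_datetime conversion is an assumption), and the names prev_start and prev_end are only illustrative.

```python
import pandas as pd

# Sample data from the question, parsed as datetimes (assumed dtype).
df = pd.DataFrame({
    "Start": pd.to_datetime(["2017-12-21", "2018-01-01", "2018-01-04", "2018-03-05"]),
    "End": pd.to_datetime(["2017-12-23", "2018-01-05", "2018-01-07", "2018-09-06"]),
})

# shift() moves every value down by one row, so row i lines up with row i-1.
# The first row has nothing above it and becomes NaT.
prev_start = df["Start"].shift()
prev_end = df["End"].shift()

# between() is inclusive on both ends by default. Comparisons against NaT
# evaluate to False, so the first row ends up as 0.
df["OverlapWithAboveRow"] = df["Start"].between(prev_start, prev_end).astype(int)

print(df)
```

Because the comparison runs on whole columns at once, no Python-level loop over the rows is involved.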

If you really do want a function, you can write:

def myFunc(df, start, end):
    """
    df is the dataframe
    start is the name of the column for the start times
    end is the name of the column for the end times
    """
    return df[start].between(df[start].shift(), df[end].shift()).astype(int)

df['OverlapWithAboveRow'] = myFunc(df, 'Start', 'End')
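As a sanity check, the vectorized function can be compared with a plain loop on the sample data. This is only a sketch: loop_overlap is a hypothetical reference helper that is not part of the answer, and the columns are assumed to hold datetimes.

```python
import pandas as pd

def myFunc(df, start, end):
    # Same vectorized check as in the answer: compare each Start with the
    # previous row's Start and End.
    return df[start].between(df[start].shift(), df[end].shift()).astype(int)

def loop_overlap(df, start, end):
    # Hypothetical reference implementation using a plain Python loop.
    flags = []
    for i in range(len(df)):
        if i == 0:
            flags.append(0)  # the first row has no row above it
            continue
        prev, cur = df.iloc[i - 1], df.iloc[i]
        flags.append(int(prev[start] <= cur[start] <= prev[end]))
    return pd.Series(flags, index=df.index)

# Sample data from the question, parsed as datetimes (assumed dtype).
df = pd.DataFrame({
    "Start": pd.to_datetime(["2017-12-21", "2018-01-01", "2018-01-04", "2018-03-05"]),
    "End": pd.to_datetime(["2017-12-23", "2018-01-05", "2018-01-07", "2018-09-06"]),
})

vectorized = myFunc(df, "Start", "End")
looped = loop_overlap(df, "Start", "End")

# Both approaches should flag exactly the same rows.
print(vectorized.tolist() == looped.tolist())
```

The loop version is only there for comparison; on larger frames the shift-based version is the one to prefer, as the answer suggests.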

Writing a function to apply to a pandas DataFrame that looks at the row in question and the row above
