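I have a pandas DataFrame in which several columns contain lists representing the path a user took.
I want to split each row into multiple rows, depending on where the lists in that row should be cut. To clarify, here is some sample data: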
import pandas as pd
d = {
    'id': [1, 2, 3, 4],
    'indicator': [0, 1, 0, 1],
    'action': [['a','b','c','d','e','f'], ['a','b','c'], ['a','b'], ['a','b','c']],
    'dayOfAction': [[94,55,40,9,3,0], [12,4,0], [3,0], [45,10,0]],
}
df = pd.DataFrame(data=d)
print(df)
   id  ind  action         dayOfAction
0  1   0    [a,b,c,d,e,f]  [94,55,40,9,3,0]
1  2   1    [a,b,c]        [12,4,0]
2  3   0    [a,b]          [3,0]
3  4   1    [a,b,c]        [45,10,0]
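That is, for ID 1 the first action is of type a and took place 94 days before the last action.
I now want to split the rows wherever there is a gap of at least 30 days between two consecutive actions, i.e. the result should look like this: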
print(df)
   id  ind  action   dayOfAction
0  1   0    [a]      [94]
1  1   0    [b,c]    [55,40]
2  1   0    [d,e,f]  [9,3,0]
3  2   1    [a,b,c]  [12,4,0]
4  3   0    [a,b]    [3,0]
5  4   0    [a]      [45]
6  4   1    [b,c]    [10,0]
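To make the rule concrete, here is a minimal sketch on plain Python lists. It assumes a cut is wanted wherever two consecutive values in dayOfAction differ by at least 30. The helper name split_by_gap and its min_gap parameter are made up for illustration and are not part of pandas.

```python
def split_by_gap(actions, days, min_gap=30):
    """Split two parallel lists wherever consecutive day values differ by >= min_gap.

    Hypothetical helper for illustration; assumes `days` is sorted in
    descending order, as in the sample data.
    """
    segments = []
    start = 0
    for i in range(1, len(days)):
        # A large enough gap between the previous and current action closes a segment.
        if days[i - 1] - days[i] >= min_gap:
            segments.append((actions[start:i], days[start:i]))
            start = i
    # Whatever remains after the last cut forms the final segment.
    segments.append((actions[start:], days[start:]))
    return segments


for acts, days in split_by_gap(['a', 'b', 'c', 'd', 'e', 'f'], [94, 55, 40, 9, 3, 0]):
    print(acts, days)
```

For ID 1 this gives the three segments shown in rows 0 to 2 above.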
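So far I have a function that creates a new column containing a list of the indices after which a split should occur. I think it could be used to do the splitting, but I don't know how to proceed from here. Any help would be greatly appreciated!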
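def splitPaths(daysAgo):
    # Collect the indices after which a split should occur, scanning from the end.
    split = []
    for i in range(len(daysAgo) - 2, -1, -1):
        # Strict comparison: a gap of exactly 30 days does not trigger a split here.
        if daysAgo[i] - daysAgo[i + 1] > 30:
            split.append(i)
    # Return None instead of an empty list when no split is needed.
    return split if split else None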
df['split'] = df['dayOfAction'].apply(splitPaths)
print(df)
   id  ind  action         dayOfAction       split
0  1   0    [a,b,c,d,e,f]  [94,55,40,9,3,0]  [2,0]
1  2   1    [a,b,c]        [12,4,0]          None
2  3   0    [a,b]          [3,0]             None
3  4   1    [a,b,c]        [45,10,0]         [0]
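One possible direction, sketched here under a few assumptions rather than as a definitive solution, is to skip the split column and work on a long version of the frame: explode both list columns, start a new segment wherever the gap to the previous action of the same id is at least 30 days, and collect the lists again per segment. The names long_df, gap, and segment are illustrative. Exploding several columns at once assumes pandas 1.3 or later. The sketch copies indicator onto every segment, so for id 4 both rows carry 1, whereas the desired output above shows 0 on the earlier row; that column would need separate handling if the 0 is intended.

```python
import pandas as pd

d = {
    'id': [1, 2, 3, 4],
    'indicator': [0, 1, 0, 1],
    'action': [['a', 'b', 'c', 'd', 'e', 'f'], ['a', 'b', 'c'], ['a', 'b'], ['a', 'b', 'c']],
    'dayOfAction': [[94, 55, 40, 9, 3, 0], [12, 4, 0], [3, 0], [45, 10, 0]],
}
df = pd.DataFrame(data=d)

# One row per action (multi-column explode assumes pandas >= 1.3).
long_df = df.explode(['action', 'dayOfAction'], ignore_index=True)
long_df['dayOfAction'] = long_df['dayOfAction'].astype(int)

# Days between each action and the previous action of the same id
# (NaN for the first action of each id, which never starts a new segment).
gap = -long_df.groupby('id')['dayOfAction'].diff()

# A running count of large gaps per id serves as the segment number.
new_segment = (gap >= 30).astype(int)
long_df['segment'] = new_segment.groupby(long_df['id']).cumsum()

# Collapse each (id, segment) group back into lists.
result = (
    long_df.groupby(['id', 'indicator', 'segment'], sort=False)
    .agg({'action': list, 'dayOfAction': list})
    .reset_index()
    .drop(columns='segment')
)
print(result)
```

With the sample data this yields seven rows whose action and dayOfAction lists match the desired output. Changing `>= 30` to `> 30` would reproduce the strict comparison used in splitPaths.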