Here is my MultiIndex DataFrame:
import pandas as pd

df = pd.DataFrame({'category':['A','A','A','B','B','B'],
                   'row':[1,2,3,1,2,3],
                   'unique':[{0,1,2},{2,3,4},{1,5,6},{0,1,2},{3,4,5},{4,5,6}],
                   'new':[{0,1,2},{3,4},{5,6},{0,1,2},{3,4,5},{6}]}).set_index(['category','row'])
It looks like this:
Category  row  unique     new
A         1    {0,1,2}    {0,1,2}
          2    {2,3,4}    {3,4}
          3    {1,5,6}    {5,6}
B         1    {0,1,2}    {0,1,2}
          2    {3,4,5}    {3,4,5}
          3    {4,5,6}    {6}
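I am trying to apply something like the following, i.e. intersect each row's 'unique' set with the previous row's 'new' set within the same category:
A.1['new'] intersect A.2['unique']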
Expected result:
Category  row  unique     new        Previous Row Returned
A         1    {0,1,2}    {0,1,2}    None
          2    {2,3,4}    {3,4}      {2}
          3    {1,5,6}    {5,6}      {}
B         1    {0,1,2}    {0,1,2}    None
          2    {3,4,5}    {3,4,5}    {}
          3    {4,5,6}    {6}        {4,5}
How should I approach this?
Answer 0 (score: 0)
Working with non-scalar values (such as sets) in pandas is likely to be slow, but if you need it:
# shift 'new' down by one row within each category
df['Previous Row Returned'] = df.groupby(level=0)['new'].shift()
# boolean mask: True only where a previous row exists (value is not missing)
mask = df['Previous Row Returned'].notnull()
# intersect each row's 'unique' set with the previous row's 'new' set
f = lambda x: x['unique'].intersection(x['Previous Row Returned'])
df.loc[mask, 'Previous Row Returned'] = df.loc[mask].apply(f, axis=1)
print(df)
                 unique        new Previous Row Returned
Category row
A        1    {0, 1, 2}  {0, 1, 2}                   NaN
         2    {2, 3, 4}     {3, 4}                   {2}
         3    {1, 5, 6}     {5, 6}                    {}
B        1    {0, 1, 2}  {0, 1, 2}                   NaN
         2    {3, 4, 5}  {3, 4, 5}                    {}
         3    {4, 5, 6}        {6}                {4, 5}
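
As a possible variation on the approach above, the intersection can also be done with a plain Python list comprehension instead of apply(axis=1). This is only a sketch, not a benchmarked replacement. It assumes the same sample data as in the question; the name prev_new is introduced here for illustration. It also stores None (rather than NaN) for the first row of each group, to match the expected result in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'row': [1, 2, 3, 1, 2, 3],
    'unique': [{0, 1, 2}, {2, 3, 4}, {1, 5, 6}, {0, 1, 2}, {3, 4, 5}, {4, 5, 6}],
    'new': [{0, 1, 2}, {3, 4}, {5, 6}, {0, 1, 2}, {3, 4, 5}, {6}],
}).set_index(['category', 'row'])

# Previous row's 'new' set within each category (hypothetical helper name);
# the first row of each group has no predecessor and becomes NaN.
prev_new = df.groupby(level=0)['new'].shift()

# Intersect row by row in plain Python; where there is no previous row
# (the shifted value is NaN, not a set), store None instead.
df['Previous Row Returned'] = pd.Series(
    [u & p if isinstance(p, set) else None
     for u, p in zip(df['unique'], prev_new)],
    index=df.index,
    dtype=object,
)

print(df)
```

The isinstance check plays the same role as the notnull() mask in the answer: rows without a previous row in their group are skipped rather than intersected.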