I have a simple dataframe df with one column, lists, whose values are Python lists. I want to generate 3 additional columns based on lists.
df looks like:
import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
df
lists
1 [1]
2 [1, 2, 3]
3 [2, 9, 7, 9]
4 [2, 7, 3, 5]
I want df to look like this:
lists cumset adds drops
1 [1] {1} {1} {}
2 [1,2,3] {1,2,3} {2,3} {}
3 [2,9,7,9] {1,2,3,7,9} {7,9} {3}
4 [2,7,3,5] {1,2,3,5,7,9} {3,5} {9}
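Basically I need to figure out how to create cumset (some kind of apply? Is there already a pandas function for this?). Then for adds and drops, we want to compare df.lists with df.lists.shift() and determine which items are new and which are missing. Maybe something like this (the first row has no previous row, so its shifted value is NaN and is replaced with an empty list):
prev = df['lists'].shift().apply(lambda p: p if isinstance(p, list) else [])
df['adds'] = [{i for i in cur if i not in p} for cur, p in zip(df['lists'], prev)]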
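A minimal plain-Python sketch of the adds/drops logic, assuming that adds means items in the current row that were not in the previous row, drops means the reverse, and the first row is compared against an empty set. The function name adds_and_drops is hypothetical, not a pandas API.

```python
def adds_and_drops(rows):
    """Compare each row's items with the previous row's items (hypothetical helper).

    Assumption: the first row is compared against an empty set.
    """
    sets = [set(r) for r in rows]
    # Previous row's set for each row; the first row gets an empty set.
    prev = [set()] + sets[:-1]
    adds = [cur - p for cur, p in zip(sets, prev)]   # items that are new
    drops = [p - cur for cur, p in zip(sets, prev)]  # items that disappeared
    return adds, drops


rows = [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]]
adds, drops = adds_and_drops(rows)
# adds  == [{1}, {2, 3}, {7, 9}, {3, 5}]
# drops == [set(), set(), {1, 3}, {9}]
```

Under this definition row 3's drops comes out as {1, 3}. To attach the results to the frame, adds, drops = adds_and_drops(df['lists'].tolist()) followed by df['adds'] = adds and df['drops'] = drops should work, since a list of sets with one entry per row is stored as an object column.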
Have fun, and thanks.
Answer 0 (score: 1)
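You can build the cumulative column with pandas.Series.cumsum (on a column of lists it concatenates them, and np.unique then drops the duplicates), convert the lists to sets for the other columns, and use pandas.Series.shift to build the "add" and "drop" columns: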
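import pandas as pd
import numpy as np
# cumsum concatenates the lists; np.unique returns the sorted distinct values as an array
df['cumset'] = df['lists'].cumsum().apply(lambda x: np.unique(x))
# temporary helper column holding each row's items as a set
df['sets'] = df['lists'].apply(lambda x: set(x))
# previous row's set; the first row has no predecessor, so its NaN becomes an empty set
shifted = df['sets'].shift(1).apply(lambda x: x if not pd.isnull(x) else set())
# element-wise set differences between the two object columns
df['add'] = df['sets'] - shifted
df['drop'] = shifted - df['sets']
df = df.drop('sets', axis=1)
print(df)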
#-->Output:
lists cumset add drop
1 [1] [1] {1} {}
2 [1, 2, 3] [1, 2, 3] {2, 3} {}
3 [2, 9, 7, 9] [1, 2, 3, 7, 9] {9, 7} {1, 3}
4 [2, 7, 3, 5] [1, 2, 3, 5, 7, 9] {3, 5} {9}
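If you want cumset to hold actual Python sets (np.unique gives sorted NumPy arrays, which is why the cumset column above prints like lists), a running union is one possible alternative to cumsum. This sketch uses itertools.accumulate with operator.or_ (the | operator on sets); it is a swapped-in technique, not what this answer uses.

```python
from itertools import accumulate
from operator import or_

rows = [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]]

# Running union: element i is the union of the sets of rows 0..i.
# accumulate() calls or_(previous_result, next_set), i.e. a | b,
# which returns a new set instead of mutating the previous one.
cumset = list(accumulate(map(set, rows), or_))
# cumset == [{1}, {1, 2, 3}, {1, 2, 3, 7, 9}, {1, 2, 3, 5, 7, 9}]
```

Each step unions the new row into the distinct items seen so far instead of concatenating every list and deduplicating afterwards. For the DataFrame, df['cumset'] = list(accumulate(map(set, df['lists']), or_)) would be the analogous assignment.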
Answer 1 (score: 1)
I think you can use Series.cumsum + Series.shift + Series.iat, mostly with Series.apply to convert to sets:
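# cumsum concatenates the lists row by row; set() then removes duplicates
df['cumset'] = df['lists'].cumsum().apply(set)
lists_sets = df['lists'].apply(set)
lists_shifted = lists_sets.shift()
# replace the first value (NaN after shifting) with an empty set
lists_shifted.iat[0] = set()
lists_shifted = lists_shifted.apply(set)
# element-wise set differences between current and previous rows
df['add'] = lists_sets - lists_shifted
df['drop'] = lists_shifted - lists_sets
print(df)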
lists cumset add drop
1 [1] {1} {1} {}
2 [1, 2, 3] {1, 2, 3} {2, 3} {}
3 [2, 9, 7, 9] {1, 2, 3, 9, 7} {9, 7} {1, 3}
4 [2, 7, 3, 5] {1, 2, 3, 5, 7, 9} {3, 5} {9}
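A quick consistency check for the add and drop columns: for every row, taking the previous row's set, removing drop and adding add should give the current row's set, because (prev - (prev - cur)) | (cur - prev) == cur. This sketch hard-codes the values from the output above; the helper name reconstructs is hypothetical.

```python
# (lists, add, drop) for each row, copied from the printed output above
rows = [
    ([1], {1}, set()),
    ([1, 2, 3], {2, 3}, set()),
    ([2, 9, 7, 9], {9, 7}, {1, 3}),
    ([2, 7, 3, 5], {3, 5}, {9}),
]


def reconstructs(rows):
    """Hypothetical helper: check that (prev - drop) | add == current set for every row."""
    prev = set()  # the first row is compared against an empty set
    for items, add, drop in rows:
        cur = set(items)
        if (prev - drop) | add != cur:
            return False
        prev = cur
    return True


print(reconstructs(rows))  # True
```

The {3} in the question's expected drops for row 3 would fail this check, because 1 is also missing from [2, 9, 7, 9].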