df1:
**users**
usr1
usr2
usr3
xyz1
xyz2
xyz3
df2:
GroupUsers
0 usr1,usr2,usr3
1 abc1,abc2,abc3
2 def1,def2,def3
I am trying to get the difference between two DataFrames. I need to add the result as a new column in df2.
I tried: df2['other_users'] = df1['users'] not in df2['GroupUsers']
Answer 0 (score: 0)
Use a lambda function with split and join:
f = lambda x: ','.join(y for y in df1['users'] if y not in x.split(','))
df2['other_users'] = df2['GroupUsers'].apply(f)
print(df2)
       GroupUsers                    other_users
0  usr1,usr2,usr3                 xyz1,xyz2,xyz3
1  abc1,abc2,abc3  usr1,usr2,usr3,xyz1,xyz2,xyz3
2  def1,def2,def3  usr1,usr2,usr3,xyz1,xyz2,xyz3
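For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split/join idea. The sample df1 is an assumption inferred from the output above, and the helper name users_not_in_group is hypothetical. It also converts each group to a set first, which is optional but avoids repeated list scans.

```python
import pandas as pd

# Assumed sample data: df1 is inferred from the output shown above.
df1 = pd.DataFrame({'users': ['usr1', 'usr2', 'usr3', 'xyz1', 'xyz2', 'xyz3']})
df2 = pd.DataFrame({'GroupUsers': ['usr1,usr2,usr3',
                                   'abc1,abc2,abc3',
                                   'def1,def2,def3']})

all_users = df1['users'].tolist()

def users_not_in_group(group):
    """Return the df1 users missing from a comma-separated group string."""
    members = set(group.split(','))  # set gives fast membership checks
    # Iterate over all_users (not the set) so the original order is kept.
    return ','.join(u for u in all_users if u not in members)

df2['other_users'] = df2['GroupUsers'].apply(users_not_in_group)
print(df2)
```

The result is a comma-separated string per row, matching the format of the GroupUsers column.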
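Answer 1 (score: 0)
Are the values in df2['GroupUsers'] lists or strings? Either way, you can do this by creating an empty column and then iterating over the DataFrame row by row.
If the values in df2['GroupUsers'] are lists: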
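import pandas as pd
df1 = pd.DataFrame({'Users':['user1','user2','user3','xyz1','xyz2','xyz3']})
df2 = pd.DataFrame({'GroupUsers':[['user1','user4','user5'],['abc1','abc2','abc3'],['xyz1','sas2','sas3']]})
df2['other_users'] = ""  # placeholder column, filled in row by row below
known_users = set(df1['Users'])  # build the lookup once rather than on every row
for row_number, row in df2.iterrows():
    # keep only the group members that do not appear in df1['Users']
    df2.at[row_number, 'other_users'] = [item for item in row['GroupUsers'] if item not in known_users]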
If the values in df2['GroupUsers'] are strings, the process is the same, except that each string is first split into a list:
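import pandas as pd
df1 = pd.DataFrame({'Users':['user1','user2','user3','xyz1','xyz2','xyz3']})
df2 = pd.DataFrame({'GroupUsers':['user1,user4,user5','abc1,abc2,abc3','xyz1,sas2,sas3']})
df2['other_users'] = ""  # placeholder column, filled in row by row below
known_users = set(df1['Users'])  # build the lookup once rather than on every row
for row_number, row in df2.iterrows():
    # split the string, then keep only the members that do not appear in df1['Users']
    df2.at[row_number, 'other_users'] = [item for item in row['GroupUsers'].split(',') if item not in known_users]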
Either way, the other_users column comes out the same (GroupUsers is shown here for the string case):
display(df2)
          GroupUsers         other_users
0  user1,user4,user5      [user4, user5]
1     abc1,abc2,abc3  [abc1, abc2, abc3]
2     xyz1,sas2,sas3        [sas2, sas3]
Remember to use DataFrame.at[] rather than DataFrame.loc[]; otherwise pandas gets confused by the idea of placing a list in a single cell.
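If you would rather avoid per-cell assignment altogether, one possible alternative (a sketch, not part of the approach above) is to build the whole column with Series.apply, which returns one list per row without going through .at or .loc. The helper name members_not_known is hypothetical; the data is the string-valued sample from this answer.

```python
import pandas as pd

df1 = pd.DataFrame({'Users': ['user1', 'user2', 'user3', 'xyz1', 'xyz2', 'xyz3']})
df2 = pd.DataFrame({'GroupUsers': ['user1,user4,user5',
                                   'abc1,abc2,abc3',
                                   'xyz1,sas2,sas3']})

known_users = set(df1['Users'])  # lookup set, built once

def members_not_known(group):
    """Return the members of a comma-separated group that are not in df1['Users']."""
    return [item for item in group.split(',') if item not in known_users]

# apply() calls the helper once per row and collects the lists into a new column.
df2['other_users'] = df2['GroupUsers'].apply(members_not_known)
print(df2)
```

This keeps the same semantics as the loop (group members missing from df1) while letting pandas assemble the column in one step.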