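I am trying to remove part of a string in a pandas DataFrame column when that part is present (matches) in another column. The values are comma-separated, and there may be one or more of them. I want to create a new column with the rest of the string. Below is a reproducible example and my code so far: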
import pandas as pd

df = pd.DataFrame({
    'Country': ['Germany, France, Brazil, India, Russia',
                'Russia, France, Jamaica, India, China',
                'Germany, Russia, Jamaica',
                'Italy, Jamaica'],
    'Exclude': ['France, Brazil', 'India, Russia', 'Jamaica', 'Italy']})
print(df)
The printed DataFrame:
                                  Country         Exclude
0  Germany, France, Brazil, India, Russia  France, Brazil
1   Russia, France, Jamaica, India, China   India, Russia
2                Germany, Russia, Jamaica         Jamaica
3                          Italy, Jamaica           Italy
I want to create an "Output" column containing the names of the countries that do not appear in the "Exclude" column. So I tried:
df['Output'] = df['Country'].replace(to_replace=r'\b'+df['Exclude']+r'\b',
value='',regex=True)
Desired output:
                                  Country         Exclude                  Output
0  Germany, France, Brazil, India, Russia  France, Brazil  Germany, India, Russia
1   Russia, France, Jamaica, India, China   India, Russia  France, Jamaica, China
2                Germany, Russia, Jamaica         Jamaica         Germany, Russia
3                          Italy, Jamaica           Italy                 Jamaica
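This does half the job: it matches when the text in the "Exclude" column appears in "Country" exactly as written, but it does not work when the sequence of names in "Country" differs from the sequence in "Exclude". For example, it does not work for the second row. I spent a lot of time on this and tried several other approaches before posting. I found similar questions on SO, but they did not help in this case. Please help.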
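To illustrate the mismatch described above, here is a minimal sketch that uses the standard `re` module on two of the rows from the example. The variable names are illustrative only, and the sketch looks at the pattern itself, not at how pandas applies it:

```python
import re

country_row0 = 'Germany, France, Brazil, India, Russia'
country_row1 = 'Russia, France, Jamaica, India, China'

# Row 0: 'France, Brazil' occurs verbatim in the Country string, so it matches.
match_row0 = re.search(r'\b' + 'France, Brazil' + r'\b', country_row0)

# Row 1: 'India' and 'Russia' are both present, but not as the contiguous
# text 'India, Russia', so the pattern finds nothing.
match_row1 = re.search(r'\b' + 'India, Russia' + r'\b', country_row1)

print(match_row0 is not None)  # True
print(match_row1 is not None)  # False
```

A pattern built from the whole "Exclude" string only matches that exact run of characters, which suggests comparing the individual names instead.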
Answer 0 (score: 2)
Use apply on each row, split the values, and take the set difference:
f=lambda x: ', '.join(set(x['Country'].split(', ')).difference(set(x['Exclude'].split(', '))))
df['Out'] = df.apply(f, axis=1)
Or use a list comprehension with zip:
df['Out'] = ([', '.join(set(a.split(', ')).difference(set(b.split(', '))))
for a, b in zip(df['Country'], df['Exclude'])])
print (df)
                                  Country         Exclude  \
0  Germany, France, Brazil, India, Russia  France, Brazil
1   Russia, France, Jamaica, India, China   India, Russia
2                Germany, Russia, Jamaica         Jamaica
3                          Italy, Jamaica           Italy

                      Out
0  Germany, India, Russia
1  China, France, Jamaica
2         Germany, Russia
3                 Jamaica
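One caveat with the set-based versions: Python sets are unordered, so the order of the names in the joined string is not guaranteed and may differ from the output shown here. If a stable result is wanted, one option is to sort the remaining names before joining. This is a small sketch on a single row, with illustrative variable names:

```python
country = 'Russia, France, Jamaica, India, China'
exclude = 'India, Russia'

# The set difference drops the excluded names but loses the original order.
remaining = set(country.split(', ')) - set(exclude.split(', '))

# sorted() makes the result deterministic (alphabetical).
out = ', '.join(sorted(remaining))
print(out)  # China, France, Jamaica
```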
If order matters:
df['Out'] = [', '.join(x for x in a.split(', ') if x not in set(b.split(', ')))
for a, b in zip(df['Country'], df['Exclude'])]
print (df)
                                  Country         Exclude  \
0  Germany, France, Brazil, India, Russia  France, Brazil
1   Russia, France, Jamaica, India, China   India, Russia
2                Germany, Russia, Jamaica         Jamaica
3                          Italy, Jamaica           Italy

                      Out
0  Germany, India, Russia
1  France, Jamaica, China
2         Germany, Russia
3                 Jamaica
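The snippets above assume every cell is a string with names separated by exactly ", ". If real data has uneven spacing or missing values, a small helper may be more forgiving. The following is only a sketch under those assumptions: `remove_excluded` is a hypothetical name and the sample lists are made up for illustration.

```python
def remove_excluded(country, exclude):
    """Return the names in `country` not listed in `exclude`, in original order."""
    # Treat a missing Country cell (e.g. NaN, which is a float) as empty.
    if not isinstance(country, str):
        return ''
    # A missing Exclude cell means nothing is excluded.
    excluded = set()
    if isinstance(exclude, str):
        excluded = {name.strip() for name in exclude.split(',')}
    # Split on ',' and strip, so 'A,B' and 'A ,  B' behave the same way.
    kept = [name.strip() for name in country.split(',')]
    return ', '.join(name for name in kept if name and name not in excluded)


# Made-up sample data with uneven spacing and one missing Exclude value.
countries = ['Germany,France ,  Brazil', 'Italy, Jamaica', 'Russia, China']
excludes = ['France', 'Italy', float('nan')]

results = [remove_excluded(a, b) for a, b in zip(countries, excludes)]
print(results)  # ['Germany, Brazil', 'Jamaica', 'Russia, China']
```

With the DataFrame from the question, the same helper could be applied as df['Out'] = [remove_excluded(a, b) for a, b in zip(df['Country'], df['Exclude'])].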