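The DataFrame has two columns: sentence and list. The goal is to mark every string in df['sentence'] that also appears in df['list'] by appending |present to it (that is, found string|present).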
from pandas import DataFrame
df = {'list': [['Ford','Mercedes Benz'],['ford','hyundai','toyota'],['tesla'],[]],
'sentence': ['Ford is less expensive than Mercedes Benz' ,'toyota and hyundai mileage is good compared to ford','tesla is an electric car','toyota too has electric cars']
}
df = DataFrame(df,columns= ['list','sentence'])
Expected output for df['sentence']:
Ford|present is less expensive than Mercedes Benz|present
toyota|present and hyundai|present mileage is good compared to ford|present
tesla|present is an electric car
toyota too has electric cars
Answer 0 (score: 1)
Use a regular-expression substitution:
(taken from an IPython interactive session)
In [36]: import re

In [37]: def sub_from_list(row):
    ...:     if row['list']:
    ...:         # Escape each term so regex metacharacters are matched literally
    ...:         pattern = r'({})'.format('|'.join(map(re.escape, set(row['list']))))
    ...:         row['sentence'] = re.sub(pattern, r'\1|present', row['sentence'])
    ...:     return row
    ...:

In [38]: df.apply(sub_from_list, axis=1)
Out[38]:
   list                     sentence
0  [Ford, hyundai]          Ford|present is expensive than hyundai|present
1  [ford, hyundai, toyota]  toyota|present and hyundai|present mileage is ...
2  [tesla]                  tesla|present is an electric car
3  []                       toyota too has electric cars
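As a variation on the regex approach, the substitution can be written as a plain function of the sentence and its terms, so that no row is modified inside apply. The sketch below is only one way to do it: the helper name mark_present is hypothetical, and the word boundaries and longest-first ordering are assumptions about the desired matching rules, not something the question specifies.

```python
import re
import pandas as pd

def mark_present(sentence, terms):
    """Append '|present' to every whole-word occurrence of a term in sentence."""
    if not terms:
        return sentence
    # Longest terms first, so a longer phrase wins over a shorter prefix of it
    ordered = sorted(set(terms), key=len, reverse=True)
    # re.escape keeps any regex metacharacters in the terms literal
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(pattern, r"\1|present", sentence)

df = pd.DataFrame({
    "list": [["Ford", "Mercedes Benz"], ["ford", "hyundai", "toyota"], ["tesla"], []],
    "sentence": [
        "Ford is less expensive than Mercedes Benz",
        "toyota and hyundai mileage is good compared to ford",
        "tesla is an electric car",
        "toyota too has electric cars",
    ],
})

# Build the new column from pairs of (sentence, terms) without mutating rows
df["sentence"] = [mark_present(s, t) for s, t in zip(df["sentence"], df["list"])]
```

With the question's data this should give the four expected sentences, including the multi-word entry Mercedes Benz.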
Answer 1 (score: 0)
You can use apply together with a regular expression, doing the text replacement inside the applied function:
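import re
from pandas import DataFrame

df = {'list': [['Ford','Mercedes Benz'],['ford','hyundai','toyota'],['tesla'],[]],
      'sentence': ['Fords is less expensive than Mercedes Benz' ,'toyota and hyundai mileage is good compared to ford','tesla is an electric car','toyota too has electric cars']
     }
df = DataFrame(df,columns= ['list','sentence'])

def replace_values(row):
    # Use row['list'] rather than row.list: attribute access can clash with pandas Series attributes
    if len(row['list']) > 0:
        # \b sits outside the group so every alternative must match a whole word;
        # re.escape keeps regex metacharacters in the terms literal
        pat = r"\b(" + "|".join(map(re.escape, row['list'])) + r")\b"
        row['sentence'] = re.sub(pat, r"\1|present", row['sentence'])
    return row

df = df.apply(replace_values, axis=1)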
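Where the \b anchors sit relative to the group changes what gets matched. In a pattern of the form (\bA|B)(\b), the leading \b belongs only to the first alternative, so later alternatives can match in the middle of a word. The following sketch illustrates this with hypothetical terms ("car", "art") chosen purely to expose the difference:

```python
import re

terms = ["car", "art"]   # hypothetical terms, not from the question's data
sentence = "smart car"

# Leading \b inside the group: it applies to "car" only, not to "art"
loose = r"(\b" + "|".join(terms) + r")(\b)"
# \b outside the group: it applies to every alternative
strict = r"\b(" + "|".join(map(re.escape, terms)) + r")\b"

loose_result = re.sub(loose, r"\1|present\2", sentence)
strict_result = re.sub(strict, r"\1|present", sentence)

print(loose_result)   # smart|present car|present
print(strict_result)  # smart car|present
```

The loose pattern tags the "art" inside "smart", while the strict one only tags the standalone word.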
Answer 2 (score: 0)
You can apply a custom function to the DataFrame, as follows:
Code
import pandas as pd

df = {'list': [['Ford','hyundai'],['ford','hyundai','toyota'],['tesla'],[]],
      'sentence': ['Ford is expensive than hyundai' ,'toyota and hyundai mileage is good compared to ford','tesla is an electric car','toyota too has electric cars']
     }
df = pd.DataFrame(df)

def rep_text(row):
    # Use row['list'] rather than row.list: attribute access can clash with pandas Series attributes
    if not row['list']:
        return row
    words = row['sentence'].split()
    # Whitespace-separated tokens are compared one at a time,
    # so only single-word list entries can match
    new_words = [word + '|present' if word in row['list'] else word
                 for word in words]
    row['sentence'] = ' '.join(new_words)
    return row

df = df.apply(rep_text, axis=1)
Output
   list                     sentence
0  [Ford, hyundai]          Ford|present is expensive than hyundai|present
1  [ford, hyundai, toyota]  toyota|present and hyundai|present mileage is ...
2  [tesla]                  tesla|present is an electric car
3  []                       toyota too has electric cars
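One thing to keep in mind with the split-based approach is that it compares whitespace-separated tokens, so a multi-word entry such as 'Mercedes Benz' from the question's data, or a word with punctuation attached, will not be tagged. The short sketch below shows this behaviour. The helper name mark_tokens and the second sentence are made up for illustration.

```python
def mark_tokens(sentence, terms):
    """Tag whitespace-separated tokens that exactly equal one of the terms."""
    wanted = set(terms)
    return " ".join(w + "|present" if w in wanted else w for w in sentence.split())

# Multi-word entry: "Mercedes" and "Benz" are separate tokens, so neither matches
multiword = mark_tokens("Ford is less expensive than Mercedes Benz",
                        ["Ford", "Mercedes Benz"])
print(multiword)   # Ford|present is less expensive than Mercedes Benz

# Attached punctuation: the tokens are "tesla," and "toyota,", not the bare words
punctuated = mark_tokens("tesla, unlike toyota, is electric", ["tesla", "toyota"])
print(punctuated)  # tesla, unlike toyota, is electric
```

If such cases matter, a regex with word boundaries is probably the better fit; for lists of single clean words the split approach is simple and works.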