Problem data
import pandas as pd

df = pd.DataFrame({
    'Keyword': [
        'basement finishing systems akron pa',
        'basement finishing systems biglerville pa',
        'basement finishing systems chambersburg pa',
        'basement finishing systems christiana pa',
        'basement finishing systems delta pa',
    ],
    'StemmedKW': [
        ['basement', 'finish', 'system', 'akron', 'pa'],
        ['basement', 'finish', 'system', 'biglervil', 'pa'],
        ['basement', 'finish', 'system', 'chambersburg', 'pa'],
        ['basement', 'finish', 'system', 'christiana', 'pa'],
        ['basement', 'finish', 'system', 'delta', 'pa'],
    ],
    'Ad Group': ['Finishing System', 'Finishing System', 'Finishing System', 'Finishing System', 'Finishing System'],
    'Campaign': ['Campaign A', 'Campaign A', 'Campaign A', 'Campaign A', 'Campaign A'],
    'StemmedAG': [['finish', 'system'], ['finish', 'system'], ['finish', 'system'], ['finish', 'system'], ['finish', 'system']],
}, columns=['Campaign', 'Ad Group', 'Keyword', 'StemmedAG', 'StemmedKW'])
The DataFrame looks like this:
Campaign Ad Group Keyword \
0 Campaign A Finishing System basement finishing systems akron pa
1 Campaign A Finishing System basement finishing systems biglerville pa
2 Campaign A Finishing System basement finishing systems chambersburg pa
3 Campaign A Finishing System basement finishing systems christiana pa
4 Campaign A Finishing System basement finishing systems delta pa
StemmedAG StemmedKW
0 [finish, system] [basement, finish, system, akron, pa]
1 [finish, system] [basement, finish, system, biglervil, pa]
2 [finish, system] [basement, finish, system, chambersburg, pa]
3 [finish, system] [basement, finish, system, christiana, pa]
4 [finish, system] [basement, finish, system, delta, pa]
Context
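StemmedAG and StemmedKW are columns of lists. I generated them by stemming the words in the Ad Group and Keyword columns. The goal is to put a plus sign (+) in front of every word in the Keyword column whose stem appears in both StemmedAG and StemmedKW.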
Result
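Note that the Keyword value in row 0 becomes basement +finishing +systems akron pa. This is because the stems finish and system both appear in StemmedAG and in StemmedKW, so a plus sign is placed in front of the corresponding unstemmed words in the Keyword column.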
Campaign Ad Group Keyword \
0 Campaign A Finishing System basement +finishing +systems akron pa
1 Campaign A Finishing System basement +finishing +systems biglerville pa
2 Campaign A Finishing System basement +finishing +systems chambersburg pa
3 Campaign A Finishing System basement +finishing +systems christiana pa
4 Campaign A Finishing System basement +finishing +systems delta pa
StemmedAG StemmedKW
0 ['finish', 'system'] ['basement', 'finish', 'system', 'akron', 'pa']
1 ['finish', 'system'] ['basement', 'finish', 'system', 'biglervil', ...
2 ['finish', 'system'] ['basement', 'finish', 'system', 'chambersburg...
3 ['finish', 'system'] ['basement', 'finish', 'system', 'christiana',...
4 ['finish', 'system'] ['basement', 'finish', 'system', 'delta', 'pa']
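The rule behind this result can be traced by hand on row 0. The sketch below is only an illustration in plain Python, and it assumes that StemmedKW is aligned position by position with the whitespace-separated words of Keyword (that is, the i-th stem belongs to the i-th word). The variable names are made up for the example.

```python
# Row 0 from the example data
keyword = 'basement finishing systems akron pa'
stemmed_kw = ['basement', 'finish', 'system', 'akron', 'pa']
stemmed_ag = ['finish', 'system']

# Step 1: stems present in both lists
shared = set(stemmed_ag) & set(stemmed_kw)

# Step 2: positions of those stems in StemmedKW
# (assumption: stemmed_kw[i] is the stem of the i-th word of keyword)
positions = [i for i, stem in enumerate(stemmed_kw) if stem in shared]

# Step 3: prefix the words at those positions with '+'
words = keyword.split()
marked = ' '.join('+' + w if i in positions else w for i, w in enumerate(words))

print(sorted(shared))  # ['finish', 'system']
print(positions)       # [1, 2]
print(marked)          # basement +finishing +systems akron pa
```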
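I'm not used to working with lists inside pandas columns, and I don't know how to take the intersection of the two list columns in the DataFrame, find the index positions where the shared words occur, and then put a plus sign in front of the word at each of those positions. Or would it be simpler to do a string replacement on df['Keyword'] using the words from StemmedAG?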
I'd also like to do this in as pandas-native a way as possible and avoid for loops.
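The string-replacement idea could look roughly like the sketch below. It is an approximation, not an exact equivalent of the list intersection: it assumes that every stem is a literal prefix of the word it came from (true for finish/finishing and system/systems, but not for stemmers that rewrite endings). add_plus_by_prefix is a hypothetical helper name, and the two-row frame is a cut-down copy of the example data.

```python
import re

import pandas as pd


def add_plus_by_prefix(keyword, stems):
    """Prefix '+' to every word in `keyword` that starts with one of `stems`.

    Hypothetical helper: prefix matching only approximates stemming.
    """
    if not stems:
        return keyword  # nothing to mark; avoids building an empty pattern
    alternatives = '|'.join(re.escape(s) for s in stems)
    # \b anchors at the start of a word, \w* consumes the rest of that word
    pattern = re.compile(r'\b(?:' + alternatives + r')\w*')
    return pattern.sub(lambda m: '+' + m.group(0), keyword)


df = pd.DataFrame({
    'Keyword': ['basement finishing systems akron pa',
                'basement finishing systems delta pa'],
    'StemmedAG': [['finish', 'system'], ['finish', 'system']],
})

# One regex substitution per row, pairing each keyword with its ad-group stems
df['Keyword'] = [add_plus_by_prefix(k, s)
                 for k, s in zip(df['Keyword'], df['StemmedAG'])]
print(df['Keyword'].tolist())
```

Because the stems differ from row to row in general, the pattern is built per row here; if every row shared the same stems, a single pattern could be compiled once and reused.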
Answer 0 (score: 0)
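I figured out how to do this with a non-pandas approach, but it's pretty ugly. I'd really like to learn how to do it with pandas (if that's even possible!).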
for idx in df.index:
    # Stems shared by the ad group and the keyword
    shared = set(df.at[idx, 'StemmedAG']).intersection(df.at[idx, 'StemmedKW'])
    # StemmedKW is assumed to line up position by position with the words in Keyword
    words = df.at[idx, 'Keyword'].split()
    stems = df.at[idx, 'StemmedKW']
    # Pair each word with its own stem so repeated words are handled correctly
    df.at[idx, 'Keyword'] = ' '.join('+' + w if s in shared else w for w, s in zip(words, stems))
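For comparison, the same logic can be moved into a row function and handed to DataFrame.apply with axis=1. This is a sketch rather than a truly vectorized solution: apply still calls the function once per row in Python, since list-valued cells have no vectorized set operations. mark_shared_words is a hypothetical name, and the code again assumes StemmedKW lines up word for word with Keyword.

```python
import pandas as pd


def mark_shared_words(row):
    """Return the row's Keyword with '+' before words whose stem is shared.

    Hypothetical helper; assumes row['StemmedKW'][i] is the stem of the
    i-th whitespace-separated word in row['Keyword'].
    """
    shared = set(row['StemmedAG']) & set(row['StemmedKW'])
    words = row['Keyword'].split()
    return ' '.join('+' + w if s in shared else w
                    for w, s in zip(words, row['StemmedKW']))


df = pd.DataFrame({
    'Keyword': ['basement finishing systems akron pa',
                'basement finishing systems biglerville pa'],
    'StemmedAG': [['finish', 'system'], ['finish', 'system']],
    'StemmedKW': [['basement', 'finish', 'system', 'akron', 'pa'],
                  ['basement', 'finish', 'system', 'biglervil', 'pa']],
})

# axis=1 passes each row to the function as a Series
df['Keyword'] = df.apply(mark_shared_words, axis=1)
print(df['Keyword'].tolist())
```

The explicit index loop disappears, and the per-row rule lives in one small function that can be tested on its own.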