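I'm trying to add a column to a pandas.DataFrame whose value depends on whether the string in each row contains one or more words that are keys in a dict. But it gives me an error, and I don't know what's wrong. Can anyone help?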
data_frame:
tw_test.head()
tweet
0 living the dream. #cameraman #camera #camerac...
1 justin #trudeau's reasons for thanksgiving. to...
2 @themadape butt…..butt…..we’re allergic to l...
3 2 massive explosions at peace march in #turkey...
4 #mulcair suggests there’s bad blood between hi...
dict:
party={}
{'#mulcair': 'NDP', '#cdnleft': 'liberal', '#LiberalExpress': 'liberal', '#ThankYouStephenHarper': 'Conservative ', '#pmjt': 'liberal'...}
My code:
tw_test["party"]=tw_test["tweet"].apply(lambda x: party[x.split(' ')[1].startswith("#")[0]])
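Answer 0 (score: 0)
I believe your trouble comes from trying to cram too much into the lambda. A function that does the lookup is pretty straightforward: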
Code:
party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal'
}

def party(tweet):
    # Check each hashtag in the tweet, in order of appearance.
    for tag in [t for t in tweet.split() if t.startswith('#')]:
        if tag in party_tags:
            return party_tags[tag]
    # No known hashtag found: the function implicitly returns None.
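As for why the original one-liner fails, here is my reading of it, broken into steps. The sample tweet and the variable names `second_word`, `is_tag`, and `error_message` are only for illustration. The expression takes the second word of the tweet, calls `startswith("#")` on it (which yields a plain `bool`), and then tries to index that `bool` with `[0]`, which Python does not allow:

```python
# One sample tweet, used here only to walk through the original expression.
tweet = "#mulcair suggests bad blood between"

second_word = tweet.split(' ')[1]      # 'suggests' (only the 2nd word is looked at)
is_tag = second_word.startswith("#")   # False: startswith returns a bool, not a string

try:
    is_tag[0]                          # indexing a bool raises TypeError
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Even without the `[0]`, the lookup would be `party[True]` or `party[False]`, so the hashtag itself never reaches the dict.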
Test code:
import pandas as pd
tw_test = pd.DataFrame([x.strip() for x in u"""
living the dream. #cameraman #camera #camerac
justin #trudeau's reasons for thanksgiving. to
@themadape butt…..butt…..we’re allergic to
2 massive explosions at peace march in #turkey
#mulcair suggests there’s bad blood between
""".split('\n')[1:-1]], columns=['tweet'])
tw_test["party"] = tw_test["tweet"].apply(party)
print(tw_test)
Results:
tweet party
0 living the dream. #cameraman #camera #camerac None
1 justin #trudeau's reasons for thanksgiving. to None
2 @themadape butt…..butt…..we’re allergic to None
3 2 massive explosions at peace march in #turkey None
4 #mulcair suggests there’s bad blood between NDP
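If you prefer something more compact, the same lookup can be sketched with `next()` and a default value. This is only an alternative under the same assumptions as above (whitespace-separated words, exact match against the dict keys); `first_party` and the `tags` parameter are names made up for this example, and the sample tweets are shortened:

```python
party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#pmjt': 'liberal',
}

def first_party(tweet, tags=party_tags):
    """Return the party of the first word that is a key in tags, else None."""
    # The generator yields a party for every matching word;
    # next() takes the first one, or None if there is no match.
    return next((tags[word] for word in tweet.split() if word in tags), None)

tweets = [
    "2 massive explosions at peace march in #turkey",
    "#mulcair suggests bad blood between",
]
parties = [first_party(t) for t in tweets]
print(parties)
```

It can be passed to `apply` in the same way as `party` above, e.g. `tw_test["tweet"].apply(first_party)`. Note that a word such as `#trudeau's` will not match a key `#trudeau`, since the comparison is exact.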