Add a DataFrame column when a condition on a dict is met

Time: 2017-03-17 22:58:41

Tags: python pandas dictionary

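I am trying to add a column to a pandas.DataFrame. The column should be filled in when a string in the DataFrame contains one or more words that are keys in a dict. My code raises an error, though, and I don't know what is wrong. Can anyone help?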

data_frame:

tw_test.head()

    tweet   
0   living the dream. #cameraman #camera #camerac...    
1   justin #trudeau's reasons for thanksgiving. to...   
2   @themadape butt…..butt…..we’re allergic to l... 
3   2 massive explosions at peace march in #turkey...   
4   #mulcair suggests there’s bad blood between hi...   

Dictionary:

party={}
{'#mulcair': 'NDP', '#cdnleft': 'liberal', '#LiberalExpress': 'liberal', '#ThankYouStephenHarper': 'Conservative ', '#pmjt': 'liberal'...}

My code:

tw_test["party"]=tw_test["tweet"].apply(lambda x: party[x.split(' ')[1].startswith("#")[0]])

1 answer:

Answer 0 (score: 0)

I believe your trouble comes from trying to cram too much into the lambda. A function that does the lookup is fairly simple:

Code:

party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal'
}

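def party(tweet):
    """Return the party for the first known hashtag in tweet, or None."""
    # Look only at whitespace-separated words that start with '#'.
    for tag in [t for t in tweet.split() if t.startswith('#')]:
        if tag in party_tags:
            return party_tags[tag]
    return None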
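
As a side note on why the original one-liner fails: `str.startswith` returns a bool, and the lambda then tries to index that bool with `[0]`. The snippet below is only a small reproduction of that step on one of the sample tweets. The variable names are made up for illustration, and the exact wording of the error message can vary between Python versions.

```python
# Standalone reproduction of the failing step in the original lambda.
tweet = "justin #trudeau's reasons for thanksgiving. to"

second_word = tweet.split(' ')[1]         # the second space-separated word
is_hashtag = second_word.startswith("#")  # a bool, not a string

try:
    is_hashtag[0]                         # a bool cannot be indexed
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Even without the `[0]`, the lambda would only ever inspect the second word of each tweet, and `split(' ')[1]` would raise an IndexError on a tweet with a single word.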

Test code:

import pandas as pd
tw_test = pd.DataFrame([x.strip() for x in u"""
    living the dream. #cameraman #camera #camerac
    justin #trudeau's reasons for thanksgiving. to
    @themadape butt…..butt…..we’re allergic to
    2 massive explosions at peace march in #turkey
    #mulcair suggests there’s bad blood between
""".split('\n')[1:-1]], columns=['tweet'])

tw_test["party"] = tw_test["tweet"].apply(party)
print(tw_test)

Results:

                                            tweet party
0  living the dream. #cameraman #camera #camerac  None
1  justin #trudeau's reasons for thanksgiving. to  None
2      @themadape butt…..butt…..we’re allergic to  None
3  2 massive explosions at peace march in #turkey  None
4     #mulcair suggests there’s bad blood between   NDP
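
One thing the result above hints at: the sample tweets are lower-case, while some dictionary keys (for example '#LiberalExpress') are mixed-case, and a tag followed by punctuation (as in "#trudeau's") will not match a bare key. If that matters for your data, a possible variation is to compare lower-cased tags and pull hashtags out with a regular expression. This is only a sketch: `party_ci`, `party_tags_lower`, `HASHTAG` and the three example tweets are invented here for illustration and are not part of the question.

```python
import re

import pandas as pd

# Same mapping as in the answer, repeated so this block runs on its own.
party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal',
}

# Hypothetical helper table: keys lower-cased once, up front.
party_tags_lower = {tag.lower(): name for tag, name in party_tags.items()}

# A '#' followed by word characters; stops at punctuation such as ' or !
HASHTAG = re.compile(r'#\w+')


def party_ci(tweet):
    """Return the party for the first known hashtag, ignoring case."""
    for tag in HASHTAG.findall(tweet):
        name = party_tags_lower.get(tag.lower())
        if name is not None:
            return name
    return None


# Made-up tweets, only to exercise the function.
tweets = pd.DataFrame({'tweet': [
    "thanks #liberalexpress!",
    "#mulcair's plan",
    "no tags here",
]})
tweets['party'] = tweets['tweet'].apply(party_ci)
print(tweets)
```

Lower-casing the keys once keeps each lookup a plain dict access. Whether case-insensitive matching is appropriate depends on how the hashtags in your dictionary were collected.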