Adding a column to a txt file with conditional values

Asked: 2017-05-24 12:40:28

Tags: python python-3.x pandas twitter

I have a txt file with lines like this: "2017-03-21 12:00:00","844334879861069999","RT @__________: Ein wenig Zelda in der Schule spielen :) #SwitchMoment @NintendoDE URL"

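I want to add a column on the left whose value is 4 for positive and 0 for negative, depending on whether the line contains a positive smiley (":)", ":D") or a negative smiley (":-(", ":("). If both types occur in the same line, the value should be 99. I would be glad to hear any suggestions on how to achieve this. My attempt: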

    import pandas as pd

    p_smilies = [":)", ":D"]
    n_smilies = [":-(", ":("]
    csv_input = pd.read_csv('input.csv')
    csv_input['sentiment'] = 0
    for line in csv_input["tweets"]:
        for p in p_smilies:
            if p in line:
                <assign value 4 to the corresponding row in csv_input['sentiment']>
        for n in n_smilies:
            if n in line:
                <assign value 0 to the corresponding row in csv_input['sentiment']>
        <check whether both are in the same line and assign 99 to that row>

    csv_input.to_csv('output.csv', index=False)

1 Answer:

Answer 0 (score: 1)

You can use numpy.where together with str.contains:

    import numpy as np
    import pandas as pd

    # Sample data standing in for the "tweets" column of the input file
    csv_input = pd.DataFrame({'tweets': ['RT @_______len :) #SwitchMoment ', ':D :-( @NintendoDE URL', ':(', 'Ein wenig Zelda']})
    print(csv_input)
                                 tweets
    0  RT @_______len :) #SwitchMoment 
    1            :D :-( @NintendoDE URL
    2                                :(
    3                   Ein wenig Zelda

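I added a new value, 3, for tweets with no smiley. The parentheses are escaped because str.contains treats its pattern as a regular expression by default: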

    p_smilies = [r":\)", r":D"]
    n_smilies = [r":-\(", r":\("]

    # Boolean masks: True where a tweet contains any positive / any negative smiley
    mp = csv_input["tweets"].str.contains('|'.join(p_smilies))
    mn = csv_input["tweets"].str.contains('|'.join(n_smilies))

    # 99 = both kinds, 0 = negative only, 4 = positive only, 3 = no smiley
    csv_input['sentiment'] = np.where(mn & mp, 99,
                             np.where(mn, 0,
                             np.where(mp, 4, 3)))
    print(csv_input)
                                 tweets  sentiment
    0  RT @_______len :) #SwitchMoment           4
    1            :D :-( @NintendoDE URL         99
    2                                :(          0
    3                   Ein wenig Zelda          3
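
If you would rather keep the smiley lists unescaped, as in the question, the pattern can probably be built with `re.escape` instead of escaping by hand. This is only a sketch: the helper name `smiley_pattern` and the sample tweets are made up for illustration.

```python
import re

import pandas as pd

# Plain, unescaped smileys as written in the question
p_smilies = [":)", ":D"]
n_smilies = [":-(", ":("]


def smiley_pattern(smilies):
    """Join literal smileys into one regex alternation (hypothetical helper)."""
    return "|".join(re.escape(s) for s in smilies)


# Made-up sample tweets, for illustration only
tweets = pd.Series(["nice :)", "bad :-(", "mixed :D :(", "plain text"])

# Same masks as before, but the escaping is done by re.escape
mp = tweets.str.contains(smiley_pattern(p_smilies))
mn = tweets.str.contains(smiley_pattern(n_smilies))
```

This way new smileys can be appended to the lists without having to think about which characters are regex metacharacters.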

Or, if negative tweets and tweets with no smiley should both get the value 0:

    csv_input['sentiment'] = np.where(mn & mp, 99,
                             np.where(mp, 4, 0))
    print(csv_input)
                                 tweets  sentiment
    0  RT @_______len :) #SwitchMoment           4
    1            :D :-( @NintendoDE URL         99
    2                                :(          0
    3                   Ein wenig Zelda          0
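
The question asks for the new column on the left, and the sample line suggests the file has no header row. A rough sketch of how that might look end to end, using `np.select` as a flatter alternative to the nested `np.where` calls. The column names `date`, `id` and `tweets`, the short ids, and the in-memory stand-in for the file are all assumptions, not taken from the real data.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for the txt file; rows and ids are made up for illustration
raw = io.StringIO(
    '"2017-03-21 12:00:00","1","Zelda spielen :) #SwitchMoment"\n'
    '"2017-03-21 12:01:00","2",":D :-( @NintendoDE URL"\n'
    '"2017-03-21 12:02:00","3","schade :("\n'
    '"2017-03-21 12:03:00","4","Ein wenig Zelda"\n'
)

# header=None because the file is assumed to have no header; names are hypothetical
df = pd.read_csv(raw, header=None, names=["date", "id", "tweets"], dtype={"id": str})

mp = df["tweets"].str.contains(r":\)|:D")
mn = df["tweets"].str.contains(r":-\(|:\(")

# np.select takes the first condition that is True, so "both" has to come first
sentiment = np.select([mp & mn, mn, mp], [99, 0, 4], default=3)

# Insert at position 0 so the new column ends up leftmost
df.insert(0, "sentiment", sentiment)
```

Writing the result back with `df.to_csv('output.csv', index=False, header=False)` should then give a file with the sentiment value as the first field of each line.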