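I have a DataFrame with a column of tweets. The texts contain so-called "@" mentions. I want to add a new column to this DataFrame that holds the specific "@" mention found in each row. Code: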
import numpy as np  # pd.np was removed in pandas 2.0, so call numpy directly

dfEx5['text'] = dfEx5['text'].astype(str)  # Convert all elements in the text column to strings (assign the result back)
dfEx5['mentions'] = np.where(dfEx5.text.str.contains("@AmericanAir"), "@AmericanAir",
                    np.where(dfEx5.text.str.contains("@JetBlue"), "@JetBlue",
                    np.where(dfEx5.text.str.contains("@SouthwestAir"), "@SouthwestAir",
                    np.where(dfEx5.text.str.contains("@united"), "@united",
                    np.where(dfEx5.text.str.contains("@USAirways"), "@USAirways",
                    np.where(dfEx5.text.str.contains("@VirginAmerica"), "@VirginAmerica", ""))))))  # "" when no handle matches
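First I convert all elements to strings. Then, if the text contains "@AmericanAir", "@AmericanAir" goes into the mentions column, and so on for the other handles.
Thanks for your help!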
Answer 0 (score: 0)
pandas.Series.str.findall
I would find every mention that is in my watch list and take the first one.
df.text.str.findall('|'.join(watch)).str[0]
0      @AmericanAir
1          @JetBlue
2     @SouthwestAir
3           @united
4        @USAirways
5    @VirginAmerica
Name: text, dtype: object
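To make the behaviour of the first-mention approach concrete, here is a minimal self-contained sketch. The sample tweets and the three-handle watch list are made up for illustration, and wrapping each handle in re.escape is my own precaution rather than part of the answer; it only matters if a handle ever contains a regex metacharacter.

```python
import re

import pandas as pd

# Hypothetical sample data, not taken from the question's dataset.
watch = ["@AmericanAir", "@JetBlue", "@united"]
df = pd.DataFrame(
    {
        "text": [
            "thanks @JetBlue and @united",
            "no airline here",
            "@AmericanAir lost my bag",
        ]
    }
)

# Build one alternation pattern; re.escape keeps each handle literal.
pattern = "|".join(re.escape(handle) for handle in watch)

# findall returns a list of matches per row, in order of appearance in the text;
# .str[0] takes the first match and yields NaN when the list is empty.
first_mention = df["text"].str.findall(pattern).str[0]
print(first_mention)
```

Note that "first" here means first in the text, not first in the watch list, and a row without any watched handle ends up as NaN rather than an empty string.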
With assign
df.assign(mentions=df.text.str.findall('|'.join(watch)).str[0])
                    text        mentions
0  @AmericanAir @JetBlue    @AmericanAir
1               @JetBlue        @JetBlue
2          @SouthwestAir   @SouthwestAir
3  @united @SouthwestAir         @united
4             @USAirways      @USAirways
5         @VirginAmerica  @VirginAmerica
If you prefer, you can keep all of the mentions:
df.assign(mentions=df.text.str.findall('|'.join(watch)))
                    text                  mentions
0  @AmericanAir @JetBlue  [@AmericanAir, @JetBlue]
1               @JetBlue                [@JetBlue]
2          @SouthwestAir           [@SouthwestAir]
3  @united @SouthwestAir  [@united, @SouthwestAir]
4             @USAirways              [@USAirways]
5         @VirginAmerica          [@VirginAmerica]
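If the list-valued column is awkward to work with, one possible follow-up is to turn it into one row per mention with DataFrame.explode. This is a sketch on invented sample data, not something the answer itself proposes; the names all_mentions and long_form are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
watch = ["@united", "@SouthwestAir", "@JetBlue"]
df = pd.DataFrame({"text": ["@united @SouthwestAir", "@JetBlue", "nothing"]})

# Keep every watched mention per tweet as a list.
all_mentions = df.assign(mentions=df["text"].str.findall("|".join(watch)))

# One row per (tweet, mention); the original index is repeated,
# and a tweet with an empty list becomes a single NaN row.
long_form = all_mentions.explode("mentions")
print(long_form)
```

The repeated index makes it easy to join the long form back to the original tweets, for example to count mentions per airline with value_counts.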
Setup
import pandas as pd

# Handles to look for
watch = [
    '@SouthwestAir',
    '@VirginAmerica',
    '@united',
    '@JetBlue',
    '@USAirways',
    '@AmericanAir'
]
# One sample tweet per line
text = """\
@AmericanAir @JetBlue
@JetBlue
@SouthwestAir
@united @SouthwestAir
@USAirways
@VirginAmerica
"""
df = pd.DataFrame(dict(text=text.splitlines()))
df
                    text
0  @AmericanAir @JetBlue
1               @JetBlue
2          @SouthwestAir
3  @united @SouthwestAir
4             @USAirways
5         @VirginAmerica
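For comparison, the "if the text contains this handle, use this handle" logic from the question can also be written without nesting, using numpy.select. This is only a sketch under my own assumptions: the sample tweets are invented, the handle list is shortened, and the empty-string default for rows without a match is my choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; handles are checked in this order.
handles = ["@AmericanAir", "@JetBlue", "@SouthwestAir"]
df = pd.DataFrame(
    {"text": ["hi @JetBlue", "@SouthwestAir and @AmericanAir", "no mention"]}
)

# One boolean array per handle; regex=False does a plain substring test.
conditions = [
    df["text"].str.contains(handle, regex=False).to_numpy() for handle in handles
]

# np.select picks the choice for the first condition that is True in each row.
df["mentions"] = np.select(conditions, handles, default="")
print(df)
```

Unlike the findall approach, priority here follows the order of the handle list rather than the order in which handles appear in the tweet, which matches the behaviour of the nested where chain.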