I have a dataframe ('sp500news') that looks like this:
date_publish \
79944 2007-01-29 19:08:35
181781 2007-12-14 19:39:06
213175 2008-01-22 11:17:19
93554 2008-01-22 18:52:56
...
title
79944 Microsoft Vista corporate sales go very well
181781 Williams No Anglican consensus on Episcopal Church
213175 CSX quarterly profit rises
93554 Citigroup says 30 bln capital helps exceed target
...
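I am trying to replace each company name in the titles with its corresponding ticker from the 'Symbol' column of another dataframe ('constituents'), which looks like this: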
Symbol Name Sector
0 MMM 3M Industrials
1 AOS A.O. Smith Industrials
2 ABT Abbott Health Care
3 ABBV AbbVie Health Care
...
116 C Citigroup Financials
...
I have already tried:
for item in sp500news['title']:
    for word in item:
        if word in constituents['Name']:
            indx = constituents['Name'].index(word)
            str.replace(word, constituents['Symbol'][indx])
Answer 0: (score: 1)
Try this:
Below are dummy dataframes that stand in for your data
import pandas as pd

df1 = pd.DataFrame({'Symbol': ['MV', 'AOS','ABT'],
                    'Name': ['Microsoft Vista', 'A.0.', 'Abbot']})
df1
Symbol Name
0 MV Microsoft Vista
1 AOS A.0.
2 ABT Abbot
df2 = pd.DataFrame({'title': [79944, 181781, 213175],
                    'comment': ['Microsoft Vista corporate sales go very well',
                                'Abbot consensus on Episcopal Church',
                                'A.O. says 30 bln captial helps exceed target']})
title comment
0 79944 Microsoft Vista corporate sales go very well
1 181781 Abbot consensus on Episcopal Church
2 213175 A.O. says 30 bln captial helps exceed target
Build a dictionary that maps each name to its symbol
rep = dict(zip(df1.Name,df1.Symbol))
rep
{'Microsoft Vista': 'MV', 'A.0.': 'AOS', 'Abbot': 'ABT'}
Use the Series.replace method to substitute them
df2['comment'] = df2['comment'].replace(rep, regex = True)
df2
title comment
0 79944 MV corporate sales go very well
1 181781 ABT consensus on Episcopal Church
2 213175 A.O. says 30 bln captial helps exceed target
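Row 2 above is left unchanged because the dictionary key is spelled 'A.0.' (digit zero) while the text contains 'A.O.' (letter O). Also note that with regex=True the dictionary keys are treated as regular-expression patterns, so a '.' in a name matches any character. A minimal sketch, assuming the names should be matched literally, is to escape each key with re.escape first. The sample rows and the variable names `names` and `news` are made up for illustration and are not from the question:

```python
import re
import pandas as pd

# Illustrative data only (hypothetical rows, not the real constituents list)
names = pd.DataFrame({'Symbol': ['AOS', 'ABT'],
                      'Name': ['A.O. Smith', 'Abbott']})
news = pd.DataFrame({'comment': ['A.O. Smith says sales go very well',
                                 'AXOY Smith is not a company name']})

# Escape each name so that '.' only matches a literal dot
rep = {re.escape(name): symbol
       for name, symbol in zip(names['Name'], names['Symbol'])}

news['comment'] = news['comment'].replace(rep, regex=True)
print(news)
```

Without the escaping step, the unescaped pattern 'A.O. Smith' would also match the second row, since each '.' can stand for any character.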
Answer 1: (score: 0)
Try the following code
import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well']})
constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})
# Replace each company name with its symbol, one name at a time
for name, symbol in zip(constituents['name'], constituents['symbol']):
    df['title'] = df['title'].str.replace(name, symbol, regex=False)
Output
title
0 C says 30 bln capital helps exceed target
1 WLM No Anglican consensus on Episcopal Church
2 MCR Vista corporate sales go very well
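I basically just copied a few rows of sp500news['title'] and made up part of constituents['Name'], just to demonstrate the transformation. Essentially, I am accessing the string methods object (.str) of the pd.Series for the title column of sp500news, so that replace can be applied to it whenever a matching company name is found.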
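The loop over names can also be collapsed into a single pass. This is only a sketch under some assumptions: names should match as whole words, and every name starts and ends with a letter or digit so that the `\b` word boundary applies. The helper names `mapping`, `alternatives` and `pattern` are introduced here for illustration:

```python
import re
import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well']})
constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})

# Name -> symbol lookup
mapping = dict(zip(constituents['name'], constituents['symbol']))

# Try longer names first so a longer name wins over a shorter one it contains
alternatives = sorted(mapping, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(n) for n in alternatives) + r')\b')

# One pass over each title; the callable looks up the symbol for the matched name
df['title'] = df['title'].str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)
print(df)
```

Because every title is scanned once, a symbol that has just been inserted cannot be picked up and replaced again by a later name, which can happen when replacing name by name in a loop.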