When I replace values in a DataFrame using a dictionary, pandas gives a strange output:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Course
English 21st Century
Maths in the Golden Age of History
Science is cool
Mapped_Items = ['Math', 'English', 'Science', 'History']
pat = '|'.join(r"\b{}\b".format(x) for x in Mapped_Items)
df['Interest'] = df['Course'].str.findall('(' + pat + ')').str.join(', ')
mapped_dict = {'English' : 'Eng', 'Science' : 'Sci', 'Math' : 'Mat', 'History' : 'Hist'}
df['Interest'] = df['Interest'].replace(mapped_dict, inplace=False)
What I get:
print(df)
Course Interest
English 21st Century Engg
Maths in the Golden Age of History MatttHistt
Science is cool Scii
What I am after is something close to the following:
Course Interests
English 21st Century Eng
Maths in the Golden Age of History Mat, Hist
Science is cool Sci
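Answer 0 (score: 3)
Your logic seems overcomplicated. You don't need regex, and pd.Series.replace
is inefficient with a dictionary, even if it could work on a series of lists. Here is an alternative approach: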
import pandas as pd
from io import StringIO
mystr = StringIO("""Course
English 21st Century
Maths in the Golden Age of History
Science is cool""")
df = pd.read_csv(mystr)
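# Map each full course keyword to its abbreviation.
d = {'English': 'Eng', 'Science': 'Sci', 'Math': 'Mat', 'History': 'Hist'}
# For each course title, collect the abbreviation of every key that occurs
# in it as a substring (so 'Math' also matches 'Maths'), in dictionary order.
df['Interest'] = df['Course'].apply(lambda x: ', '.join([d[i] for i in d if i in x]))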
print(df)
Course Interest
0 English 21st Century Eng
1 Maths in the Golden Age of History Mat, Hist
2 Science is cool Sci
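If you would rather keep a regex, for example so that keywords only match at the start of a word and come out in the order they appear in the title, the dictionary lookup can be applied directly to the list of matches instead of calling replace afterwards. The following is only a sketch under some assumptions: the DataFrame is built inline rather than read from data.csv, and the pattern uses a leading \b only, so that 'Math' still matches 'Maths' (that choice is mine, not something stated in the question).

```python
import re

import pandas as pd

# Inline stand-in for data.csv (assumed to hold just the Course column).
df = pd.DataFrame({"Course": [
    "English 21st Century",
    "Maths in the Golden Age of History",
    "Science is cool",
]})

d = {"English": "Eng", "Science": "Sci", "Math": "Mat", "History": "Hist"}

# Leading word boundary only, so "Math" is still found inside "Maths".
# The group is non-capturing, so findall returns the full matched keywords.
pat = r"\b(?:" + "|".join(re.escape(k) for k in d) + ")"

# Look each matched keyword up in the dictionary, then join the abbreviations.
df["Interest"] = df["Course"].str.findall(pat).apply(
    lambda matches: ", ".join(d[m] for m in matches)
)
print(df)
```

Unlike the substring check above, this variant would not pick up a keyword embedded in the middle of another word, and it lists abbreviations in the order the keywords occur in each title rather than in dictionary order.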