Pandas replace gives me a strange error

Time: 2018-06-05 15:56:35

Tags: python pandas dataframe

When I use a dictionary to replace values in a DataFrame, pandas gives me strange output:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)
Course
English 21st Century
Maths in the Golden Age of History
Science is cool


Mapped_Items = ['Math', 'English', 'Science', 'History']

pat = '|'.join(r"\b{}\b".format(x) for x in Mapped_Items)
df['Interest'] = df['Course'].str.findall('(' + pat + ')').str.join(', ')

mapped_dict = {'English' : 'Eng', 'Science' : 'Sci', 'Math' : 'Mat', 'History' : 'Hist'}
df['Interest'] = df['Interest'].replace(mapped_dict, inplace=False)

What I get:

print(df)
Course                                Interest
English 21st Century                  Engg
Maths in the Golden Age of History    MatttHistt
Science is cool                       Scii

What I'm after is something close to the following:

 Course                               Interests
English 21st Century                  Eng
Maths in the Golden Age of History    Mat, Hist
Science is cool                       Sci

1 answer:

Answer 0: (score: 3)

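Your logic seems overcomplicated. You don't need regex, and pd.Series.replace is inefficient with a dictionary, even if it could work on a series of lists. Here is an alternative approach: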

import pandas as pd
from io import StringIO

mystr = StringIO("""Course
English 21st Century
Maths in the Golden Age of History
Science is cool""")

df = pd.read_csv(mystr)

d = {'English' : 'Eng', 'Science' : 'Sci', 'Math' : 'Mat', 'History' : 'Hist'}

df['Interest'] = df['Course'].apply(lambda x: ', '.join([d[i] for i in d if i in x]))

print(df)

                               Course   Interest
0                English 21st Century        Eng
1  Maths in the Golden Age of History  Mat, Hist
2                     Science is cool        Sci
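
Note that the approach above lists abbreviations in dictionary order and uses plain substring matching, which is why 'Math' is found inside 'Maths'. If the abbreviations should instead follow the order in which the words occur in the course title, one possible variant is sketched below. It is not part of the original answer: the helper name `abbreviate` is hypothetical, and the pattern deliberately omits `\b` word boundaries (an assumption) so that 'Math' still matches 'Maths'.

```python
import re
from io import StringIO

import pandas as pd

df = pd.read_csv(StringIO("""Course
English 21st Century
Maths in the Golden Age of History
Science is cool"""))

d = {'English': 'Eng', 'Science': 'Sci', 'Math': 'Mat', 'History': 'Hist'}

# One alternation built from the escaped dictionary keys.
# No \b anchors, so a key also matches as a prefix of a longer word.
pattern = re.compile('|'.join(re.escape(key) for key in d))


def abbreviate(course, mapping=d):
    """Hypothetical helper: map every key found in `course` to its abbreviation."""
    # findall returns the matches in the order they appear in the text.
    return ', '.join(mapping[match] for match in pattern.findall(course))


df['Interest'] = df['Course'].apply(abbreviate)

print(df)
```

For the three sample rows this gives the same Interest column as the answer's code, since the matched words happen to appear in the same order in both cases.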