Question

When I use a dictionary to replace values in a DataFrame, pandas gives me a strange output:

import pandas as pd

df = pd.read_csv('data.csv')
print(df)
Course
English 21st Century
Maths in the Golden Age of History
Science is cool


Mapped_Items = ['Math', 'English', 'Science', 'History']

pat = '|'.join(r"\b{}\b".format(x) for x in Mapped_Items)
df['Interest'] = df['Course'].str.findall('(' + pat + ')').str.join(', ')

mapped_dict = {'English' : 'Eng', 'Science' : 'Sci', 'Math' : 'Mat', 'History' : 'Hist'}
df['Interest'] = df['Interest'].replace(mapped_dict, inplace=False)

What I get:

print(df)
Course                                Interest
English 21st Century                  Engg
Maths in the Golden Age of History    MatttHistt
Science is cool                       Scii

What I am after is something close to the following:

 Course                               Interests
English 21st Century                  Eng
Maths in the Golden Age of History    Mat, Hist
Science is cool                       Sci

Answer 1

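Your logic seems overcomplicated. You don't need regex, and pd.Series.replace is inefficient with a dictionary, even if it could work on a series of lists. Here's an alternative method: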

import pandas as pd
from io import StringIO

mystr = StringIO("""Course
English 21st Century
Maths in the Golden Age of History
Science is cool""")

df = pd.read_csv(mystr)

d = {'English' : 'Eng', 'Science' : 'Sci', 'Math' : 'Mat', 'History' : 'Hist'}

df['Interest'] = df['Course'].apply(lambda x: ', '.join([d[i] for i in d if i in x]))

print(df)

                               Course   Interest
0                English 21st Century        Eng
1  Maths in the Golden Age of History  Mat, Hist
2                     Science is cool        Sci
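
If you would rather keep a regex, for example so that the abbreviations come out in the order the words appear in the text, something like the sketch below might work. It is not part of the approach above, just a variant under a few assumptions: the sample data is rebuilt inline, the keys are escaped with re.escape, and only a leading \b is used. The pattern in the question wraps each key in \b...\b, which would not match 'Math' inside 'Maths'.

```python
import re
import pandas as pd

# Sample data rebuilt inline (assumed to mirror data.csv from the question).
df = pd.DataFrame({'Course': ['English 21st Century',
                              'Maths in the Golden Age of History',
                              'Science is cool']})

d = {'English': 'Eng', 'Science': 'Sci', 'Math': 'Mat', 'History': 'Hist'}

# Leading \b only, so 'Math' still matches at the start of 'Maths'.
# The group is non-capturing, so findall returns the full matched key.
pat = r'\b(?:' + '|'.join(re.escape(k) for k in d) + ')'

# Find every key in each course title, then map each hit through the dict.
df['Interest'] = (df['Course']
                  .str.findall(pat)
                  .apply(lambda found: ', '.join(d[k] for k in found)))

print(df)
```

For the three sample rows this should give the same Interest values as the dictionary lookup above; the two only differ when a title mentions the keys in a different order than the dictionary lists them.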
