Question

I am trying to replace certain strings in a column of a DataFrame using a txt file.

My DataFrame looks like this.

coffee_directions_df

Utterance                         Frequency   

Directions to Starbucks           1045
Directions to Tullys              1034
Give me directions to Tullys      986
Directions to Seattles Best       875
Show me directions to Dunkin      812
Directions to Daily Dozen         789
Show me directions to Starbucks   754
Give me directions to Dunkin      612
Navigate me to Seattles Best      498
Display navigation to Starbucks   376
Direct me to Starbucks            201

The DataFrame shows people's utterances and how often each one was spoken.

For example, "Directions to Starbucks" was uttered 1045 times.

As far as I know, I can create a dictionary to replace strings such as "Starbucks", "Tullys", and "Seattles Best", like this:

# define dictionary of mappings
rep_dict = {'Starbucks': 'Coffee', 'Tullys': 'Coffee', 'Seattles Best': 'Coffee'}

# apply substring mapping

df['Utterance'] = df['Utterance'].replace(rep_dict, regex=True).str.lower()

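However, my DataFrame is very large, and I would like to know whether there is a way to save rep_dict as a .txt file, import that .txt file, and then apply or map the words in it to coffee_directions_df.Utterance.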

Ultimately, I don't want to create a bunch of dictionaries in my script; I would rather import a txt file that contains them.

Thanks!

Answer 1

I mean, it is as simple as this:

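import io

import pandas as pd

# Stand-in for the contents of the .txt file: one "pattern,replacement" pair per line
data = '''\
Starbucks,Coffee
Tullys,Coffee
Seattles Best,Coffee'''

# Create a map from a file (pd.compat.StringIO was removed in pandas 0.25; use io.StringIO,
# or pass the path of the real .txt file to read_csv instead)
m = pd.read_csv(io.StringIO(data), header=None, index_col=0)[1]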

Then:

df['Utterance'] = df['Utterance'].replace(m, regex=True).str.lower()
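To show the same idea against an actual file on disk, here is a minimal sketch. The file name rep_dict.txt, the temporary directory, the column names "pattern" and "replacement", and the two-row sample frame are all assumptions made for illustration; they are not from the question. The keys are passed through re.escape so they are matched literally even though regex=True is used, which is a cautious choice rather than something the answer requires.

```python
import os
import re
import tempfile

import pandas as pd

# Hypothetical mapping file: one "pattern,replacement" pair per line.
mapping_text = "Starbucks,Coffee\nTullys,Coffee\nSeattles Best,Coffee\n"
path = os.path.join(tempfile.mkdtemp(), "rep_dict.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(mapping_text)

# Read the file into a two-column frame, then build a plain dict from it.
pairs = pd.read_csv(path, header=None, names=["pattern", "replacement"])
# re.escape keeps any regex metacharacters in the keys literal.
rep_dict = {
    re.escape(p): r for p, r in zip(pairs["pattern"], pairs["replacement"])
}

# Small sample of the frame from the question (illustrative subset only).
df = pd.DataFrame({
    "Utterance": ["Directions to Starbucks", "Navigate me to Seattles Best"],
    "Frequency": [1045, 498],
})

# Apply the substring mapping, then lowercase the result.
df["Utterance"] = df["Utterance"].replace(rep_dict, regex=True).str.lower()
print(df)
```

Keeping the pairs in a plain text file means new brand names can be added without touching the script; only the file needs to change.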
