I'm using Python 3.6.8 and pandas.
I'm loading a CSV file and trying to replace strings in one of its columns with other strings.
import pandas as pd
INPUT_FILE = "input.csv"
df = pd.read_csv(INPUT_FILE, error_bad_lines=False, engine='python')
print(df.columns)
print ("Before: ", df['tweet'].loc[432])
dic = {":-)": "happy-smiley",
":)": "happy-smiley",
":-(": "sad-smiley",
":(": "sad-smiley"}
df.replace({'tweet': dic}, inplace=True)
print ("After: ", df['tweet'].loc[432])
Output:
Index(['tweet', 'existence', 'existence.confidence'], dtype='object')
Before: Are you ready for climate change, if so let your lawmakers know, how tell them sign petitions, drop a hint :)
After: Are you ready for climate change, if so let your lawmakers know, how tell them sign petitions, drop a hint :)
But as you can see, the result is unchanged: the ":)" string is not replaced with "happy-smiley".
What am I missing?
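A minimal, self-contained reproduction that does not need input.csv. The two-row DataFrame is a hypothetical stand-in (its first value is shortened from the tweet above). It shows the same symptom: without regex=True, only a cell whose entire value is ":)" gets replaced.

```python
import pandas as pd

# Hypothetical stand-in for input.csv: one emoticon inside text, one cell that is only an emoticon
df = pd.DataFrame({"tweet": ["drop a hint :)", ":)"]})

dic = {":-)": "happy-smiley", ":)": "happy-smiley",
       ":-(": "sad-smiley", ":(": "sad-smiley"}

# Without regex=True, replace() only swaps cells whose whole value equals a key
out = df.replace({"tweet": dic})
print(out["tweet"].tolist())
# ['drop a hint :)', 'happy-smiley']
```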
Answer (score: 2)
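By default, DataFrame.replace only replaces cells whose entire value equals one of the keys, so a ":)" inside a longer tweet is never matched. Pass regex=True to replace substrings instead; because the keys contain special regex characters such as ( and ), escape them with re.escape first: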
import re
dic = {re.escape(k):v for k, v in dic.items()}
print(dic)
# Output below is from Python 3.7+; on 3.6, re.escape also escapes ':', e.g. '\\:\\-\\)'
{':\\-\\)': 'happy-smiley',
':\\)': 'happy-smiley',
':\\-\\(': 'sad-smiley',
':\\(': 'sad-smiley'}
df.replace({'tweet': dic}, inplace=True, regex=True)
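A sketch that checks the fix on a small hypothetical DataFrame (instead of input.csv) and, for comparison, shows an alternative that is not part of the answer: Series.str.replace with regex=False, which treats each key as a literal substring, so no escaping is needed.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the 'tweet' column of input.csv
df = pd.DataFrame({"tweet": ["drop a hint :)", "bad day :-(", "no smiley"]})

dic = {":-)": "happy-smiley", ":)": "happy-smiley",
       ":-(": "sad-smiley", ":(": "sad-smiley"}

# Option 1 (the answer's approach): escaped keys + regex=True replaces substrings
escaped = {re.escape(k): v for k, v in dic.items()}
via_regex = df.replace({"tweet": escaped}, regex=True)

# Option 2 (alternative): literal substring replacement, one key at a time, no regex involved
via_str = df.copy()
for k, v in dic.items():
    via_str["tweet"] = via_str["tweet"].str.replace(k, v, regex=False)

print(via_regex["tweet"].tolist())
# ['drop a hint happy-smiley', 'bad day sad-smiley', 'no smiley']
print(via_str["tweet"].tolist())
# ['drop a hint happy-smiley', 'bad day sad-smiley', 'no smiley']
```

Both options apply the keys in dictionary order. That is safe here because none of the emoticons is a substring of another; with overlapping keys you would want to apply the longer ones first.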