Reassigning a pandas column via a dictionary has no effect on the original DataFrame?

Time: 2016-07-20 20:42:30

Tags: python pandas

I have a huge pandas DataFrame that looks like this (sample):

df = pd.DataFrame({"col1":{0:"There ARE NO ERRORS!!!", 1:"EVERYTHING is failing", 2:"There ARE NO ERRORS!!!"}, "col2":{0:"WE HAVE SOME ERRORS", 1:"EVERYTHING is failing", 2:"System shutdown!"}})

I have a function called cleanMessage that strips punctuation and returns a lowercase string. For example, cleanMessage("THERE may be some errors, I don't know!!") would return there may be some errors i dont know
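The question does not show the body of cleanMessage, so the following is only a minimal sketch of a function with the described behavior. It assumes that "punctuation" means the ASCII characters in string.punctuation; the asker's real implementation may differ.

```python
import string


def cleanMessage(message):
    """Assumed stand-in for the asker's function (its real body is not shown)."""
    # Build a translation table that deletes every ASCII punctuation character
    deletion_table = str.maketrans("", "", string.punctuation)
    # Remove the punctuation first, then lowercase what is left
    return message.translate(deletion_table).lower()


print(cleanMessage("THERE may be some errors, I don't know!!"))
```

With the example string from the question, this prints there may be some errors i dont know.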

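I am trying to replace each message in col1 with whatever cleanMessage returns for that particular message (basically, cleaning up the message column). pd.DataFrame.iterrows works fine for me, but it is a bit slow. Instead, I am trying to map the new values onto the keys in the original df, like this: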

message_set = set(df["col1"])
message_dict = dict((original, cleanMessage(original)) for original in message_set)
df = df.replace("col1", message_dict)

So the original df looks like this:

>>> df
    col1                      col2
0   "There ARE NO ERRORS!!!"  "WE HAVE SOME ERRORS"
1   "EVERYTHING is failing"   "EVERYTHING is failing"
2   "There ARE NO ERRORS!!!"  "System shutdown!"

The "after" df should look like this:

>>> df
    col1                      col2
0   "there are no errors"     "WE HAVE SOME ERRORS"
1   "everything is failing"   "EVERYTHING is failing"
2   "there are no errors"     "System shutdown!"

Am I missing something in the replace part of my code?

EDIT:

For future viewers, this is the code I got working:

df["col1"] = df["col1"].map(message_dict)
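As a note on why the first attempt changed nothing: the first positional argument of DataFrame.replace is to_replace, so "col1" was treated as a value to search for and not as a column name. The sketch below shows the map approach from the edit next to a replace call that takes a nested dict of the form {column: {old: new}}. The cleanMessage body here is an assumed stand-in, since the real one is not shown.

```python
import string

import pandas as pd


def cleanMessage(message):
    # Assumed stand-in for the asker's function: strip ASCII punctuation, lowercase
    return message.translate(str.maketrans("", "", string.punctuation)).lower()


df = pd.DataFrame({
    "col1": ["There ARE NO ERRORS!!!", "EVERYTHING is failing", "There ARE NO ERRORS!!!"],
    "col2": ["WE HAVE SOME ERRORS", "EVERYTHING is failing", "System shutdown!"],
})

# Clean each distinct message only once, then look the result up per row
message_dict = {original: cleanMessage(original) for original in df["col1"].unique()}

# Option 1: Series.map, as in the edit above
mapped = df["col1"].map(message_dict)

# Option 2: DataFrame.replace with a nested dict, which limits the replacement to col1
replaced = df.replace({"col1": message_dict})

print(mapped.tolist())
print(replaced)
```

Both options leave col2 untouched. One difference worth knowing: map returns NaN for any value that is missing from the dict, while replace leaves unmatched values as they are.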

2 answers:

Answer 0: (score: 1)

replace works with regex. Consider putting the logic of cleanMessage() into chained replace() calls:

df["col2"] = df["col1"].replace(...).replace(...)
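The answer leaves the replace arguments elided, so the following is only one possible reading of it, not the answerer's actual code. It assumes the cleaning amounts to dropping every character that is not a word character or whitespace and collapsing runs of whitespace. Lowercasing is not a substitution, so the sketch uses .str.lower() for that step.

```python
import pandas as pd

col1 = pd.Series(["There ARE NO ERRORS!!!", "EVERYTHING is failing"])

cleaned = (
    col1
    .replace(r"[^\w\s]", "", regex=True)  # drop punctuation (pattern is an assumption)
    .replace(r"\s+", " ", regex=True)     # collapse runs of whitespace into one space
    .str.lower()                          # lowercase the result
)

print(cleaned.tolist())
```

Note that \w also matches underscores and digits, so those characters survive this particular pattern.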

Answer 1: (score: 0)

df.col1 = df.col1.str.lower().str.replace(r'([^a-z ])', '', regex=True)

df
