Question

I have a DataFrame made up of Series containing strings. I have a list of strings that I want to remove from every row.

tcl_list = ["tab", "cr", "lf", "doublequote", "singlequote", "eof"]
df[['Summary', 'Description']] = re.sub("|".join(tcl_list), ' ', df[['Summary', 'Description']])

For example:

From this:

the tab dog is acting sneaky like a doublequote cat doublequote

To this:

the dog is acting sneaky like a cat

However, I get this error:

TypeError: expected string or bytes-like object

I tried using apply() with a lambda function, but without success. Any suggestions?
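A likely reason the apply() attempt failed, assuming it was called on the two-column selection: DataFrame.apply passes each whole column (a Series) to the function, so re.sub still receives a Series rather than a str and raises the same TypeError. The minimal sketch below reproduces that and then applies re.sub to each individual cell instead. The Description text is hypothetical sample data.

```python
import re
import pandas as pd

tcl_list = ["tab", "cr", "lf", "doublequote", "singlequote", "eof"]
pattern = "|".join(tcl_list)

df = pd.DataFrame({
    "Summary": ["the tab dog is acting sneaky like a doublequote cat doublequote"],
    "Description": ["a cr short lf note"],  # hypothetical sample text
})
cols = ["Summary", "Description"]

# DataFrame.apply hands each *column* (a Series) to the lambda,
# so re.sub gets a Series instead of a str and raises TypeError.
error_raised = False
try:
    df[cols].apply(lambda col: re.sub(pattern, " ", col))
except TypeError:
    error_raised = True

# Calling re.sub on each cell (Series.apply inside the column-wise apply) works.
cleaned = df[cols].apply(lambda col: col.apply(lambda s: re.sub(pattern, " ", s)))
```

This version leaves runs of spaces where the tokens were removed; splitting on whitespace or collapsing the spaces afterwards gives the clean sentence.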

Answer 1

I think the regex has to be applied to each individual string in the column, not to the column as a whole:

df['val'] = ['the tab dog is acting sneaky like a doublequote cat doublequote']

df.val.apply(lambda x: re.sub("|".join(tcl_list),'',x))

or

df.val.str.replace("|".join(tcl_list), '', regex=True)

Out:

0    the  dog is acting sneaky like a  cat 
Name: val, dtype: object
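The output above still contains doubled spaces, covers only one column, and a bare alternation also matches inside longer words (for example "cr" in "across"). A sketch that handles both columns from the question, assuming pandas DataFrame.replace with regex=True, adds \b word boundaries and then collapses the leftover whitespace so the result matches the desired output. The Description text is hypothetical sample data.

```python
import re
import pandas as pd

tcl_list = ["tab", "cr", "lf", "doublequote", "singlequote", "eof"]

# \b anchors each token to word boundaries, so "cr" inside "across"
# or "lf" inside "self" is left alone; re.escape guards against regex
# metacharacters if the list ever contains them.
pattern = r"\b(?:" + "|".join(map(re.escape, tcl_list)) + r")\b"

df = pd.DataFrame({
    "Summary": ["the tab dog is acting sneaky like a doublequote cat doublequote"],
    "Description": ["walked across the tab room"],  # hypothetical sample text
})
cols = ["Summary", "Description"]

# DataFrame.replace with regex=True substitutes inside every string cell;
# the second replace collapses runs of whitespace to one space and
# str.strip trims the ends of each string.
df[cols] = (
    df[cols]
    .replace(pattern, " ", regex=True)
    .replace(r"\s+", " ", regex=True)
    .apply(lambda col: col.str.strip())
)
```

Replacing with a space and then collapsing whitespace (rather than replacing with '') avoids gluing neighbouring words together when a token sits between them without surrounding spaces.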

Remove a list of words from a DataFrame
