I have a dataset that looks like this:
"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection."
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"
I am trying to get rid of every @ and the word attached to it. My dataset should end up looking like this:
"See the new #Gucci 5th Ave NY windows customized by for the debut of the #GucciGhost collection."
"Before the #GucciGhost collection debuts tomorrow, read about the artist"
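I can use a simple replace statement to get rid of the @ itself, but the word next to it is the problem. I am using re to search for the occurrences, but I cannot remove the word.
P.S. If it were always the same single word, this would not be a problem. But my dataset has many different words attached to @.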
Answer 0 (score: 2)
You can use a regular expression:
import re
a = [
"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"
]
pat = re.compile(r"@\S+")  # "@" followed by one or more non-whitespace characters
for i in range(len(a)):
    a[i] = pat.sub("", a[i])  # replace each match with an empty string
print(a)
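This removes every @ together with the word attached to it, which is what you asked for. Note that the space that preceded each mention stays in place.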
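A minimal sketch of the same approach wrapped in a function, in case the leftover space matters. The name strip_mentions is hypothetical and not part of the original answer, and collapsing whitespace with split/join is an assumption about what the cleaned text should look like (it also flattens any tabs or newlines in a row).

```python
import re

# "@" followed by one or more non-whitespace characters, as in the answer above.
MENTION = re.compile(r"@\S+")


def strip_mentions(text):
    """Remove @mentions, then collapse the whitespace they leave behind.

    Hypothetical helper for illustration only.
    """
    without_mentions = MENTION.sub("", text)
    # split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " normalizes spacing.
    return " ".join(without_mentions.split())


rows = [
    "See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
    "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew",
]

cleaned = [strip_mentions(row) for row in rows]
for row in cleaned:
    print(row)
```

With the two sample rows from the question, this should yield the desired strings, with a single space between "by" and "for" and no trailing space after "artist".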
Answer 1 (score: 0)
An idiomatic version that also removes the extra space:
import re
a = [
"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.",
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"
]
rgx = re.compile(r"\s?@\S+")
b = [rgx.sub("", row) for row in a]  # build a new list instead of modifying a in place
print(b)
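In \s?, \s matches a whitespace character such as ' ', and ? means zero or one occurrence. The optional \s therefore consumes the space in front of the @ when there is one.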
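One thing worth checking against real data: \S+ consumes everything up to the next whitespace, so punctuation glued to a mention is removed with it. The sketch below compares the pattern above with a \w+ variant. The sample sentence is made up, and the \w+ pattern is an assumption about what counts as part of a handle, not something the original answers propose.

```python
import re

# Made-up sample sentence (not from the original dataset).
sample = "Thanks @troubleandrew, see you soon."

non_space = re.compile(r"\s?@\S+")  # pattern from the answer above
word_only = re.compile(r"\s?@\w+")  # assumed variant: stop at the first non-word character

removed_non_space = non_space.sub("", sample)
removed_word_only = word_only.sub("", sample)

print(removed_non_space)  # the comma is removed together with the mention
print(removed_word_only)  # the comma is kept
```

Which behavior is preferable depends on the dataset; \w+ keeps trailing punctuation but would also stop early at any character in a handle that is not a letter, digit, or underscore.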