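Starting from a text file that contains characters such as "!" or "," (basically the whole string.punctuation set), I want to remove them and end up with a text that contains only the words. I found a solution here: https://gomputor.wordpress.com/2008/09/27/search-replace-multiple-words-or-characters-with-python/, and I wrote my script this way: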
import string
dict = {}
for elem in string.punctuation:
    dict[elem] = ""
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
with open("text.txt", "r") as f:
    file = f.read()
    f = replace_all(file, dict)
print(f)
OK, this works, but if I try another solution I don't get the same result:
with open("text.txt", "r") as f:
    file = f.read()
for elem in string.punctuation:
    if elem in file:
        f = file.replace(elem, "")
In this case, if I run print(f) I get exactly the same file, with all of the punctuation still in it. Why?
Answer 0 (score: 1)
I use filter to search for and remove multiple items:
import string
testString = "Hello, world!"
# filter() returns an iterator in Python 3, so join the kept characters back into a string
print("".join(filter(lambda a: a not in string.punctuation, testString)))
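If you want to remove all non-alphanumeric characters, a regular expression is a better fit (note that \w also matches the underscore):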
import re
testString = "Hello, world!"
# Raw string so that \w reaches the regex engine unchanged; keeps word characters and spaces
print(re.sub(r"[^\w ]", "", testString))
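A third option, not part of the answers above, is str.translate with a table built by str.maketrans. The sketch below assumes Python 3; the names REMOVE_PUNCT and strip_punctuation are made up for illustration.

```python
import string

# Translation table that maps every punctuation character to None,
# which tells str.translate to delete it. (Python 3 only.)
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)

def strip_punctuation(text):
    """Return a copy of text with all string.punctuation characters removed."""
    return text.translate(REMOVE_PUNCT)

print(strip_punctuation("Hello, world!"))  # Hello world
```

This removes every punctuation character in a single pass over the text, instead of one replace call per character.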
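Two main problems:
- You assigned the result of replace to f instead of file. Strings are immutable, so file never changes; every iteration starts again from the original text and overwrites the previous result, which means f ends up with at most one kind of punctuation removed.
- You never printed file, so I added the line print(file).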
New code:
import string
with open("text.txt", "r") as f:
    file = f.read()
for elem in string.punctuation:
    if elem in file:
        file = file.replace(elem, "")
print(file)
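To see the difference between the two loops without needing text.txt, here is a small sketch that runs both versions on an in-memory string. The sample text and the variable names sample, buggy and fixed are made up for illustration.

```python
import string

sample = "Hello, world! (test)"  # made-up text standing in for the file contents

# Original loop: replace() is always called on the untouched sample,
# so each iteration throws away the result of the previous one.
buggy = sample
for ch in string.punctuation:
    if ch in sample:
        buggy = sample.replace(ch, "")

# Corrected loop: the result is assigned back to the same variable,
# so the removals accumulate.
fixed = sample
for ch in string.punctuation:
    if ch in fixed:
        fixed = fixed.replace(ch, "")

print(buggy)  # Hello world! (test)
print(fixed)  # Hello world test
```

In the first loop only the last matching character in string.punctuation order (here the comma) is missing from the output, which is why the printed text looks almost unchanged.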