Question

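Starting from a text file that contains characters such as "!" or "," (basically the whole string.punctuation set), I want to remove them and get a text that contains only the words. I found a solution here: https://gomputor.wordpress.com/2008/09/27/search-replace-multiple-words-or-characters-with-python/ and wrote my script this way: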

import string

dict={}
for elem in string.punctuation:
    dict[elem]=""

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

with open ("text.txt","r") as f:
    file = f.read()
    f = replace_all(file,dict)

print(f)

This works fine, but if I try another solution I don't get the same result:

with open ("text.txt","r") as f:
    file = f.read()
    for elem in string.punctuation:
        if elem in file:
            f=file.replace(elem,"")

In this case, if I call print(f), I get exactly the same file, with all of its punctuation. Why?

Answer 1

I use filter to search for and remove multiple items:

import string
testString = "Hello, world!"
# In Python 3, filter() returns an iterator, so join the kept characters back into a string
print("".join(filter(lambda a: a not in string.punctuation, testString)))

Regular expressions

If you want to remove all non-alphanumeric characters, it is better to use a regular expression:

import re
testString = "Hello, world!"
# Raw string so that \w is not treated as a string escape; note that \w also keeps underscores
print(re.sub(r"[^\w ]", "", testString))
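Note that the pattern above keeps underscores and removes anything that is not a word character or a space. If the goal is to remove exactly the string.punctuation set, one possible sketch builds the character class from that set instead. The names PUNCT_RE and strip_punctuation and the sample string are made up for illustration:

```python
import re
import string

# Character class matching exactly the characters in string.punctuation.
# re.escape protects characters such as ], ^, - and \ inside the class.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")

def strip_punctuation(text):
    """Return text with every string.punctuation character removed."""
    return PUNCT_RE.sub("", text)

sample = "snake_case, 100% done!"  # hypothetical sample text
print(strip_punctuation(sample))           # the underscore is removed too
print(re.sub(r"[^\w ]", "", sample))       # the underscore is kept, since \w matches it
```

Which variant fits depends on whether underscores and non-ASCII symbols should count as punctuation in your text.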

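Why your code doesn't work

Two main problems:

You reassign f instead of file. Every call to file.replace() starts again from the unchanged file, so the removals never accumulate and f only holds the result of the last replacement that ran.
You never print file, so I added the line print(file).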
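The first problem can be reproduced on an in-memory string. This is only an illustrative sketch: the sample text and the variable names result and cleaned are assumptions, standing in for the file contents and the variables in the question.

```python
import string

text = "Hello, world!"  # hypothetical sample, stands in for the file contents

# Buggy loop: every pass starts again from the unchanged `text`,
# so `result` only ever holds the outcome of the last replace that ran.
result = text
for ch in string.punctuation:
    if ch in text:
        result = text.replace(ch, "")

# Fixed loop: reassign the same variable so the removals accumulate.
cleaned = text
for ch in string.punctuation:
    if ch in cleaned:
        cleaned = cleaned.replace(ch, "")

print(result)   # only the last matching character (",") has been removed
print(cleaned)  # all punctuation has been removed
```

Strings in Python are immutable, so replace() always returns a new string and the original is left as it was.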

New code:

import string

with open ("text.txt","r") as f:
    file = f.read()
    for elem in string.punctuation:
        if elem in file:
            file=file.replace(elem,"")
    print(file)
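As an alternative to looping over replace(), the same removal can be done in a single pass with str.translate. This is a different technique from the one in the question, shown here only as a sketch; REMOVE_PUNCT and remove_punctuation are hypothetical names.

```python
import string

# Translation table: the third argument of str.maketrans lists characters to delete.
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)

def remove_punctuation(text):
    """Return text with every string.punctuation character deleted."""
    return text.translate(REMOVE_PUNCT)

print(remove_punctuation("Hello, world! (test)"))
```

With a file, you would pass the result of f.read() to remove_punctuation in place of the literal string.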
