Question

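I have been working with a file that contains a lot of punctuation, and I need to ignore the punctuation so that I can count the actual length of each word.

Example:

Is this stack overflow! ---> Is this stack overflow

To do this, I wrote a separate case for every punctuation mark, which made my code slow. So I am looking for an efficient way to do the same thing with a module or function.

Code snippet: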

with open(file_name,'r') as f:
     for line in f:
         for word in line.split():
            #print word
            '''
                Handling punctuation
            '''
            word = word.replace('.','')
            word = word.replace(',','')
            word = word.replace('!','')
            word = word.replace('(','')
            word = word.replace(')','')
            word = word.replace(':','')
            word = word.replace(';','')
            word = word.replace('/','')
            word = word.replace('[','')
            word = word.replace(']','')
            word = word.replace('-','')

That is the logic I came up with. Is there any way to shorten it?

Answer 1

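This question is a "classic", but many of the answers do not work in Python 3 because the string.maketrans function was removed from the string module in Python 3. A Python 3-compatible solution is:

Use string.punctuation to get the list of punctuation characters and str.translate to remove them: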

import string
"hello, world !".translate({ord(k):"" for k in string.punctuation})

Result:

'hello world '

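In Python 3, the argument to translate is a dictionary. Each key is the Unicode code point of a character (as returned by ord), and each value is the replacement string. I built it with a dictionary comprehension.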
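As a sketch of the same idea, the translation table can also be built once with str.maketrans, whose third argument lists the characters to delete. The names PUNCT_TABLE, strip_punctuation, and word_lengths are hypothetical helpers introduced here for illustration, and the example only covers the ASCII punctuation in string.punctuation.

```python
import string

# Build the table once and reuse it; the third argument of
# str.maketrans lists the characters that translate() should delete.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Return text with every ASCII punctuation character removed."""
    return text.translate(PUNCT_TABLE)


def word_lengths(line):
    """Return the length of each word in line, ignoring punctuation."""
    cleaned = (strip_punctuation(word) for word in line.split())
    # Skip tokens that consisted only of punctuation, such as "-".
    return [len(word) for word in cleaned if word]


print(strip_punctuation("hello, world !"))
print(word_lengths("Is this stack overflow!"))
```

Building the table outside the loop means each word costs a single translate call instead of one replace call per punctuation mark.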

Answer 2

You can use a regular expression to replace everything matched by a character class:

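>>> import re
>>> re.sub(r'[]!,:)([/-]', '', string)
'Is this stack overflow'

Here string is the input text. []!,:)([/-] is a character class that matches ], !, a comma, and so on. Each match is replaced with '' (the empty string).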
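The character class above is written by hand and only covers the characters listed in it. A minimal sketch of a more general variant, assuming that removing all of string.punctuation is acceptable, builds the class with re.escape and compiles it once. The names PUNCT_RE and remove_punctuation are hypothetical and not part of the original answer.

```python
import re
import string

# re.escape backslash-escapes the characters that are special inside a
# character class (such as ], \, ^ and -), so each is matched literally.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def remove_punctuation(text):
    """Return text with every character in string.punctuation removed."""
    return PUNCT_RE.sub("", text)


print(remove_punctuation("Is this stack overflow!"))
```

Compiling the pattern once avoids rebuilding it for every line of the file.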

How can I replace punctuation in text efficiently?
