Question

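I have a list that basically contains all the strings from a .txt file. Now I want to strip all commas, periods, exclamation marks, and so on from the strings in the list.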

I tried this code, but it doesn't work:

r = ""
import string

def find_word(filepath,word):
    doc = open(filepath, 'r')

    for line in doc:
        words = string.split(line) ##line.split() causes the same error
        words = [w.replace(["'",'`', '[',']','{','}','(', ')', ':', ',', '.', '!', '?', '"', ';'],"") for w in words]
        print words

find_word("pg844.txt","eBook")

Traceback:

line 11, in find_word
    words = [w.replace(["'",'`', '[',']','{','}','(', ')', ':', ',', '.', '!', '?', '"', ';'],"") for w in words]
TypeError: expected a character buffer object

Answer 1

split is a method of str; it returns a list of strings produced by splitting the source string on a separator (whitespace by default), so you should do this:

words = line.split()

You can remove the characters with a regular expression:

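import re  # required for re.sub

# \W matches any non-word character; the extra _ also strips underscores.
words = [re.sub(r'[\W_]+', '', w) for w in words]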

Or without a regular expression:

words = [''.join(s for s in w if s.isalnum()) for w in words]

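You cannot pass a list to the replace method; it expects a single string to search for, which is why the TypeError is raised.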
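If you would rather keep the exact character list from the question instead of stripping everything non-alphanumeric, one option is to call replace once per character. This is only a minimal sketch: the names PUNCT and strip_chars are made up for illustration, and the sample sentence is not from the asker's file.

```python
# Characters to strip, copied from the list in the question.
PUNCT = "'`[]{}():,.!?\";"

def strip_chars(word, chars=PUNCT):
    """Remove every character in `chars` from `word`, one replace() call each."""
    for ch in chars:
        word = word.replace(ch, "")
    return word

# Hypothetical sample input, standing in for one line of the file.
words = "Hello, world! (It's a test.)".split()
cleaned = [strip_chars(w) for w in words]
print(cleaned)  # ['Hello', 'world', 'Its', 'a', 'test']
```

Unlike the isalnum approach, this leaves characters that are not in the list (hyphens, for example) untouched.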

Answer 2

import string

s = "username:! test,:?"

s = ''.join(c for c in s if c not in string.punctuation)

print(s)

username test
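The same filtering can probably be done with str.translate, which avoids the per-character Python loop. The sketch below assumes Python 3, where str.maketrans takes a third argument listing the characters to delete; strip_punctuation is just an illustrative name.

```python
import string

# Translation table that maps every ASCII punctuation character to "deleted".
# Assumes Python 3 (str.maketrans with three arguments).
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)

def strip_punctuation(text):
    """Return `text` with all characters in string.punctuation removed."""
    return text.translate(_DELETE_PUNCT)

print(strip_punctuation("username:! test,:?"))  # username test
```

Note that string.punctuation covers ASCII punctuation only, so typographic quotes and similar characters would pass through unchanged.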

Answer 3

Try:

def find_word(filepath, word):
    def reg(c):
        # Keep letters, digits, and spaces; drop every other character.
        if c.isalnum() or c == ' ':
            return c
        return ''

    with open(filepath, 'r') as doc:
        for line in doc:
            # Iterating over a string yields one character at a time.
            words = ''.join(map(reg, line))
            print(words)

Answer 4

import re

def find_word(filepath):
    with open(filepath, 'r') as doc:
        for line in doc:
            # Delete the listed punctuation characters from each line.
            word = re.sub('[,.!:;?]', '', line)
            print(word)

find_word("pg844.txt")

Cleaning a list of strings