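Cleaning a list of strings

Asked: 2015-10-20 12:31:49

Tags: python string list

I have a list that contains essentially all of the strings from a .txt file. I now want to strip commas, periods, exclamation marks, and similar punctuation from every string in the list.

I tried the following code, but it does not work: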

r = ""
import string

def find_word(filepath,word):
    doc = open(filepath, 'r')

    for line in doc:
        words = string.split(line) ##line.split() causes the same error
        words = [w.replace(["'",'`', '[',']','{','}','(', ')', ':', ',', '.', '!', '?', '"', ';'],"") for w in words]
        print words

find_word("pg844.txt","eBook")

Traceback:

line 11, in find_word
    words = [w.replace(["'",'`', '[',']','{','}','(', ')', ':', ',', '.', '!', '?', '"', ';'],"") for w in words]
TypeError: expected a character buffer object

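4 answers:

Answer 0 (score: 3)

split is a method of string objects. It returns a list of the substrings obtained by splitting the source string on a separator (whitespace by default), so you should call it on the line itself: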

words = line.split()

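You can remove the unwanted characters with a regular expression:

words = [re.sub(r'[\W_]+', '', w) for w in words]  # needs "import re" at the top of the file

Or without a regular expression: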

words = [''.join(s for s in w if s.isalnum()) for w in words]

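You cannot pass a list to the replace method. Its first argument must be a single string, which is why the original code raises the TypeError.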
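If you would rather keep the explicit character list from the question, one option is to call replace once per character. This is a minimal sketch, not part of the original answer; the names UNWANTED, strip_chars, and clean_line are made up for illustration.

```python
# Characters the question wanted to remove (copied from the question's list).
UNWANTED = ["'", '`', '[', ']', '{', '}', '(', ')', ':', ',', '.', '!', '?', '"', ';']


def strip_chars(word, chars=UNWANTED):
    """Remove every character in `chars` from `word`, one replace call per character."""
    for ch in chars:
        word = word.replace(ch, "")
    return word


def clean_line(line):
    """Split a line on whitespace and clean each token (hypothetical helper)."""
    cleaned = [strip_chars(w) for w in line.split()]
    # Drop tokens that consisted only of punctuation and are now empty.
    return [w for w in cleaned if w]


print(clean_line("Hello, world! (It's a test.)"))
# ['Hello', 'world', 'Its', 'a', 'test']
```

Unlike the isalnum approach, this only removes the characters you list, so anything else (a hyphen, for example) is left in place.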

Answer 1 (score: 0)

import string

s = "username:! test,:?"

s = ''.join(c for c in s if c not in string.punctuation)

print(s)

username test
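The same idea can be applied to a whole list of words with str.translate, which does the filtering in a single call per string. The sketch below assumes Python 3 (str.maketrans with three arguments does not exist in Python 2); REMOVE_PUNCT and strip_punctuation are illustrative names, not something from the answer above.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# The third argument of str.maketrans lists the characters to remove.
REMOVE_PUNCT = str.maketrans('', '', string.punctuation)


def strip_punctuation(words):
    """Return a new list with ASCII punctuation removed from each string."""
    return [w.translate(REMOVE_PUNCT) for w in words]


print(strip_punctuation(["username:!", "test,:?"]))
# ['username', 'test']
```

Note that string.punctuation covers ASCII punctuation only, so typographic quotes and similar characters in an eBook text would be left untouched.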

Answer 2 (score: 0)

Try:

def find_word(filepath, word):
    def reg(c):
        # Keep letters, digits, and spaces; drop every other character.
        return c if c.isalnum() or c == ' ' else ''

    with open(filepath, 'r') as doc:
        for line in doc:
            # Iterating over a string yields one character at a time.
            cleaned = ''.join(map(reg, line))
            print(cleaned)

Answer 3 (score: 0)

import re

def find_word(filepath):
    with open(filepath, 'r') as doc:
        for line in doc:
            # Remove the listed punctuation characters from the line.
            word = re.sub('[,.!:;?]', '', line)
            print(word)

find_word("pg844.txt")