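I have a list that contains essentially all the strings from a .txt file. Now I want to strip commas, periods, exclamation marks and similar characters from every string in that list.
I tried this code, but it does not work: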
r = ""
import string
def find_word(filepath,word):
    doc = open(filepath, 'r')
    for line in doc:
        words = string.split(line) ##line.split() causes the same error
        words = [w.replace(["'",'`', '[',']','{','}','(', ')', ':', ',', '.', '!', '?', '"', ';'],"") for w in words]
        print words
find_word("pg844.txt","eBook")
Traceback:
line 11, in find_word
words = [w.replace(["'",'`', '[',']','{','}','(', ')', ':', ',', '.', '!', '?', '"', ';'],"") for w in words]
TypeError: expected a character buffer object
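Answer 0 (score: 3)
split is a method of string objects. It returns a list of strings produced by splitting the source string on a separator (whitespace by default), so you should write: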
words = line.split()
You can remove the characters with a regexp (this needs import re at the top of the file):
words = [re.sub(r'[\W_]+', '', w) for w in words]
Or without a regular expression:
words = [''.join(s for s in w if s.isalnum()) for w in words]
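To compare the two one-liners above, here is a small self-contained sketch. The sample line is made up for illustration and is not from pg844.txt. For plain ASCII text like this, both versions should give the same result; note that both also drop underscores and apostrophes.

```python
import re

# Hypothetical sample line, not taken from the original file.
line = "Hello, world! snake_case isn't (kept)."
words = line.split()

# Regexp version: remove every run of non-word characters and underscores.
with_regex = [re.sub(r'[\W_]+', '', w) for w in words]

# Non-regexp version: keep only alphanumeric characters.
without_regex = [''.join(c for c in w if c.isalnum()) for w in words]

print(with_regex)
print(without_regex)
```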
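You cannot pass a list to the replace method. Its first argument must be a string, which is why the call raises the TypeError.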
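If you would rather keep using replace, one possible workaround is to call it once per character. This is only a sketch: strip_chars and PUNCTUATION are names invented here for illustration, and the sample sentence is made up.

```python
# The characters from the list in the question, collected into one string.
PUNCTUATION = "'`[]{}():,.!?\";"

def strip_chars(word, chars=PUNCTUATION):
    """Return word with every character in chars removed."""
    for ch in chars:
        # replace accepts a single substring, so call it once per character.
        word = word.replace(ch, "")
    return word

words = [strip_chars(w) for w in "Hello, world! (It's a test.)".split()]
print(words)
```

This makes one pass over the word per character in the list, which is fine for short words but slower than the regexp approach on large inputs.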
Answer 1 (score: 0)
import string
s = "username:! test,:?"
s = ''.join(c for c in s if c not in string.punctuation)
print(s)
username test
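A possible alternative to the character-by-character filter, assuming Python 3 (the three-argument form of str.maketrans does not exist in Python 2), is to build a deletion table once and reuse it for every string:

```python
import string

# Translation table that maps every ASCII punctuation character to "deleted".
# Assumes Python 3; str.maketrans with three arguments is not available in Python 2.
table = str.maketrans('', '', string.punctuation)

s = "username:! test,:?"
cleaned = s.translate(table)
print(cleaned)
```

The same table can be applied to each word or line in a loop, so it only has to be built once.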
Answer 2 (score: 0)
Try:
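def find_word(filepath, word):
    def reg(c):
        # Keep letters, digits and spaces; drop every other character.
        if c.isalnum() or c == ' ':
            return c
        return ''
    doc = open(filepath, 'r')
    lines = doc.readlines()
    for line in lines:
        # Iterating over a string yields single characters, so reg sees one character at a time.
        words = ''.join(map(reg, line))
        print(words)
    doc.close()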
Answer 3 (score: 0)
import re

def find_word(filepath):
    doc = open(filepath, 'r')
    for line in doc:
        # Remove the listed punctuation characters from the whole line.
        word = re.sub('[,.!:;?]', '', line)
        print(word)
    doc.close()

find_word("pg844.txt")