I'm trying to remove invalid symbols from a text. I have this code:
def parse_documentation(filename):
    filename=open(filename)
    invalidsymbols=["`","~","!", "@","#","$"]
    for lines in filename:
        print(lines)
        for word in lines:
            print(word)
            for letter in word:
                if invalidsymbols==letter:
                    print(letter)
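For now I'm just testing it by printing the letter; later I'll add code to remove it (del()). I have many more invalid symbols than the ones in this list, so I wanted to test with only 5 or 6 of them. The problem is that it doesn't print only the invalid symbols: it prints every letter in my text. Also, for some reason, it prints extra characters before my text. How can I fix this?
The text I'm using is: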
he's a jolly good fellow#
I want pizza!
I'm driving to school$
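A minimal sketch of why the loop prints every letter: iterating over a line (a string) already yields single characters, so `for word in lines` visits each character and `print(word)` prints it. Meanwhile `invalidsymbols==letter` compares a whole list to one character, which is always False. The check needs a membership test with `in`. The helper name `find_invalid` is hypothetical.

```python
def find_invalid(line, invalid_symbols):
    """Return the invalid symbols found in `line`, in order of appearance."""
    found = []
    # Iterating over a string yields one character at a time
    for char in line:
        # Membership test: is this character one of the invalid symbols?
        if char in invalid_symbols:
            found.append(char)
    return found

invalidsymbols = ["`", "~", "!", "@", "#", "$"]
print(invalidsymbols == "#")  # False: a list never equals a string
print(find_invalid("he's a jolly good fellow#", invalidsymbols))  # ['#']
```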
Answer 0 (score: 3)
You can use str.translate to remove all the unwanted symbols in one go:
>>> txt = """he's a jolly good fellow#
... I want pizza!
... I'm driving to school$"""
>>> print(txt.translate(str.maketrans("", "", "`~!@#$")))
he's a jolly good fellow
I want pizza
I'm driving to school
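In Python 3, `str.translate` takes a single mapping argument. `str.maketrans` called with a third argument builds a table that maps the code point of each listed character to `None`, and mapping to `None` deletes that character. A small sketch (variable names are illustrative):

```python
# Third argument: characters to delete (each code point is mapped to None)
table = str.maketrans("", "", "`~!@#$")
print(table[ord("#")])  # None

txt = "I want pizza!\nI'm driving to school$"
print(txt.translate(table))
```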
So your code might look like this:
def parse_documentation(filename, invalid_symbols):
    # Translation table that deletes every invalid symbol
    table = str.maketrans("", "", "".join(invalid_symbols))
    with open(filename) as in_file:
        for line in in_file:
            safe_line = line.translate(table)
            print(safe_line, end="")  # replace with whatever you need to do with safe_line
You would call this function with parse_documentation(filename, ["`","~","!", "@","#","$"])
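If you would rather collect the cleaned lines than process them inside the loop, a generator is one option. A sketch, assuming the file is UTF-8 text; `clean_lines` is a hypothetical name, and the demo writes the question's sample text to a temporary file:

```python
import os
import tempfile

def clean_lines(filename, invalid_symbols):
    """Yield each line of `filename` with every invalid symbol removed."""
    table = str.maketrans("", "", "".join(invalid_symbols))
    with open(filename, encoding="utf-8") as in_file:
        for line in in_file:
            yield line.translate(table)

# Demo: write the sample text to a temporary file, then clean it
sample = "he's a jolly good fellow#\nI want pizza!\nI'm driving to school$\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
cleaned = list(clean_lines(tmp.name, ["`", "~", "!", "@", "#", "$"]))
os.remove(tmp.name)
print(cleaned)
```

Because it yields one line at a time, the whole file never has to be held in memory unless you call `list()` on it as the demo does.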
Answer 1 (score: 0)
def parse_documentation(filename):
    invalidsymbols = ["`", "~", "!", "@", "#", "$"]
    with open(filename, "r") as f:  # open file
        lines = f.readlines()  # read all the lines in the file into a list named "lines"
    for line in lines:  # for each line in lines
        for x in invalidsymbols:  # loop through the list of invalid symbols
            if x in line:  # if the invalid symbol is in the line
                print(line)  # print out the line
                print(x)  # and also print out the invalid symbol you encountered in that line
                print(line.replace(x, ""))  # print out the line with that invalid symbol removed
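Note that `line.replace(x, "")` removes only the symbol from the current iteration, so a line containing both `!` and `#` still shows one of them in each printed result. A sketch that applies `replace` for every symbol in turn (`remove_all` is a hypothetical helper):

```python
def remove_all(line, invalid_symbols):
    """Return `line` with every occurrence of every invalid symbol removed."""
    for symbol in invalid_symbols:
        # str.replace returns a new string; keep it for the next symbol
        line = line.replace(symbol, "")
    return line

print(remove_all("wow! #great$ day!", ["`", "~", "!", "@", "#", "$"]))  # wow great day
```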
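How about that?
Answer 2 (score: 0)
JoeC has already answered, but I'd like to add that if several invalid symbols can appear in the same line, you may be better off doing something like the following, which reports each symbol it finds: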
def parse_documentation(filename):
    invalidsymbols = ["`", "~", "!", "@", "#", "$"]
    with open(filename) as filelines:
        for line in filelines:
            print(line)
            # Check every symbol, so a line with several of them reports each one
            for symbol in invalidsymbols:
                if symbol in line:
                    print("Above line contains %s symbol" % symbol)
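If you also want to know how many times each symbol occurs in a line, `str.count` can report it. A small sketch; `report_symbols` is a hypothetical name:

```python
def report_symbols(line, invalid_symbols):
    """Return a dict mapping each invalid symbol present in `line` to its count."""
    counts = {}
    for symbol in invalid_symbols:
        n = line.count(symbol)  # number of non-overlapping occurrences
        if n:
            counts[symbol] = n
    return counts

invalidsymbols = ["`", "~", "!", "@", "#", "$"]
for line in ["he's a jolly good fellow#", "I want pizza!!", "no symbols here"]:
    print(line)
    for symbol, n in report_symbols(line, invalidsymbols).items():
        print("Above line contains %s symbol %d time(s)" % (symbol, n))
```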
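For replacing the symbols, see JoeC's answer.
Answer 3 (score: 0)
Try using the textcleaner library for this task.
Its home page and documentation are at: https://pypi.org/project/textcleaner/
Call the remove_symbols function and it returns the plain text. It uses only regular expressions.
Function description: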
https://yugantm.github.io/textcleaner/documentation.html#remove_symbols
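Since textcleaner is described as using regular expressions, here is a standalone sketch of the same idea with the standard `re` module. It does not use or reproduce textcleaner's API; the symbol list is the one from the question, and `re.escape` keeps characters such as `$` from being read as regex syntax.

```python
import re

invalidsymbols = ["`", "~", "!", "@", "#", "$"]
# Character class matching any single invalid symbol, each one escaped
pattern = re.compile("[" + "".join(re.escape(s) for s in invalidsymbols) + "]")

text = "he's a jolly good fellow#\nI want pizza!\nI'm driving to school$"
print(pattern.sub("", text))
```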