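I'm running into a frustrating problem with text files that I output from Python. The files look completely normal when opened in a text editor, but when I upload them to QDA Miner, a data analysis suite, this is what the text looks like: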

"This problem really needs to be focused in a way that is particular to its cultural dynamics and tending in the industry,"
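As you can see, many of these odd (“) symbols show up throughout the text. The text my Python script parses was originally an RTF file, which I converted to plain text using OS X's built-in text editor.
Is there a simple way to remove these symbols? I'm parsing individual 100+ MB text files and splitting them into thousands of separate articles, so I need a way to convert them in bulk; otherwise the job is close to impossible. I should also mention that the text in these files was originally copied from web pages.
Here is some relevant code from the script I wrote: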
test1 = filedialog.askopenfile()
newFolder = ((str(test1)[25:])[:-32])
folderCreate(newFolder)
masterFileName = newFolder+"/"+"MASTER_FILE"
masterOutput = open(masterFileName,"w")
edit = test1.readlines()
for i,line in enumerate(edit):
    for j in line.split():
        if j in ["Author","Author:"]:
            try:
                outputFileName = "-".join(edit[i-2].lower().title().split())+".txt"
                output = open(newFolder+"/"+outputFileName,"w") # create file with article name # backslashed changed to front slash windows
                print("File created - ","-".join(edit[i-2].lower().title().split()))
                counter2 = counter2+1
            except:
                print("Filename error.")
                counter = counter+1
                pass
            #Count number of words in each article
            wordCount = 0
            for word in edit[i+1].split():
                wordCount+=1
            fileList.append((outputFileName,str(wordCount)))
            #Now write to file
            output.write(edit[i-2])
            output.write("\n")
            author = line
            output.write(author) # write article author
            output.write("\n")
            output.write("\n")
            content = edit[i+1]
            output.write(content) # write article content
Thanks.
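
For illustration, here is a minimal sketch of one way the symbols might be removed in bulk. It rests on an assumption the post does not confirm: that the odd symbols are typographic ("smart") quotes and similar characters carried over from the web pages, which some programs display incorrectly when they guess the wrong encoding. The names `SMART_PUNCTUATION`, `to_plain_punctuation`, and `clean_file` are hypothetical, and the mapping only covers a handful of common characters.

```python
# Sketch only: assumes the stray symbols are typographic punctuation.
# All names below are hypothetical and not part of the original script.

# Map common typographic characters to plain ASCII equivalents.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}
_TABLE = str.maketrans(SMART_PUNCTUATION)


def to_plain_punctuation(text):
    """Return text with the mapped typographic characters replaced."""
    return text.translate(_TABLE)


def clean_file(src_path, dst_path, src_encoding="utf-8"):
    """Copy src_path to dst_path line by line, normalising punctuation.

    src_encoding is a guess; adjust it if the source file was saved
    with a different encoding.
    """
    with open(src_path, "r", encoding=src_encoding) as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(to_plain_punctuation(line))
```

Because `clean_file` streams one line at a time, it should cope with 100+ MB inputs without loading the whole file into memory. The same `to_plain_punctuation` call could instead be applied to each string just before `output.write(...)` in the script above, with the output files opened using an explicit `encoding="utf-8"`.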