Question

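I'm running into a frustrating problem with text files I output from Python. When opened in a text editor, the files look completely normal, but I upload them to QDA Miner, a data analysis suite, and once they are in QDA Miner, this is what the text looks like: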

.Â â€¨â€¨"This problem really needs to be focused in a way that is particular to its cultural dynamics and tending in the industry,"

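As you can see, many of these odd symbols (such as â€œ) appear throughout the text. The text my Python script parses was originally an RTF file, which I converted to plain text with OS X's built-in text editor.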
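The specific sequences hint at a likely cause: each one is what a UTF-8 encoded character looks like when its bytes are decoded as Windows-1252 (cp1252). The sketch below reproduces the sequences from the sample above. It assumes, without confirmation, that QDA Miner reads the files as cp1252 rather than UTF-8; `as_misread` is a hypothetical helper for illustration only.

```python
# Assumption (unconfirmed): the output files are valid UTF-8, and the program
# importing them decodes the bytes as Windows-1252 (cp1252) instead.

def as_misread(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as cp1252."""
    # errors="replace" because cp1252 leaves a few byte values (e.g. 0x9D) undefined
    return text.encode("utf-8").decode("cp1252", errors="replace")

samples = [
    ("\u00a0", "NO-BREAK SPACE"),
    ("\u2028", "LINE SEPARATOR"),
    ("\u201c", "LEFT DOUBLE QUOTATION MARK"),
]

for ch, name in samples:
    print(f"U+{ord(ch):04X} {name:<28} -> {as_misread(ch)!r}")
```

Under this assumption, the no-break space becomes Â followed by a no-break space, the line separator becomes â€¨, and the opening curly quote becomes â€œ, which matches the symbols in the imported text. Characters like these are common in text copied from web pages.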

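Is there an easy way to remove these symbols? I'm parsing single text files of 100+ MB and splitting them into thousands of individual articles, so I need a way to convert them in bulk; otherwise it's practically impossible. I should also mention that the content of these text files was originally copied from web pages.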
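If the goal is simply to strip these characters before import, one option is to map the typographic characters that web pages commonly contain to plain ASCII equivalents and then drop anything still outside ASCII. This is a lossy, hypothetical approach: the mapping table and the names `to_plain_ascii` and `clean_folder` are illustrative, not part of the original script, and accented letters lose their accents while non-Latin text is dropped entirely.

```python
import pathlib
import unicodedata

# Hypothetical mapping of characters often found in text pasted from web pages.
REPLACEMENTS = {
    "\u00a0": " ",    # no-break space
    "\u2028": "\n",   # line separator
    "\u2029": "\n",   # paragraph separator
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
}
_TABLE = str.maketrans(REPLACEMENTS)

def to_plain_ascii(text: str) -> str:
    """Replace known typographic characters, then drop any remaining non-ASCII."""
    text = text.translate(_TABLE)
    # NFKD splits accented letters into base letter + combining mark;
    # encoding with errors="ignore" then discards the marks and other non-ASCII.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

def clean_folder(folder: str, encoding: str = "utf-8") -> int:
    """Rewrite every .txt file in folder as plain ASCII; return how many were changed."""
    count = 0
    for path in pathlib.Path(folder).glob("*.txt"):
        text = path.read_text(encoding=encoding)
        path.write_text(to_plain_ascii(text), encoding="ascii")
        count += 1
    return count
```

In the script this could be applied per article as it is written (for example `content = to_plain_ascii(edit[i+1])`), or run once over the output folder afterwards with `clean_folder(newFolder)`.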

Here is some of the relevant code from the script I wrote:

import os
from tkinter import filedialog

counter = 0    # number of articles whose output file could not be created
counter2 = 0   # number of article files created
fileList = []  # (output file name, word count) pairs

test1 = filedialog.askopenfile()
newFolder = os.path.splitext(test1.name)[0]  # path of the chosen file without its extension
folderCreate(newFolder)  # helper defined elsewhere in the script
masterFileName = newFolder+"/"+"MASTER_FILE"
masterOutput = open(masterFileName,"w")
edit = test1.readlines()
for i,line in enumerate(edit):
    for j in line.split():
        if j in ["Author","Author:"]:
            title = edit[i-2]  # the article title sits two lines above the author line
            try:
                outputFileName = "-".join(title.lower().title().split())+".txt"
                output = open(newFolder+"/"+outputFileName,"w") # create file named after the article (forward slashes also work on Windows)
                print("File created - ","-".join(title.lower().title().split()))
                counter2 = counter2+1
            except (OSError, ValueError):
                print("Filename error.")
                counter = counter+1
                continue  # skip this article; otherwise `output` would be undefined or still point at the previous file

            #Count number of words in each article
            wordCount = len(edit[i+1].split())
            fileList.append((outputFileName,str(wordCount)))

            #Now write to file
            output.write(title)
            output.write("\n")
            author = line
            output.write(author) # write article author
            output.write("\n")
            output.write("\n")
            content = edit[i+1]
            output.write(content) # write article content
            output.close()
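If the special characters should be kept rather than removed, another option is to write each file with an explicit encoding instead of the platform default that `open(..., "w")` uses. The sketch below uses `utf-8-sig`, which puts a UTF-8 byte-order mark at the start of the file; many Windows programs use that mark to recognise UTF-8, but whether QDA Miner does is an assumption to test. `write_article` is a hypothetical helper that mirrors the write sequence in the loop above.

```python
import os
import tempfile

def write_article(folder: str, filename: str, title: str, author: str,
                  content: str, encoding: str = "utf-8-sig") -> str:
    """Write one article using an explicit encoding; return the file path.

    'utf-8-sig' writes a byte-order mark (EF BB BF) before the text. Whether the
    importing program (e.g. QDA Miner) honours it is an assumption to verify.
    """
    path = os.path.join(folder, filename)
    with open(path, "w", encoding=encoding) as out:  # file is closed automatically
        out.write(title)
        out.write("\n")
        out.write(author)   # article author line
        out.write("\n\n")
        out.write(content)  # article body
    return path

if __name__ == "__main__":
    # Demo in a throwaway directory: the first three bytes should be the BOM.
    with tempfile.TemporaryDirectory() as tmp:
        p = write_article(tmp, "Example.txt", "Example Title\n",
                          "Author: Someone\n", "\u201cQuoted text\u201d\n")
        with open(p, "rb") as f:
            print(f.read(3))  # b'\xef\xbb\xbf'
```

Inside the loop, this would replace the separate `open(...)` and `output.write(...)` calls, e.g. `write_article(newFolder, outputFileName, title, line, edit[i+1])`.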


Strange symbols/encoding appearing in Python output txt file
