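I'm writing a program that reads a PDF into a text file, but every time I run my code, newline characters show up in the list written to the text file. I've tried several approaches, including strip(), split(), and replace(), but the characters won't go away. If anyone can help me, that would be great. The snippet is posted below: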
import PyPDF2 as pdf
# creating an object
file = open(PDF_FILENAME_DIRECTORY, "rb")
# creating a pdf reader object
fileReader = pdf.PdfFileReader(file)
# print the number of pages in pdf file
textData = []
for pages in fileReader.pages:
    theText = pages.extractText()
    # for char in theText:
    #     theText.replace(char, "\n")
    textData.append(theText)
final_list = []
for i in textData:
    final_list.append(i.strip('\n'))
# [s.strip('\n') for s in theText]
# [s.replace('\n', '') for s in theText]
# text_data = []
# for elem in textData:
#     text_data.extend(elem.strip().split('n'))
# for line in textData:
#     textData.append(line.strip().split('\n'))
#--------------------------------------------------------------------
import os.path
save_path = "FILENAME_SAVEPATH_DIRECTORY"
name_of_file = input("What is the name of the file: ")
completeName = os.path.join(save_path, name_of_file + ".txt")
file1 = open(completeName, "w")
file1.write(str(final_list))
file1.close()
Sample output of code as a list in a text file. I want to take out the '\n' characters.
Answer 0 (score: 0)
Your problem is on this line:
file1.write(str(final_list))
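This calls the __str__ method of the list type, which uses repr to stringify each element of the list. That is why the newlines show up as a literal '\n' in the output.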
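The effect can be reproduced without a PDF. Here is a minimal sketch, where the sample string is made up and only stands in for whatever extractText() returns:

```python
# Hypothetical page text standing in for the output of extractText().
pages = ["first line\nsecond line\n"]

as_list_text = str(pages)        # list.__str__ applies repr() to each element
as_plain_text = "".join(pages)   # the strings themselves, newlines intact

# In the stringified list, each newline has become the two characters
# backslash and 'n', wrapped in brackets and quotes.
print(as_list_text)
# Printing the joined strings shows real line breaks instead.
print(as_plain_text)
```

No amount of strip() or replace() on the elements changes this, as long as the list itself is what gets passed to str().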
Do this instead:
for line in final_list:
    file1.write(line)
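As a fuller sketch of the same idea, the snippet below writes each page's text directly and optionally flattens the line breaks. The helper name write_pages, the keep_line_breaks flag, the sample strings, and the temporary output path are all assumptions made for illustration; they are not part of the original code.

```python
import os
import tempfile

# Hypothetical stand-in for the per-page strings collected in textData.
text_data = ["page one, line 1\npage one, line 2\n", "page two\n"]


def write_pages(pages, path, keep_line_breaks=True):
    """Write each page's text to path as plain text, not as a list repr."""
    with open(path, "w", encoding="utf-8") as out:
        for page in pages:
            if keep_line_breaks:
                out.write(page)
            else:
                # str.strip('\n') only trims the ends of a string;
                # replace() also handles newlines in the middle.
                out.write(page.replace("\n", " "))


# Write to a temporary directory so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_pages(text_data, out_path)

with open(out_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

If the goal is to remove the line breaks from the text entirely rather than just avoid the literal '\n', calling write_pages(text_data, out_path, keep_line_breaks=False) replaces every newline with a space before writing.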