Question

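I'm writing a program that reads a PDF into a text file, but every time I run my code, newline characters show up in the list that gets written to the text file. I've tried a lot of things, including strip(), split(), and replace(), but the characters won't go away. If anyone could help me, that would be great. Snippet posted below: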

import PyPDF2 as pdf

# creating an object 
file = open(PDF_FILENAME_DIRECTORY, "rb")

# creating a pdf reader object
fileReader = pdf.PdfFileReader(file)

# collect the extracted text of each page
textData = []

for pages in fileReader.pages:
    theText = pages.extractText()

    # for char in theText:
    #   theText.replace(char, "\n")

    textData.append(theText)

final_list = []

for i in textData:
    final_list.append(i.strip('\n'))

# [s.strip('\n') for s in theText]
# [s.replace('\n', '') for s in theText]


# text_data = []

# for elem in textData:
#         text_data.extend(elem.strip().split('n'))  

# for line in textData:
#     textData.append(line.strip().split('\n'))
#--------------------------------------------------------------------

import os.path

save_path = "FILENAME_SAVEPATH_DIRECTORY"

name_of_file = input("What is the name of the file: ")

completeName = os.path.join(save_path, name_of_file + ".txt")   

file1 = open(completeName, "w")

file1.write(str(final_list))

file1.close()

Sample output of code as a list in a text file. I want to take out the '\n' characters.

Answer 1

Your problem is in this line:

file1.write(str(final_list))

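This calls the __str__ method of the list type, which uses repr() to stringify each element of the list. That is why every newline shows up in the file as the two-character escape '\n' instead of as a line break.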
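To see the effect in isolation, here is a minimal sketch. The sample strings and variable names are made up for illustration and do not come from the question's PDF.

```python
# Two fake "pages" standing in for the text extracted from a PDF.
pages = ["first line\nsecond line\n", "third line\n"]

# str() on a list calls repr() on each element, so every newline
# becomes a backslash followed by the letter n.
as_list_text = str(pages)
print(as_list_text)

# Joining the strings keeps the real newline characters instead.
joined = "".join(pages)
print(joined)
```

The first print shows the escaped form that ended up in the text file; the second shows the text as it would normally read.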

Do this instead:

for line in final_list:
    file1.write(line)
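If the goal is also to remove the line breaks inside each page, something along these lines may help. This is only a sketch: the helper name write_pages, the flatten flag, and the temporary output path are hypothetical and chosen for this example, and final_list here is sample data rather than real PDF text.

```python
import os
import tempfile


def write_pages(pages, path, flatten=False):
    """Write each page's text to `path`, ending every page with one newline."""
    # The with-block closes the file even if a write fails.
    with open(path, "w", encoding="utf-8") as out:
        for page in pages:
            text = page.strip("\n")
            if flatten:
                # Collapse every run of whitespace, newlines included,
                # into a single space.
                text = " ".join(text.split())
            out.write(text + "\n")


# Sample data standing in for the question's final_list.
final_list = ["first line\nsecond line\n", "third line\n"]

# Hypothetical output location used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "pdf_text_demo.txt")
write_pages(final_list, demo_path, flatten=True)

with open(demo_path, encoding="utf-8") as f:
    written = f.read()
print(written)
```

With flatten=False the line breaks inside each page are kept as real newlines; with flatten=True each page ends up on a single line.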
