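So I have some code that opens a text file containing a list of file paths, like this:
C:/Users/User/Desktop/mini_mouse/1980
C:/Users/User/Desktop/mini_mouse/1982
C:/Users/User/Desktop/mini_mouse/1984
It then opens each of these files, line by line, and does some filtering on them. Afterwards, I want it to write the results to a completely different folder: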
output_location = 'C:/Users/User/Desktop/test2/'
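As it stands, my code writes the results to the location the original file was opened from, i.e. if it opens the file C:/Users/User/Desktop/mini_mouse/1980, the output ends up in the same folder under the name "1980_filtered". However, I would like the output to go into output_location. Can anyone see where I am going wrong? Any help would be greatly appreciated! Here is my code: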
import os

def main():
    stop_words_path = 'C:/Users/User/Desktop/NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:/Users/User/Desktop/test2/'
    list_file = 'C:/Users/User/Desktop/list_of_files.txt'
    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name) # joins the new path of the file to the current file in order to access the file
            filestring = '' # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2: # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2: # read file line by line
                    x = remove_stop_words(line, stopwords) # remove stop words from line
                    filestring += x # add newly filtered line to the file string
                    filestring += '\n' # create new line
            new_file_path = os.path.join(output_location, file_name) + '_filtered' # creates a new file for the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file: # opens output file
                output_file.write(filestring)

if __name__ == "__main__":
    main()
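Answer 0 (score: 1)
Assuming you are on Windows (since you have a normal Windows file system), you have to use backslashes in your path names. Note that this applies only on Windows. I know it's annoying, so I changed it for you (you're welcome :)). You also have to use two backslashes, because a single one would be treated as an escape character.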
import os

def main():
    stop_words_path = 'C:\\Users\\User\\Desktop\\NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:\\Users\\User\\Desktop\\test2\\'
    list_file = 'C:\\Users\\User\\Desktop\\list_of_files.txt'
    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name) # joins the new path of the file to the current file in order to access the file
            filestring = '' # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2: # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2: # read file line by line
                    x = remove_stop_words(line, stopwords) # remove stop words from line
                    filestring += x # add newly filtered line to the file string
                    filestring += '\n' # create new line
            new_file_path = os.path.join(output_location, file_name) + '_filtered' # creates a new file for the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file: # opens output file
                output_file.write(filestring)

if __name__ == "__main__":
    main()
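To illustrate the escaping point, here is a small sketch of how the different spellings of a Windows path compare as Python strings. It uses the standard `ntpath` module (the Windows flavour of `os.path`) so that it behaves the same on any platform; the variable names are only for illustration and are not part of the original code.

```python
import ntpath  # Windows implementation of os.path, importable on any OS

# Three ways of writing the same Windows path in Python source.
escaped = 'C:\\Users\\User\\Desktop\\test2'   # each backslash doubled
raw = r'C:\Users\User\Desktop\test2'          # raw string, no doubling needed
forward = 'C:/Users/User/Desktop/test2'       # forward slashes

# A doubled backslash in source is a single character in the string.
single_backslash_length = len('\\')

# The escaped and raw spellings are the identical string.
same_string = (escaped == raw)

# Normalising the forward-slash spelling yields the backslash form.
normalised = ntpath.normpath(forward)

print(single_backslash_length)
print(same_string)
print(normalised == escaped)
```

The Windows path functions accept forward slashes as separators too, so the choice between the spellings is mostly about readability and about avoiding accidental escape sequences such as `\t` or `\n` in a non-raw string.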
Answer 1 (score: 1)
Based on your code, the problem appears to be in this line:
new_file_path = os.path.join(output_location, file_name) + '_filtered'
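In Python's os.path.join(), any absolute path in the input (or any drive letter on Windows) discards everything before it and restarts the join from the new absolute path (or drive letter). Since you are taking file_name directly from list_of_files.txt, and every path in it is written relative to the C: drive, each call to os.path.join() drops output_location and resets to the original file path.
For a detailed explanation of this behaviour, see "Why doesn't os.path.join() work in this case?".
When building the output path, you could strip the file name "1980" from the path "C:/Users/User/Desktop/mini_mouse/1980" and then build the output path from the output_location variable and the isolated file name.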
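The discarding behaviour can be reproduced in isolation. The sketch below uses `ntpath`, the Windows implementation of `os.path`, so the result is the same on any platform; on a Windows machine `os.path.join` should give the same output. The two path values are taken from the question.

```python
import ntpath  # Windows implementation of os.path, importable on any OS

output_location = 'C:/Users/User/Desktop/test2/'
file_name = 'C:/Users/User/Desktop/mini_mouse/1980'

# The second argument is absolute (it has a drive letter and a root),
# so everything before it, including output_location, is thrown away.
joined = ntpath.join(output_location, file_name)
print(joined)  # C:/Users/User/Desktop/mini_mouse/1980

# A relative second argument is appended as expected.
relative_joined = ntpath.join(output_location, '1980')
print(relative_joined)  # C:/Users/User/Desktop/test2/1980
```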
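A minimal sketch of that suggestion follows. The helper name `build_output_path` and its `suffix` parameter are hypothetical, introduced only for this example. It uses `ntpath` so that it handles Windows-style paths on any platform; in the original script, running on Windows, `os.path.basename` and `os.path.join` would be the natural equivalents.

```python
import ntpath  # Windows implementation of os.path, importable on any OS


def build_output_path(output_location, source_path, suffix='_filtered'):
    """Return a path inside output_location named after source_path's file name.

    Hypothetical helper for illustration; not part of the original code.
    """
    # Keep only the last path component, e.g. '1980'.
    base_name = ntpath.basename(source_path)
    # base_name is relative, so output_location is preserved by the join.
    return ntpath.join(output_location, base_name + suffix)


new_file_path = build_output_path(
    'C:/Users/User/Desktop/test2/',
    'C:/Users/User/Desktop/mini_mouse/1980',
)
print(new_file_path)  # C:/Users/User/Desktop/test2/1980_filtered
```

In the question's loop this would replace the line that builds new_file_path, with file_name passed as the source path. Note that two input files with the same name in different folders would map to the same output file under this scheme.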