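I have a few hundred text files of books (file001.txt, file002.txt, etc.), and I want to read the first 3,000 words of each file and save them as a new file (e.g. file001_first3k.txt, file002_first3k.txt).
I've seen terminal solutions for Mac and Linux (I have both), but they seem to be aimed at printing to the terminal window and at a fixed number of characters rather than words.
I'm posting this under Python because a solution seems easier to find there than in the terminal, and I have some experience with Python.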
Answer 0 (score: 1)
Hopefully this will get you started. It assumes that words can be counted by splitting on whitespace.
import os
import sys


def extract_first_3k_words(directory):
    original_file_suffix = ".txt"
    new_file_suffix = "_first3k.txt"
    num_words = 3000

    # Pick up the source .txt files, skipping output from a previous run
    filenames = [f for f in os.listdir(directory)
                 if f.endswith(original_file_suffix) and not f.endswith(new_file_suffix)]

    for filename in filenames:
        # os.listdir returns bare names, so join them with the directory
        with open(os.path.join(directory, filename), "r") as original_file:
            file_content = original_file.read()

        # Get the first 3k words of the file. split() with no argument splits
        # on any whitespace, so line breaks become single spaces in the output.
        words = file_content.split()
        first_3k_words = " ".join(words[:num_words])

        # Write the new file next to the original
        new_filename = filename[:-len(original_file_suffix)] + new_file_suffix
        with open(os.path.join(directory, new_filename), "w") as new_file:
            new_file.write(first_3k_words)

        print("Extracted 3k words from: %s to %s" % (filename, new_filename))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python file_splitter.py <target_directory>")
        sys.exit(1)

    directory = sys.argv[1]
    extract_first_3k_words(directory)
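
If the paragraph breaks and spacing of the books matter, one possible variant is to find where the 3,000th word ends and slice the original text there, rather than rebuilding it from a word list. The sketch below is only an illustration: the helper name first_n_words is made up here (it is not part of the answer above), and it assumes a "word" is any run of non-whitespace characters.

```python
import re


def first_n_words(text, n=3000):
    """Return the prefix of `text` that ends right after its n-th word.

    Hypothetical helper: a "word" is assumed to be any run of
    non-whitespace characters. All whitespace before the cut point,
    including line breaks, is kept exactly as it appears in `text`.
    """
    if n <= 0:
        return ""
    # Walk over the words in order and stop at the n-th one
    for count, match in enumerate(re.finditer(r"\S+", text), start=1):
        if count == n:
            return text[:match.end()]
    # The text has fewer than n words, so return all of it
    return text
```

In the loop above, this could stand in for the split and join step, for example first_3k_words = first_n_words(file_content, num_words).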