Read a fixed number of words from multiple text files and save them as new files

Time: 2018-07-28 12:36:20

Tags: python

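I have a few hundred text files of books (file001.txt, file002.txt, etc.), and I want to read the first 3,000 words of each file and save them as a new file (e.g. file001_first3k.txt, file002_first3k.txt).

I have seen terminal solutions for Mac and Linux (I have both), but they seem to be meant for printing to the terminal window, and they cut at a set number of characters rather than words.

I'm posting this under Python because a solution seems easier to find there than for the terminal, and I have some experience with Python.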

1 answer:

Answer 0: (score: 1)

Hopefully this helps you get started. It assumes that words can be counted by splitting on spaces.

import os
import sys

def extract_first_3k_words(directory):
    original_file_suffix = ".txt"
    new_file_suffix = "_first3k.txt"
    filenames = [f for f in os.listdir(directory)
        if f.endswith(original_file_suffix) and not f.endswith(new_file_suffix)]

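    for filename in filenames:
        # os.listdir returns bare names, so build the full path before opening
        original_path = os.path.join(directory, filename)
        with open(original_path, "r") as original_file:
            # Get the first 3k words of the file (splits on single spaces only)
            num_words = 3000
            file_content = original_file.read()
            words = file_content.split(" ")
            first_3k_words = " ".join(words[:num_words])

        # Write the new file next to the original; swap only the trailing suffix
        new_filename = filename[:-len(original_file_suffix)] + new_file_suffix
        new_path = os.path.join(directory, new_filename)
        with open(new_path, "w") as new_file:
            new_file.write(first_3k_words)

        print("Extracted 3k words from: %s to %s" % (filename, new_filename))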

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python file_splitter.py <target_directory>")
        sys.exit(1)
    directory = sys.argv[1]
    extract_first_3k_words(directory)
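
Since book text usually separates words with line breaks as well as spaces, a variant that counts any run of non-whitespace as a word may give a count closer to what you expect. The following is only a sketch under a few assumptions that are not part of the answer above: the files are UTF-8 encoded, Python 3 is available, and the helper names `first_n_words` and `extract_first_words` are made up for illustration. It uses `re.finditer` to find the end of the n-th word and slices the original text there, so line breaks inside the kept part stay as they were.

```python
import re
from pathlib import Path


def first_n_words(text, n):
    """Return the prefix of `text` that holds its first `n` whitespace-delimited words.

    Spacing and line breaks inside the returned prefix are left untouched.
    """
    if n <= 0:
        return ""
    count = 0
    for match in re.finditer(r"\S+", text):
        count += 1
        if count == n:
            # Cut right after the n-th word
            return text[:match.end()]
    # The text has fewer than n words, so return all of it
    return text


def extract_first_words(directory, n=3000, suffix="_first3k"):
    """Write `<name><suffix>.txt` next to each source .txt file in `directory`."""
    written = []
    # sorted() builds the full list up front, so files created below are not revisited
    for path in sorted(Path(directory).glob("*.txt")):
        if path.stem.endswith(suffix):
            continue  # skip output files left over from an earlier run
        # Assumption: UTF-8 input; undecodable bytes are replaced rather than raising
        text = path.read_text(encoding="utf-8", errors="replace")
        out_path = path.with_name(path.stem + suffix + ".txt")
        out_path.write_text(first_n_words(text, n), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `extract_first_words("books")` on a hypothetical `books` directory would write `file001_first3k.txt` beside `file001.txt`, and so on. If your files use another encoding, change the `encoding` arguments accordingly.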