Python: how to limit the number of words read from files in a path

Posted: 2017-04-17 18:06:50

Tags: python string split

In Python 2, how can I limit the length of the text I read when importing all the txt files in a directory? Something like wordlength = 6000.

import glob

raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            raw_text += line

words = raw_text.split()
print(words)

This code just reads all the txt files and prints their words to the screen. How can I limit it to 6000 words and print only those 6000 words?

4 answers:

Answer 0 (score: 0)

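It depends on your definition of a *word*. If it is just text separated by whitespace, it's fairly easy: count the words as you go and stop once you have enough. For example: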

    word_limit = 6000
    word_count = 0
    for line in f:
        word_count += len(line.split())
        if word_count > word_limit:
            break
        raw_text += line
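To illustrate why the definition of a word matters, here is a small sketch comparing two possible definitions on the same text: whitespace splitting (what `str.split()` does, and what the loop above uses) versus a regular expression that treats runs of letters, digits and underscores as words. The function names are hypothetical, chosen for this example only.

```python
import re

def count_whitespace_words(text):
    # A "word" is any run of non-whitespace characters (str.split() behaviour).
    return len(text.split())

def count_regex_words(text):
    # A "word" is any run of letters, digits or underscores (\w+).
    return len(re.findall(r"\w+", text))

sample = "Hello, world! It's 2017-04-17."
print(count_whitespace_words(sample))  # 4
print(count_regex_words(sample))       # 7
```

With punctuation, contractions and dates in the text, the two definitions disagree, so a "6000 word" limit can cut the text at different places depending on which one you pick.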

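As written, this stops *before* the line that would push the total past the limit, so you may end up with slightly fewer than 6000 words. If you want exactly 6000 words, you can modify the loop to take just enough words from the last line to reach exactly 6000.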
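A minimal sketch of that modification, assuming a word is a whitespace-separated token. `read_exact_words` is a hypothetical helper name; it accepts any iterable of lines, so you can pass an open file object. When the line that reaches the limit is only partly used, its kept words are re-joined with single spaces.

```python
def read_exact_words(lines, word_limit=6000):
    """Accumulate lines until exactly word_limit whitespace-separated words
    have been collected (or the input runs out). Returns the collected text."""
    raw_text = ""
    word_count = 0
    for line in lines:
        line_words = line.split()
        if word_count + len(line_words) >= word_limit:
            # This line reaches or crosses the limit:
            # keep only the words that still fit, then stop.
            needed = word_limit - word_count
            raw_text += " ".join(line_words[:needed])
            break
        raw_text += line
        word_count += len(line_words)
    return raw_text
```

Usage inside the question's loop would be something like `raw_text += read_exact_words(f)` for each opened file, which applies the limit per file rather than to the total.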

If you want to make it a bit more efficient, drop raw_text and build the word list inside the loop, one line at a time:

        line_words = line.split()
        words.extend(line_words)

In that case, do the limit check with len(line_words).
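Putting that together, here is a sketch that builds the word list directly with `words.extend(line_words)`, checks against `len(line_words)`, and applies one overall limit across all matching files. `collect_words` is a hypothetical name, and sorting the `glob` results is an assumption added here so the file order (and therefore which 6000 words you get) is deterministic.

```python
import glob

def collect_words(pattern, word_limit=6000):
    """Build the word list line by line across all files matching pattern,
    stopping as soon as word_limit words have been collected."""
    words = []
    for filename in sorted(glob.glob(pattern)):  # sorted: deterministic file order
        with open(filename, "r") as f:
            for line in f:
                line_words = line.split()
                remaining = word_limit - len(words)
                if len(line_words) >= remaining:
                    # Take only what still fits, then stop reading all files.
                    words.extend(line_words[:remaining])
                    return words
                words.extend(line_words)
    return words
```

For example, `print(collect_words("/workspace/simple/*.txt"))` prints at most 6000 words. Returning from inside both loops is what stops the outer loop too; a plain `break` would only leave the current file.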

Answer 1 (score: 0)

import glob

N = 6000  # maximum number of words to keep
raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            if len(raw_text.split()) < N:  # still under the limit: keep reading
                raw_text += line
            else:
                break

# The last line added can push the total past N, so trim the result.
words = raw_text.split()[:N]
print(words)
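The check above re-splits the whole of raw_text on every line, which gets slower as the text grows. A sketch of the same idea that avoids this, using a swapped-in technique (a generator plus `itertools.islice`, not part of the answer above): yield words lazily and take the first N. The helper names `iter_words` and `first_n_words` are hypothetical.

```python
import glob
import itertools

def iter_words(pattern):
    """Lazily yield whitespace-separated words from every file matching pattern."""
    for filename in sorted(glob.glob(pattern)):  # sorted: deterministic file order
        with open(filename, "r") as f:
            for line in f:
                for word in line.split():
                    yield word

def first_n_words(pattern, n=6000):
    """Return at most n words. islice stops pulling from the generator after
    n items, so files after the one holding the n-th word are never opened."""
    return list(itertools.islice(iter_words(pattern), n))
```

`print(first_n_words("/workspace/simple/*.txt"))` would then print at most 6000 words with no trimming step needed.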

Answer 2 (score: 0)

Assuming you want to take at most 6000 words from each file:

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
words = []

for file in glob.glob(path):
    with open(file) as f: 
        words += f.read().split()[:count]

print(words)

$ python test.py "/workspace/simple/*.txt" 6000

You can also store the words in a dictionary keyed by filename:

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
fwords = {}

for file in glob.glob(path):
    with open(file) as f: 
        fwords[file] = f.read().split()[:count]

print(fwords)
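Note that `f.read()` loads each whole file into memory before slicing. If the files can be large, a streaming variant can stop reading each file once `count` words have been seen. This is a sketch using a generator expression and `itertools.islice` (a substitution, not the answer's own code); `first_words_per_file` is a hypothetical name.

```python
import glob
import itertools

def first_words_per_file(pattern, count=6000):
    """Map each file matching pattern to its first `count` words,
    reading each file only until `count` words have been seen."""
    fwords = {}
    for filename in glob.glob(pattern):
        with open(filename) as f:
            # Generator over this file's words; nothing is read until consumed.
            words = (word for line in f for word in line.split())
            fwords[filename] = list(itertools.islice(words, count))
    return fwords
```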

If you only want the files that contain exactly that number of words:

for file in glob.glob(path):
    with open(file) as f: 
        tmp = f.read().split()
        if len(tmp) == count:  # keep only files with exactly `count` words
            fwords[file] = tmp

Answer 3 (score: 0)

Try replacing your code with the following:

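import glob

raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        word_limit = 6000  # per-file limit: reset for every file
        word_count = 0
        for line in f:
            word_count += len(line.split())  # count words, not characters (len(line))
            if word_count > word_limit:
                break
            raw_text += line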