In Python 2, how do I limit the length of the string imported from all the txt files in a directory? For example, wordlength = 6000
import glob
raw_text = ""
path = "/workspace/simple/*.txt"
for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            raw_text += line
words = raw_text.split()
print(words)
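This code only reads all the txt files and prints them to the screen. How do I limit it to 6000 words and print only those 6000 words?
Answer 0 (score: 0)
That depends on your definition of a word. If you simply split the text on whitespace, it is fairly easy: count the words as you go, and stop once you have enough. For example: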
word_limit = 6000
word_count = 0
for line in f:
    word_count += len(line.split())
    if word_count > word_limit:
        break
    raw_text += line
If you want exactly 6000 words, you can modify the loop to take just enough words from the last line to bring the total to exactly 6000.
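The "exactly 6000 words" variant could look roughly like this. It is a minimal sketch, assuming words are whitespace-separated; the helper name take_words is hypothetical, and it collects a list of words instead of building raw_text.

```python
def take_words(lines, limit):
    """Collect whitespace-separated words from `lines`, stopping at exactly `limit`."""
    words = []
    for line in lines:
        remaining = limit - len(words)
        if remaining <= 0:
            break
        # Slice the line's words so the last line contributes only what is still needed.
        words.extend(line.split()[:remaining])
    return words


sample = ["one two three\n", "four five\n", "six seven eight\n"]
print(take_words(sample, 4))  # ['one', 'two', 'three', 'four']
```

An open file object can be passed as `lines`, since iterating over a file yields its lines.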
If you want it to be a bit more efficient, drop raw_text and build the words list inside the loop, one line at a time:
line_words = line.split()
words.extend(line_words)
In that case, you need to use len(line_words) for the check.
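Putting these pieces together across every file might look like the sketch below. It assumes whitespace-separated words; the function name collect_words is hypothetical, and the sorted() call is an added assumption to make the file order predictable.

```python
import glob


def collect_words(pattern, limit):
    """Return at most `limit` words from all files matching `pattern`."""
    words = []
    # sorted() is an assumption: glob does not promise any particular order.
    for filename in sorted(glob.glob(pattern)):
        with open(filename, 'r') as f:
            for line in f:
                line_words = line.split()
                # Take only as many words from this line as are still needed.
                words.extend(line_words[:limit - len(words)])
                if len(words) >= limit:
                    # Returning stops reading this file and skips the remaining files.
                    return words
    return words
```

With the path from the question, collect_words("/workspace/simple/*.txt", 6000) would return the first 6000 words, or fewer if the files contain fewer. Returning from a function avoids having to break out of two nested loops.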
Answer 1 (score: 0)
import glob
N = 6000  # put your word limit here
raw_text = ""
path = "/workspace/simple/*.txt"
for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            if len(raw_text.split()) < N:
                raw_text += line
            else:
                # stop reading this file; later files stop at their first line
                break
words = raw_text.split()
print(words)
Answer 2 (score: 0)
Assuming you want to take at most 6000 words from each file:
import glob, sys
path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
words = []
for file in glob.glob(path):
    with open(file) as f:
        words += f.read().split()[:count]
print(words)
$ python test.py "/workspace/simple/*.txt" 6000
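The script above reads each file completely with f.read() before slicing. For large files, a lazy variant could stop as soon as enough words have been seen. This is only a sketch; the helper names iter_words and first_words are hypothetical.

```python
from itertools import islice


def iter_words(lines):
    """Yield whitespace-separated words one at a time from an iterable of lines."""
    for line in lines:
        for word in line.split():
            yield word


def first_words(filename, count):
    """Return up to `count` words from the start of a file, iterating line by line."""
    with open(filename) as f:
        # islice stops pulling from the generator after `count` words.
        return list(islice(iter_words(f), count))
```

In the loop above, `words += first_words(file, count)` would then replace the f.read() line.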
You can also build a dictionary that maps each file to its words:
import glob, sys
path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
fwords = {}
for file in glob.glob(path):
    with open(file) as f:
        fwords[file] = f.read().split()[:count]
print(fwords)
If you only want the files that contain exactly that number of words:
for file in glob.glob(path):
    with open(file) as f:
        tmp = f.read().split()
        if len(tmp) == count:  # only files with exactly `count` words
            fwords[file] = tmp
Answer 3 (score: 0)
Try replacing your code with the following:
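for filename in glob.glob(path):
    with open(filename, 'r') as f:
        # the counters are reset for every file, so the limit applies per file
        word_limit = 12000
        word_count = 0
        for line in f:
            # note: len(line) counts characters; use len(line.split()) to count words
            word_count += len(line)
            if word_count > word_limit:
                break
            raw_text += line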