I am working with a text file and its sentences. I split the text into sentences and process each sentence with some Python functions.
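import io

# The with block closes the file on exit, so no explicit close() is needed.
with io.open("articles.html", encoding="utf-8") as myfile:
    data = myfile.read()
# One "sentence" per line of the file.
data = data.split('\n')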
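Basically, I process each sentence according to its length and a few regular-expression filters. My Python functions (i.e. process_movie_1, process_movie_2, process_movie_3) are stored in other files and imported into the main script.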
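I call these functions on each sentence in a for loop. With the current structure, my script processes one sentence with one function at a time inside the loop. I need to modify the script so that the sentences are processed concurrently. I am looking for ideas: should I call all of these functions from the main script, or fork it? Some of you may come up with a better approach. Right now I run the main script from an IDE, but I am ready to use the command prompt to process all the sentences at the same time. I am also open to any open-source software that would help in my situation.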
import re

from textblob import TextBlob

from clip1 import *
from clip2 import *
from clip3 import *
from clip4 import *

# gradient, fontface, highlight, the *_color and *_key values, stroke_width
# and so on are assumed to be defined elsewhere (not shown here).
for idx, sentence in enumerate(data):
    serial = str(idx)
    folder = str(idx)
    string = str(sentence)
    tokens = TextBlob(string)
    wordcounts = len(tokens.words)
    # Split on "; ", "*", a newline, or "--".
    sep = re.split(r'; |\*|\n|--', string)
    # The branches below unpack sep into exactly two parts, so they run
    # only when the split yields two pieces.
    if len(sep) == 2:
        a, b = [str(e) for e in sep]
        a = TextBlob(a)
        b = TextBlob(b)
        if 0 <= wordcounts <= 4:
            process_movie_1(folder, gradient, fontface,
                            fontface_italic, highlight,
                            highlight_color, font_color, key_color,
                            first_key, second_key, third_key, string,
                            stroke_color, stroke_width, txt_under_color,
                            serial)
        elif 5 <= wordcounts <= 6:
            process_movie_2(folder, gradient, fontface, fontface_italic, highlight,
                            highlight_color, font_color, key_color,
                            first_key, second_key, third_key, string,
                            stroke_color, stroke_width, txt_under_color,
                            serial)
        elif 7 <= wordcounts <= 15:
            if 1 <= len(a.words) <= 3:
                print(idx, "(clip29 -- done)", len(tokens.words), sep)
                process_movie_3(folder, gradient, fontface, fontface_italic, highlight,
                                highlight_color, font_color, key_color,
                                first_key, second_key, third_key, string,
                                stroke_color, stroke_width, txt_under_color,
                                serial)
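To make the goal concrete, here is a rough sketch of the kind of structure I have in mind, using a process pool from the standard library. It is only an illustration under assumptions, not working code for my project: handle_sentence, count_words and run_parallel are hypothetical names, count_words is a simple regex stand-in for len(TextBlob(...).words), and the worker only returns the name of the function it would call rather than calling process_movie_1/2/3 with their real arguments.

```python
import re
from concurrent.futures import ProcessPoolExecutor


def count_words(text):
    # Hypothetical stand-in for len(TextBlob(text).words).
    return len(re.findall(r"\w+", text))


def handle_sentence(item):
    """Decide which process_movie_* function one (idx, sentence) pair needs.

    Returns (idx, function_name) or (idx, None) if no branch applies.
    A real worker would call the chosen function here instead.
    """
    idx, sentence = item
    parts = re.split(r'; |\*|\n|--', sentence)
    if len(parts) != 2:
        return idx, None
    n = count_words(sentence)
    if n <= 4:
        return idx, "process_movie_1"
    if n <= 6:
        return idx, "process_movie_2"
    if n <= 15 and 1 <= count_words(parts[0]) <= 3:
        return idx, "process_movie_3"
    return idx, None


def run_parallel(sentences, workers=4):
    # Each sentence is handled in a separate worker process;
    # map() returns the results in the original order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_sentence, enumerate(sentences)))
```

With something like this, the main script would call run_parallel(data) under an `if __name__ == "__main__":` guard, and the worker function would have to live at module level so it can be sent to the worker processes.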