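The word file contains 43,000 records and the sentence file contains 430,000 sentences. Several sentences may contain the same word. What is the best way to write out every sentence in the sentence file that contains a word from the word file?
I tried implementing a trie (prefix tree) as a solution, but a trie does not work when the word occurs in the middle of a sentence.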
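# Read the words, removing surrounding whitespace (including the trailing newline).
with open("word.txt", "r") as f:
    word_list = [x.strip() for x in f if x.strip()]

# Read the sentences, keeping each newline so lines can be written back unchanged.
with open("sentence.txt", "r") as s:
    sentence_list = s.readlines()

# Write every sentence that contains a word (once per matching word).
with open("test3.txt", "w") as wrt:
    for a in word_list:
        for sen in sentence_list:
            if a in sen:
                wrt.write(sen)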
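The nested loops above perform up to 43,000 × 430,000 (about 18.5 billion) substring checks. One possible way to scan each sentence only once, sketched here under the assumption that plain literal substring matching is what is wanted, is to compile all words into a single regular expression. The helper names `build_matcher` and `filter_sentences` are hypothetical and not part of the original script. How well this scales to 43,000 alternatives has not been measured here.

```python
import re

def build_matcher(words):
    """Compile one pattern that matches any of the given words literally."""
    # Hypothetical helper. Strip newlines and drop blank entries first.
    cleaned = sorted({w.strip() for w in words if w.strip()}, key=len, reverse=True)
    if not cleaned:
        # An empty alternation would match everything, so use a pattern
        # that can never match instead.
        return re.compile(r"(?!)")
    # re.escape keeps characters such as '$' from being read as regex syntax.
    return re.compile("|".join(re.escape(w) for w in cleaned))

def filter_sentences(words, sentences):
    """Return the sentences that contain at least one word as a substring."""
    matcher = build_matcher(words)
    return [s for s in sentences if matcher.search(s)]

# Small demonstration with lines taken from the sample files.
demo_words = ["A00S00\n", "$$$COIBM\n", "$AMBLST2\n"]
demo_sentences = [
    "A00H00:orphanMF.csv:Yes A00H00 SYSIN\n",
    "A00H00:A00S00:JCLXref.csv:LS1A00S0 JCL\n",
]
print(filter_sentences(demo_words, demo_sentences))
```

Unlike the nested loops, this writes each matching sentence once, even if it contains several words.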
sentence.txt:
A00H00:orphanMF.csv:Yes A00H00 SYSIN
A00H00:orphanMF.csv:Yes A00H00 SYSIN
A00H00:orphanMF.csv:Yes A00H00 SYSIN
A00H00:orphanMF.csv:Yes A00H00 SYSIN
A00H00:orphanMF.csv:Yes A00H00 SYSIN
A00H00:orphanMF.csv:Yes A00H00 SYSIN
A00H00:A00S00:JCLXref.csv:LS1A00S0 JCL
JCLXref.csv:LS1A00S0_1 JCL Q0P1
JCLXref.csv:LS1A00S0_2 JCL B4Q1
word.txt:
A00S00
$$$COIBM
$AMBLST2
output.txt:
A00H00:A00S00:JCLXref.csv:LS1A00S0 JCL
JCLXref.csv:LS1A00S0_1 JCL Q0P1
JCLXref.csv:LS1A00S0_2 JCL B4Q1
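A plain trie only recognizes a word that starts at the position where the walk begins, which is why it misses words in the middle of a sentence. The Aho-Corasick algorithm extends a trie with failure links so that one left-to-right pass over a sentence finds a word starting at any position. The following is a minimal sketch of that idea, not a tuned implementation. The function names `build_automaton` and `contains_any` are hypothetical, and literal, case-sensitive substring matching is assumed.

```python
from collections import deque

def build_automaton(words):
    """Build a trie with failure links (Aho-Corasick) over the given words."""
    goto = [{}]         # goto[state][char] -> next state
    fail = [0]          # fail[state] -> state for the longest proper suffix
    terminal = [False]  # terminal[state] -> some word ends at this state
    for word in words:
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                terminal.append(False)
            state = nxt
        if word:
            terminal[state] = True
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # derive their failure link from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state is also terminal if a shorter word ends at its suffix.
            terminal[nxt] = terminal[nxt] or terminal[fail[nxt]]
            queue.append(nxt)
    return goto, fail, terminal

def contains_any(sentence, automaton):
    """Return True if any word of the automaton occurs in the sentence."""
    goto, fail, terminal = automaton
    state = 0
    for ch in sentence:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if terminal[state]:
            return True
    return False

# Demonstration with the three sample words and two sample sentences.
automaton = build_automaton(["A00S00", "$$$COIBM", "$AMBLST2"])
for line in ["A00H00:orphanMF.csv:Yes A00H00 SYSIN",
             "A00H00:A00S00:JCLXref.csv:LS1A00S0 JCL"]:
    print(line, contains_any(line, automaton))
```

On the sample data, literal substring search with the three listed words matches only the line containing A00S00. The sketch does not attempt to explain the other two lines listed under output.txt.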
Thanks in advance to everyone who offers suggestions.