How to automatically detect complete sentences in text files parsed from PDFs

Posted: 2019-02-22 17:40:19

Tags: nlp sentence

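I am working on a project that requires me to extract complete sentences from text files that were parsed from PDFs. The raw text files are really messy, in the sense that the PDF's tables and paragraphs all end up mixed together in the output.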

Here is a snapshot of one of the text files:

Issue 15-24 | Thursday 18 June 2015 PRICES Sulphur prices YL 4 Contract 
Spot Saupe fob Vancouver Q2-2015 135-145 135-145 fob Middle East* Q2- 
2015 140-165 145-151 fob Qatar QSP Jun 2015 141 fob UAE OSP Jun 2015 
145 fob Iran 139-145 fob Black Sea (lump-gran) Q2-2015 110-130 120-130 
fob US Gulf Q2-2015 135-150 135-140 cfr Brazil Q2-2015 150-165 155-160 
cfr Med (under 10 k) 128-148 fob Med (under 10 k) 110-120 cfr N Africa 
(lump-gran) Q2-2015 135-155 140-155 cfr India 163-168 cfr China Q2-2015 
143-163 143-163 ex-w Nantong (CNY/t) 1250-1260
“excluding Iran cfr Tampa/C Fla (l.t.) Q2-2015 132 cfr Benelux (loc 
refs) Q2-2015 155-172 cpt NW Europe Q2-2015 193-214
cpt = ‘carriage paid to’ for sulphur delivered by Roadtankcar FM

Argus FMB Sulphur pated after the Chinese New Year in February, prices 
eroded slightly but did not enter a free-fall. Some argue that it was 
down to a structural market tightness, which is expected to provide 
support to current sulphur prices and to potentially prevent prices 
from falling sharply even if Chinese buyers decided to exit the market 
in the next few weeks.

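What I need is a tool that can extract all the complete sentences while ignoring the tables and the incomplete sentences. I would like to know whether a solution to this problem already exists.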
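To make the requirement concrete, here is a minimal heuristic sketch in Python, not a known solution. It assumes that blank lines separate blocks, that a complete sentence starts with a capital letter and ends with `.`, `!` or `?`, and that table rows are dominated by tokens containing digits. The function names (`looks_like_sentence`, `extract_sentences`) and the thresholds (`min_words`, `max_numeric_ratio`) are made up for illustration and would need tuning on real data.

```python
import re

# Assumed sentence boundary: end punctuation, whitespace, then a capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def looks_like_sentence(text, min_words=5, max_numeric_ratio=0.2):
    """Heuristic check only; both thresholds are arbitrary assumptions."""
    text = text.strip()
    words = text.split()
    if len(words) < min_words:
        return False
    # Require a capitalised start and terminal punctuation.
    if not text[0].isupper() or text[-1] not in ".!?":
        return False
    # Table rows tend to be mostly prices, dates and ranges.
    numeric = sum(1 for w in words if any(ch.isdigit() for ch in w))
    return numeric / len(words) <= max_numeric_ratio


def extract_sentences(raw):
    """Return the candidates in `raw` that pass the heuristic check."""
    sentences = []
    # Treat blank lines as block separators.
    for block in re.split(r"\n\s*\n", raw):
        # Undo hard line wrapping inside the block.
        flat = " ".join(block.split())
        if not flat:
            continue
        for candidate in _BOUNDARY.split(flat):
            if looks_like_sentence(candidate):
                sentences.append(candidate)
    return sentences
```

On the snapshot above, a filter like this would drop the price table, but it would still keep the fragment beginning "Argus FMB Sulphur pated after", because that fragment happens to start with a capital letter and end with a full stop. Catching cases like that probably needs something beyond surface rules, which is what I am asking about.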

Any help would be greatly appreciated!

0 answers:

No answers yet