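I have an application that receives news from several news portals. I want to find the keywords of each news item and save them in a dedicated table, but I don't know how to find those keywords!
The code runs every 5 minutes, so it already uses a lot of server resources, and I'd like to avoid anything heavy!
My own old idea was to split the text into words, count them, and take the top 5, but then the keywords almost always come out as "a", "the", and so on.
Any suggestions?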
Answer 0 (score: 1)
Answer 1 (score: 1)
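You could download an English dictionary from Project Gutenberg, for example Webster's Unabridged Dictionary (http://www.gutenberg.org/files/29765/29765-8.txt), parse it for pronouns and prepositions, and use the result as a list of words to ignore when counting.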
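The parsing step might look roughly like the sketch below. It is not the code used for the experiment in this answer, and it rests on assumptions about the file layout that should be checked against the actual text: that each headword sits alone on an all-caps line, and that the first non-blank line after it carries a part-of-speech abbreviation such as `prep.` or `pron.`. The function name `extract_function_words` and both regular expressions are made up for illustration.

```python
import re

# Assumption: a headword line consists only of capitals, apostrophes,
# hyphens, spaces and ';' (used here as a separator between variants).
HEADWORD = re.compile(r"^[A-Z][A-Z'\- ;]*$")
# Assumption: prepositions and pronouns are tagged "prep." or "pron."
FUNCTION_POS = re.compile(r"\b(?:prep|pron)\.")

def extract_function_words(lines):
    """Collect headwords whose first definition line is tagged prep. or pron."""
    words = set()
    current = None
    for raw in lines:
        line = raw.strip()
        if HEADWORD.match(line):
            current = line
        elif current and line:
            if FUNCTION_POS.search(line):
                words.update(w.strip() for w in current.split(";") if w.strip())
            # Only the first non-blank line after a headword is inspected.
            current = None
    return words

# Tiny hand-written sample in the assumed layout (not copied from the file).
sample = """ABOARD
A*board", prep.

ABACUS
Ab"a*cus, n.

HE
He, pron.
"""
ignore_list = extract_function_words(sample.splitlines())
```

Taking an iterable of lines keeps the function independent of how the file is opened and decoded, which would also need to be verified for the downloaded file.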
A quick-and-dirty parsing experiment on that dictionary file produced the following list:
AMONGST A ABOON AGAINST AMID
AT ATAFTER BATING BEHITHER BESIDE
BESIDES BETWIXT DURANTE DURING EMFORTH
FOREBY FORENENST FROM HE HERS
HERSELF HIMSELF HIMSELVE HIR HIS
HO I ICH IDEM IK
INTO INWITH IT ITSELF MALGRE
MYSELF MYSELVEN O' OF ONESELF
ONTO OURSELVES OUTCEPT OUTTAKE PER
REGARDING RESPECTING SENZA SHE SITH
THAT THEM THEMSELVES THESE THILK
THOSE THRU THURGH THY THYSELF
UMBE UNNEAR UPON UPTILL US
VERSUS WE WHATE'ER WHATEVER WHATSOEVER
WHICH WHO WHOEVER WHOM WHOMSOEVER
WHOSE WHOSESOEVER WHOSO WHOSOEVER WITHOUTEN
YER YMEL YOU YOURS YOURSELF
YOW
As the list shows, it needs refining...
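To tie this back to the question, here is a minimal sketch of the split-and-count idea with an ignore list applied. The `STOP_WORDS` set is a small hand-picked sample rather than the full list above, and it includes articles and conjunctions such as "the" and "and", which a pronoun/preposition list would not cover. The function name `top_keywords` and the minimum word length of 3 are assumptions for illustration.

```python
import re
from collections import Counter

# Illustrative sample only; in practice this would be the refined ignore list.
STOP_WORDS = {
    "a", "an", "the", "and", "of", "at", "from", "in", "into", "to",
    "it", "that", "which", "who", "we", "you", "he", "she", "his", "is",
}

def top_keywords(text, stop_words=STOP_WORDS, n=5):
    """Return the n most frequent words that are not in stop_words."""
    # Lowercase first so "The" and "the" are counted as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: words shorter than 3 characters are never useful keywords.
    counts = Counter(w for w in words if w not in stop_words and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]

article = (
    "The council approved the budget. The budget funds a new bridge, "
    "and the bridge opens in May. Budget talks continue."
)
keywords = top_keywords(article, n=2)
```

This is a single pass over the text plus a set lookup per word, so it should stay cheap even when run every 5 minutes.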