什么是好的关键字提取Web服务?

时间:2009-04-02 18:41:37

标签: web-services

我正在寻找可以从一段文字中提取重要关键字的网络服务。

我已经尝试过Yahoo Term Extraction服务。这项服务的问题在于它没有为短文本提供任何结果。

或者,我可以使用任何可以使用的代码,它可以从一段文本中提取重要的关键字。即删除字符串中的所有通用词。

例如:

“我想买一台数码相机”

术语:“数字”,“相机”

感谢。

还有另外两个相关的堆栈溢出问题,并且有更多信息:

What is a simple way to generate keywords from a text? Filter out common words for search query

2 个答案:

答案 0 :(得分:4)

您可能希望查看www.opencalais.com(与路透社相关)是一个提供的网络服务

您的文字“我想购买数码相机”会返回此rdf / xml文档 -

    <!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.-->
    <!--Relations: GenericRelations

    Technology: digital camera-->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:c="http://s.opencalais.com/1/pred/">
        <rdf:Description c:allowDistribution="true" c:allowSearch="true" c:calaisRequestID="1ef6064f-283c-4fd4-a922-0ff493c4353a" c:externalID="calaisbridge" c:id="http://id.opencalais.com/SLlKCS2i2mZA3ABrQS0F9Q" rdf:about="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/DocInfo" />
            <c:document>
    <![CDATA[<Document>
    <Date>2009-04-03</Date>
    <Body>I want to buy a digital camera</Body>
    </Document>]]>
    </c:document>
            <c:docTitle />
            <c:docDate>2009-04-03 00:00:00</c:docDate>
            <c:externalMetadata c:caller="calaisbridge" />
            <c:submitter>calaisbridge</c:submitter>
        </rdf:Description>
        <rdf:Description c:contentType="text/txt" c:emVer="UserVocabulariesIM" c:langIdVer="DefaultLangId" c:language="InputTextTooShort" c:processingVer="CalaisJob01" c:submissionDate="2009-04-03 14:14:42.532" rdf:about="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a/meta">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/DocInfoMeta" />
            <c:docId rdf:resource="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a" />
            <c:submitterCode>416dcd8a-766f-0aa3-d94c-e5034b6ffc98</c:submitterCode>
            <c:signature>digestalg-1|sUmdk2pKaXLrsD0b2sNfX5dPvW4=|e+F5sMjqxqj0Qi+efzdG5D2s1TKBM//zH+NI1MNYvugY3FS9e3xP6g==</c:signature>
        </rdf:Description>
        <rdf:Description rdf:about="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a/lid/DefaultLangId">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/lid/DefaultLangId" />
            <c:docId rdf:resource="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a" />
            <c:lang rdf:resource="http://d.opencalais.com/lid/DefaultLangId/InputTextTooShort" />
        </rdf:Description>
        <rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/e224e552-7ebd-3ed1-aaa4-f8aba30331c2">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/e/Technology" />
            <c:name>digital camera</c:name>
        </rdf:Description>
        <rdf:Description rdf:about="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a/Instance/1">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/InstanceInfo" />
            <c:docId rdf:resource="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a" />
            <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/e224e552-7ebd-3ed1-aaa4-f8aba30331c2" />
            <!--Technology: digital camera; -->
            <c:detection>[ment&gt;&lt;Date&gt;2009-04-03&lt;/Date&gt;&lt;Body&gt;I want to buy a ]digital camera[&lt;/Body&gt;&lt;/Document&gt;]</c:detection>
            <c:prefix>ment&gt;&lt;Date&gt;2009-04-03&lt;/Date&gt;&lt;Body&gt;I want to buy a </c:prefix>
            <c:exact>digital camera</c:exact>
            <c:suffix>&lt;/Body&gt;&lt;/Document&gt;</c:suffix>
            <c:offset>55</c:offset>
            <c:length>14</c:length>
        </rdf:Description>
        <rdf:Description rdf:about="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a/Relevance/1">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/RelevanceInfo" />
            <c:docId rdf:resource="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a" />
            <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/e224e552-7ebd-3ed1-aaa4-f8aba30331c2" />
            <c:relevance>0.857</c:relevance>
        </rdf:Description>
        <rdf:Description rdf:about="http://d.opencalais.com/genericHasher-1/e8eac39c-f280-331e-9ccd-07f740d46ddb">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/em/r/GenericRelations" />
            <c:verb>buy</c:verb>
            <c:relationsubject>I</c:relationsubject>
            <!--digital camera-->
            <c:relationobject rdf:resource="http://d.opencalais.com/genericHasher-1/e224e552-7ebd-3ed1-aaa4-f8aba30331c2" />
        </rdf:Description>
        <rdf:Description rdf:about="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a/Instance/2">
            <rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/InstanceInfo" />
            <c:docId rdf:resource="http://d.opencalais.com/dochash-1/97cdaf47-fa15-31a1-be2b-3be1184d412a" />
            <c:subject rdf:resource="http://d.opencalais.com/genericHasher-1/e8eac39c-f280-331e-9ccd-07f740d46ddb" />
            <!--GenericRelations: verb: buy; relationsubject: I; relationobject: digital camera; -->
            <c:detection>[&lt;Document&gt;&lt;Date&gt;2009-04-03&lt;/Date&gt;&lt;Body&gt;]I want to buy a digital camera[&lt;/Body&gt;&lt;/Document&gt;]</c:detection>
            <c:prefix>&lt;Document&gt;&lt;Date&gt;2009-04-03&lt;/Date&gt;&lt;Body&gt;</c:prefix>
            <c:exact>I want to buy a digital camera</c:exact>
            <c:suffix>&lt;/Body&gt;&lt;/Document&gt;</c:suffix>
            <c:offset>39</c:offset>
            <c:length>30</c:length>
        </rdf:Description>
    </rdf:RDF>

答案 1 :(得分:1)

我知道有些人在使用WordsFinder服务方面取得了一些成功。