Question

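Edit: I now realize the API is simply inadequate and does not even work properly. I would like to redirect my question: I want to be able to auto-magically search duckduckgo using their "I'm feeling ducky". That way I can search for "stackoverflow", for instance, and get the main page ("https://stackoverflow.com/") as my result.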

I am using the duckduckgo API.

I found that when using:

r = duckduckgo.query("example")

the results do not reflect a manual search. Namely:

for result in r.results:
    print result

results in:

>>> 
>>>

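Nothing.

Looking up an index in results raises an out-of-bounds error, since the list is empty.

How am I supposed to get results for my search?

It seems the API (according to its documented examples) is supposed to answer questions and give a sort of "I'm feeling ducky" result in the form of {{1}}

But the website is built in such a way that I cannot search it and parse the results using normal methods.

I would like to know how I am supposed to parse search results with this API, or with any other method, from this site.

Thank you.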

Answer 1

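If you visit the DuckDuckGo API page, you will find some notes about using the API. The first note says clearly:

As this is a Zero-click Info API, most deep queries (non topic names) will be blank.

And here is the list of those fields: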

Abstract: ""
AbstractText: ""
AbstractSource: ""
AbstractURL: ""
Image: ""
Heading: ""
Answer: ""
Redirect: ""
AnswerType: ""
Definition: ""
DefinitionSource: ""
DefinitionURL: ""
RelatedTopics: [ ]
Results: [ ]
Type: ""

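So it may be a pity, but their API simply truncates a bunch of results and does not give them to you, possibly to work faster, and it seems nothing can be done about it except using DuckDuckGo.com.

So, obviously, in that case the API is not the way to go.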
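To check whether a particular query falls into this "blank" category, it can help to look at the raw JSON rather than at the wrapper library. The following is a minimal Python 3 sketch, not part of the original answer: the helper names `build_api_url` and `non_empty_fields` are made up for illustration, the endpoint is assumed to be `https://api.duckduckgo.com/` with `format=json`, and `BLANK_RESPONSE` is a hand-written stand-in built from the field list above, not a captured response.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Instant Answer API.
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Return the API URL for a query, asking for JSON output."""
    return API_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


def non_empty_fields(payload):
    """Return the sorted names of the fields that actually carry data."""
    return sorted(key for key, value in payload.items() if value)


# Hand-written stand-in for the response to a "deep" query: every field is blank.
BLANK_RESPONSE = json.loads("""
{"Abstract": "", "AbstractText": "", "AbstractSource": "", "AbstractURL": "",
 "Image": "", "Heading": "", "Answer": "", "Redirect": "", "AnswerType": "",
 "Definition": "", "DefinitionSource": "", "DefinitionURL": "",
 "RelatedTopics": [], "Results": [], "Type": ""}
""")

print(build_api_url("stack overflow"))
print(non_empty_fields(BLANK_RESPONSE))
```

If `non_empty_fields` returns an empty list for a real response, the query is one of those the API leaves blank, and iterating over `Results` will yield nothing, which matches the behaviour described in the question.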

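As for me, I see only one way out: retrieve the raw HTML from duckduckgo.com and parse it using, e.g., html5lib (it is worth mentioning that their HTML is well structured).

It is also worth mentioning that parsing HTML pages is not the most reliable way to scrape data, because the HTML structure can change, while an API usually stays stable until changes are publicly announced.

Here is an example of how such parsing can be done with BeautifulSoup: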

from BeautifulSoup import BeautifulSoup
import urllib
import re

site = urllib.urlopen('http://duckduckgo.com/?q=example')
data = site.read()

parsed = BeautifulSoup(data)
topics = parsed.findAll('div', {'id': 'zero_click_topics'})[0]
results = topics.findAll('div', {'class': re.compile('results_*')})

print results[0].text

This script prints:

u'Eixample, an inner suburb of Barcelona with distinctive architecture'

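The problem with querying the main page directly is that it uses JavaScript to produce the required results (as opposed to the related topics), so you can use the HTML version to get the results only. The HTML version has a different link: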

http://duckduckgo.com/?q=example #JavaScript version
http://duckduckgo.com/html/?q=example #HTML-only version

Let's see what we can get:

site = urllib.urlopen('http://duckduckgo.com/html/?q=example')
data = site.read()
parsed = BeautifulSoup(data)

first_link = parsed.findAll('div', {'class': re.compile('links_main*')})[0].a['href']

The value stored in the first_link variable is a link to the first result (not a related search) that the search engine outputs:

http://www.iana.org/domains/example

To get all the links, you can iterate over the tags found (data other than links can be retrieved in a similar way):

for i in parsed.findAll('div', {'class': re.compile('links_main*')}):
    print i.a['href']

http://www.iana.org/domains/example
https://twitter.com/example
https://www.facebook.com/leadingbyexample
http://www.trythisforexample.com/
http://www.myspace.com/leadingbyexample?_escaped_fragment_=
https://www.youtube.com/watch?v=CLXt3yh2g0s
https://en.wikipedia.org/wiki/Example_(musician)
http://www.merriam-webster.com/dictionary/example
...

Note that the HTML-only version contains only results; for related searches you must use the JavaScript version (the URL without the html part).
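For readers without BeautifulSoup at hand, the same link-extraction idea can be sketched with Python 3's standard-library `html.parser`. This is only an illustration under stated assumptions: the `links_main` class name is taken from the snippets above and may no longer match the live page, `ResultLinkParser` and `extract_links` are hypothetical names, and `SAMPLE_HTML` is hand-written markup rather than a real DuckDuckGo response.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect the first <a href> inside each <div> whose class starts with 'links_main'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0       # div nesting depth inside the current result block (0 = outside)
        self._taken = False   # True once a link was recorded for the current block

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self._depth:
                self._depth += 1          # nested div inside a result block
            elif any(c.startswith("links_main") for c in classes):
                self._depth = 1           # entering a new result block
                self._taken = False
        elif tag == "a" and self._depth and not self._taken and attrs.get("href"):
            self.links.append(attrs["href"])
            self._taken = True

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1


def extract_links(html):
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links


# Hand-written markup that only mimics the structure assumed above.
SAMPLE_HTML = """
<div id="links">
  <div class="links_main links_deep">
    <a href="http://www.iana.org/domains/example">Example Domain</a>
    <div class="snippet"><a href="http://ignored.example/">more</a></div>
  </div>
  <div class="links_main">
    <a href="https://en.wikipedia.org/wiki/Example_(musician)">Example (musician)</a>
  </div>
  <div class="other"><a href="http://not-a-result.example/">ad</a></div>
</div>
"""

print(extract_links(SAMPLE_HTML))
```

Because the parser keys on a class name, it is exactly as fragile as the caveat above suggests: if the page markup changes, `extract_links` simply returns an empty list.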

Answer 2

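After already getting an answer to my question, which I accepted and awarded the bounty to, I found a different solution that I would like to add here for completeness. A big thank you to everyone who helped me reach it. Even though this is not the solution I asked for, it might help someone in the future.

I found it after a long and hard conversation on this site and through some support mails: https://duck.co/topic/strange-problem-when-searching-intel-with-my-script

Here is the solution code (from an answer in the thread linked above):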

>>> import duckduckgo
>>> print duckduckgo.query('! Example').redirect.url
http://www.iana.org/domains/example
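The same lookup can be sketched in Python 3 without the wrapper library, assuming the "! " prefix behaves as in the snippet above and that the JSON API exposes the target in its `Redirect` field when called with `no_redirect=1` (my understanding of the documented parameters, not something confirmed in this thread). The function names `bang_query_url`, `redirect_from_payload` and `first_result` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def bang_query_url(query):
    """Build the API URL for a '! query' lookup that returns JSON instead of redirecting."""
    # Assumption: no_redirect=1 makes the API report the target in "Redirect".
    params = {"q": "! " + query, "format": "json", "no_redirect": "1"}
    return "https://api.duckduckgo.com/?" + urlencode(params)


def redirect_from_payload(payload):
    """Return the redirect target from a decoded response, or None if it is blank."""
    return payload.get("Redirect") or None


def first_result(query, timeout=10):
    """Fetch the first-result URL for a query (performs a network request)."""
    with urlopen(bang_query_url(query), timeout=timeout) as response:
        return redirect_from_payload(json.load(response))
```

Calling `first_result("example")` would then play the role of `duckduckgo.query('! Example').redirect.url`; it returns None when the service gives no redirect.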

Answer 3

Try:

for result in r.results:
    print result.text

Answer 4

If it suits your application, you might also try the related searches:

r = duckduckgo.query("example")
for i in r.related_searches:
    if i.text:
        print i.text

This yields:

Eixample, an inner suburb of Barcelona with distinctive architecture
Example (musician), a British musician
example.com, example.net, example.org, example.edu  and .example, domain names reserved for use in documentation as examples
HMS Example (P165), an Archer-class patrol and training vessel of the British Royal Navy
The Example, a 1634 play by James Shirley
The Example (comics), a 2009 graphic novel by Tom Taylor and Colin Wilson
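When working from the raw JSON instead of the wrapper, these entries correspond to the `RelatedTopics` field listed in Answer 1. Below is a small Python 3 sketch of pulling the texts out of a decoded payload. It assumes each entry carries a `Text` key and that grouped entries nest their items under `Topics`; `related_topic_texts` is a hypothetical helper and `SAMPLE_PAYLOAD` is hand-written from the output above.

```python
def related_topic_texts(payload):
    """Return the non-empty Text values found in a payload's RelatedTopics."""
    texts = []
    for entry in payload.get("RelatedTopics", []):
        # Grouped entries nest their items under "Topics";
        # plain entries carry "Text" directly.
        for topic in entry.get("Topics", [entry]):
            if topic.get("Text"):
                texts.append(topic["Text"])
    return texts


# Hand-written stand-in for a decoded API response.
SAMPLE_PAYLOAD = {
    "RelatedTopics": [
        {"Text": "Eixample, an inner suburb of Barcelona with distinctive architecture"},
        {"Name": "Arts", "Topics": [{"Text": "The Example, a 1634 play by James Shirley"}]},
        {"Text": ""},  # blank entries are skipped, like the `if i.text` check above
    ]
}

for text in related_topic_texts(SAMPLE_PAYLOAD):
    print(text)
```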

Answer 5

For Python 3 users, here is a transcription of @Rostyslav Dzinko's code:

import re
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

query = "your query"
# URL-encode the query so spaces and special characters survive the request
site = urllib.request.urlopen("http://duckduckgo.com/html/?q=" + urllib.parse.quote_plus(query))
data = site.read()
soup = BeautifulSoup(data, "html.parser")

# Keep at most the first 15 result blocks
my_list = soup.find("div", {"id": "links"}).find_all("div", {"class": re.compile("web-result")})[0:15]

result__snippet, result_url = [], []

for i in my_list:
    # find() returns None when a result lacks the element, so get_text() raises AttributeError
    try:
        result__snippet.append(i.find("a", {"class": "result__snippet"}).get_text().strip())
    except AttributeError:
        result__snippet.append(None)
    try:
        result_url.append(i.find("a", {"class": "result__url"}).get_text().strip())
    except AttributeError:
        result_url.append(None)

Answer 6

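Unfortunately, as many have pointed out, the non-JS DuckDuckGo version (http://duckduckgo.com/html/?q=example) does not have any of the rich-media search results that you might find in the live version ("Related Searches", "Recent News", etc.).

If you want to retrieve the FULL DuckDuckGo results, you will have to make a request to https://links.duckduckgo.com/d.js.

I have given a more detailed answer on how I built a parser for this for SerpApi.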

DuckDuckGo API does not return results

6 answers: