Python: loop over a list of terms and return the top URLs to build a dictionary

Time: 2019-09-06 02:55:12

Tags: python-3.x

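I am revising a previous post to make the question more concise. I have a list of keywords. For each keyword, I want to return the top x URLs from the search engine results. Because each dictionary key must be unique, I tried appending a sequence number to the end of each key so that every key is unique.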

What I tried:

from googlesearch import search

queries = ["richest countries in the world", "poorest countries in the world"]
count = 0
item = dict()

for query in queries:
    for site in search(query, tld = "co.in", num=10, stop=10, pause=3):
        count = count + 1
        item.update([{query + " " + str(count), site}])
        print(item)

This returns:

{'https://www.visualcapitalist.com/chart-the-10-wealthiest-countries-in-the-world/': 'richest countries in the world 1',
 'https://finance.yahoo.com/news/50-richest-countries-world-090000142.html': 'richest countries in the world 2',
 'richest countries in the world 3': 'http://worldpopulationreview.com',
...,
 'https://www.countries-ofthe-world.com/richest-countries.html': 'richest countries in the world 10',
 'poorest countries in the world 11': 'https://www.focus-economics.com/blog/the-poorest-countries-in-the-world',
 'https://www.usatoday.com/story/money/2019/07/07/afghanistan-madagascar-malawi-poorest-countries-in-the-world/39636131/': 'poorest countries in the world 12',
 'http://worldpopulationreview.com/countries/poorest-countries-in-the-world/': 'poorest countries in the world 13',
...,
 'poorest countries in the world 20': 'https://www.concernusa.org/story/worlds-poorest-countries/'}

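This is close, but as you can see, some of the keys are URLs. item.keys() returns a mix of URLs and search terms, which confirms that not every key is a search term as it should be. The desired end state is a dictionary where key = search term and value = URL: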

{'richest countries in the world 1': 'https://www.visualcapitalist.com/chart-the-10-wealthiest-countries-in-the-world/', 
'richest countries in the world 2': 'https://finance.yahoo.com/news/50-richest-countries-world-090000142.html', 
... , 
'poorest countries in the world 20': 'https://www.concernusa.org/story/worlds-poorest-countries/'}
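
A likely cause of the mixed keys is the expression `{query + " " + str(count), site}`. It is a set literal, not a tuple, so when `dict.update` unpacks it into a key and a value, the order of the two strings is not guaranteed. Assigning the key directly avoids the problem. The following is a minimal sketch, not a definitive implementation: `build_url_dict` and `fake_search` are hypothetical names introduced here, and `fake_search` is only an offline stand-in for `googlesearch.search` so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List


def build_url_dict(
    queries: List[str],
    search_fn: Callable[[str], Iterable[str]],
) -> Dict[str, str]:
    """Map '<query> <n>' to the n-th URL yielded by search_fn (hypothetical helper)."""
    results: Dict[str, str] = {}
    count = 0  # running counter across all queries, as in the question
    for query in queries:
        for url in search_fn(query):
            count += 1
            # Plain key assignment: the search term is always the key,
            # the URL is always the value.
            results[f"{query} {count}"] = url
    return results


def fake_search(query: str) -> List[str]:
    # Hypothetical stand-in for googlesearch.search; returns made-up URLs.
    slug = query.replace(" ", "-")
    return [f"https://example.com/{slug}/{i}" for i in range(1, 4)]


if __name__ == "__main__":
    queries = ["richest countries in the world", "poorest countries in the world"]
    item = build_url_dict(queries, fake_search)
    for key, value in item.items():
        print(key, "->", value)
```

With the real library, the same function could be called with `lambda q: search(q, tld="co.in", num=10, stop=10, pause=3)` as `search_fn`, reusing the call from the question. Keeping the `update` call would also work if the pair were written as a tuple, `(key, value)`, instead of a set.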

0 answers:

No answers