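I have revised an earlier post to make the question more concise. I have a list of keywords. For each keyword, I want to return the top x URLs from the search engine results. Because every dictionary key has to be unique, I tried appending a running counter to the end of each key so that no two keys collide.
I tried: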
from googlesearch import search

queries = ["richest countries in the world", "poorest countries in the world"]
count = 0
item = dict()
for query in queries:
    for site in search(query, tld = "co.in", num=10, stop=10, pause=3):
        count = count + 1
        item.update([{query + " " + str(count), site}])
print(item)
This returns:
{'https://www.visualcapitalist.com/chart-the-10-wealthiest-countries-in-the-world/': 'richest countries in the world 1',
'https://finance.yahoo.com/news/50-richest-countries-world-090000142.html': 'richest countries in the world 2',
'richest countries in the world 3': 'http://worldpopulationreview.com',
...,
'https://www.countries-ofthe-world.com/richest-countries.html': 'richest countries in the world 10',
'poorest countries in the world 11': 'https://www.focus-economics.com/blog/the-poorest-countries-in-the-world',
'https://www.usatoday.com/story/money/2019/07/07/afghanistan-madagascar-malawi-poorest-countries-in-the-world/39636131/': 'poorest countries in the world 12',
'http://worldpopulationreview.com/countries/poorest-countries-in-the-world/': 'poorest countries in the world 13',
...,
'poorest countries in the world 20': 'https://www.concernusa.org/story/worlds-poorest-countries/'}
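This is close, but as you can see, some of the keys are URLs. item.keys() returns a mix of URLs and search terms, which confirms that not every key is a search term as it should be. The desired end state is a dictionary where key = search term and value = URL: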
{'richest countries in the world 1': 'https://www.visualcapitalist.com/chart-the-10-wealthiest-countries-in-the-world/',
'richest countries in the world 2': 'https://finance.yahoo.com/news/50-richest-countries-world-090000142.html',
... ,
'poorest countries in the world 20': 'https://www.concernusa.org/story/worlds-poorest-countries/'}
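
For reference, here is a minimal sketch of one way to reach that end state. The likely cause of the mixed keys is that the braces in the update call build a set, not a (key, value) pair, and a set of strings has no guaranteed order, so the URL sometimes comes out first and becomes the key. Assigning with item[key] = value avoids the problem. The names collect_results and fake_search are hypothetical; fake_search is an offline stand-in so the example runs without network access, and the real search function from googlesearch would be passed in its place.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List


def collect_results(
    queries: List[str],
    search_fn: Callable[[str], Iterable[str]],
    top_n: int,
) -> Dict[str, str]:
    """Map '<query> <running count>' to a URL for the first top_n hits of each query."""
    results: Dict[str, str] = {}
    count = 0  # running counter shared across all queries, as in the original attempt
    for query in queries:
        for url in islice(search_fn(query), top_n):
            count += 1
            # Explicit key/value assignment: the search term is always the key.
            results[f"{query} {count}"] = url
    return results


def fake_search(query: str) -> Iterable[str]:
    """Hypothetical stand-in for a search engine; yields made-up example.com URLs."""
    slug = query.replace(" ", "-")
    return (f"https://example.com/{slug}/{i}" for i in range(1, 100))


queries = ["richest countries in the world", "poorest countries in the world"]
item = collect_results(queries, fake_search, top_n=10)
for key, value in item.items():
    print(key, "->", value)
```

With the real library, the second argument could be something like lambda q: search(q, tld="co.in", num=10, stop=10, pause=3), reusing the call from the attempt above. Resetting count inside the outer loop would number each query's results from 1 instead of continuing to 20.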