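I'm currently trying to use Google CSE to do some scraping for a project. This is pretty much my first time scraping the web. I took a Python class in school a few quarters ago, and scraping was supposed to be our last topic, but we never actually got to it. Anyway...
Here is what I'm trying to do:
Use Google CSE to pull Google News results for "bird watching" and "bird feeding". From the query results, I want to extract the article title, the article link, and its publication date. Then I want to write everything to a CSV.
Here is what I've managed so far (with a lot of help from https://gist.github.com/nikhilkumarsingh/5bce182ed57ae73f6cbde52fe846991b, which is great if anyone else is looking for an intro to CSE!):
A for loop that returns the title and link for each query result. For now I'm just printing them to make sure I'm getting results; I'll write them to CSV later. My query result object is a dictionary named result, and it looks like this (I apologize for the large amount of code being posted, but my question is about nesting, so I figured this was the clearest way to explain it):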
{'kind': 'customsearch#search', 'url': {'type': 'application/json',
'template': 'https://www.googleapis.com/customsearch/v1?q=
{searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe=
{safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr=
{cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq=
{hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter=
{siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms=
{excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite=
{relatedSite?}&dateRestrict={dateRestrict?}&lowRange=
{lowRange?}&highRange={highRange?}&searchType={searchType}&fileType=
{fileType?}&rights={rights?}&imgSize={imgSize?}&imgType=
{imgType?}&imgColorType={imgColorType?}&imgDominantColor=
{imgDominantColor?}&alt=json'}, 'queries': {'request': [{'title': 'Google
Custom Search - bird watching', 'totalResults': '104000', 'searchTerms':
'bird watching', 'count': 10, 'startIndex': 1, 'inputEncoding': 'utf8',
'outputEncoding': 'utf8', 'safe': 'off', 'cx':
'017465438656188383295:ul7lxhkonwq'}], 'nextPage': [{'title': 'Google
Custom Search - bird watching', 'totalResults': '104000', 'searchTerms':
'bird watching', 'count': 10, 'startIndex': 11, 'inputEncoding': 'utf8',
'outputEncoding': 'utf8', 'safe': 'off', 'cx':
'017465438656188383295:ul7lxhkonwq'}]}, 'context': {'title': 'google
news'}, 'searchInformation': {'searchTime': 0.491713,
'formattedSearchTime': '0.49', 'totalResults': '104000', 'formattedTotalResults': '104,000'}, 'items': [{'kind':
'customsearch#result', 'title': 'Amy Cooper: White woman who called police
on a black man in ...', 'htmlTitle': 'Amy Cooper: White woman who called
police on a black man in ...', 'link':
'https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoH
CAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen', 'displayLink':
'news.google.com', 'snippet': 'May 26, 2020 ... White woman who called
police on a black man bird-watching in Central Park \nhas been fired. By
Amir Vera and Laura Ly, CNN. Updated 4:21\xa0...', 'htmlSnippet': 'May 26,
2020 <b>...</b> White woman who called police on a black man <b>bird</b>-
<b>watching</b> in Central Park <br>\nhas been fired. By Amir Vera and
Laura Ly, CNN. Updated 4:21 ...', 'formattedUrl':
'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_
nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...', 'htmlFormattedUrl':
'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_
nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...', 'pagemap': {'thumbnail':
[{'src': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-
park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg'}],
'metatags': [{'template-top': 'us,news,art-vid-vls-col,col-top-news',
'og:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-
central-park-video-dog-video-african-american-trnd-screengrab-super-
tease.jpg', 'twitter:card': 'summary_large_image', 'og:image:width':
'1100', 'theme-color': '#000000', 'og:site_name': 'CNN', 'section': 'us',
'vr:canonical': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-
video-african-american-trnd/index.html', 'article:content-tier': 'free',
'og:description': 'The white woman who called police on a black man in
Central Park during an encounter involving her unleashed dog has been
fired from her job, her employer said Tuesday.', 'twitter:image':
'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-
video-dog-video-african-american-trnd-screengrab-super-tease.jpg', 'og:pubdate': '2020-05-26T06:19:40Z', 'lastmod': '2020-05-26T20:21:18Z', 'pubdate': '2020-05-26T06:19:40Z', 'twitter:title': 'White woman who called police on a black man bird-watching in Central Park has been fired', 'og:type': 'article', 'thumbnail': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
'author': 'Amir Vera and Laura Ly, CNN', 'og:title': 'White woman who
called police on a black man bird-watching in Central Park has been
fired', 'og:image:height': '619', 'fb:pages': '5550296508,18793419640',
'referrer': 'unsafe-url', 'fb:app_id': '80401312489', 'viewport':
'width=device-width, initial-scale=1.0, minimum-scale=1.0',
'twitter:description': 'The white woman who called police on a black man
in Central Park during an encounter involving her unleashed dog has been
fired from her job, her employer said Tuesday.', 'og:url':
'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-
american-trnd/index.html', 'article:opinion': 'false'}], 'cse_image':
[{'src': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-
park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
'width': '299', 'type': '1', 'height': '168'}], 'newsarticle': [{'image':
'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-
video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
'keywords': 'us, Amy Cooper: White woman who called police on a black man
in Central Park has been fired - CNN', 'author': 'Amir Vera and Laura Ly,
CNN', 'ispartof': 'news', 'description': 'The white woman who called
police on a black man in Central Park during an encounter involving her
unleashed dog has been fired from her job, her employer said Tuesday.',
'datecreated': '2020-05-26T06:19:40Z', 'url':
'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-
american-trnd/index.html', 'articlebody': '(CNN)The white woman who called
police on a black man in Central Park during an encounter involving her
unleashed dog has been fired from her job, her employer said
Tuesday."Following our internal...', 'datemodified': '2020-05-
26T20:21:18Z', 'articlesection': 'us', 'alternativeheadline': 'White woman who called police on a black man bird-watching in Central Park has been
fired', 'headline': 'Amy Cooper: White woman who called police on a black
man in Central Park has been fired - CNN', 'datepublished': '2020-05-
26T06:19:40Z', 'thumbnailurl':
'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-
video-dog-video-african-american-trnd-screengrab-super-tease.jpg'}]}}
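My code for extracting the link and title is as follows:
for item in result['items']:
    print(item['title'], item['link'])
Here is where I'm stuck:
The key for the article's publication date, 'pubdate', is nested inside several dictionaries and lists. I'm having trouble pulling it out in a loop. Nesting (whether in loops or in data structures) is probably my biggest weakness in coding.
The key that contains all the information I'm interested in is 'items', whose value is a list of dictionaries: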
'items': [{'kind': 'customsearch#result', 'title': 'Amy Cooper: White
woman who called police on a black man in ...', 'htmlTitle': 'Amy Cooper:
White woman who called police on a black man in ...', 'link':
'https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoH
CAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen', 'displayLink':
'news.google.com', 'snippet': 'May 26, 2020 ... White woman who called
police on a black man bird-watching in Central Park \nhas been fired. By
Amir Vera and Laura Ly, CNN. Updated 4:21\xa0...', 'htmlSnippet': 'May 26,
2020 <b>...</b> White woman who called police on a black man <b>bird</b>-
<b>watching</b> in Central Park <br>\nhas been fired. By Amir Vera and
Laura Ly, CNN. Updated 4:21 ...', 'formattedUrl':
'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_
nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...', 'htmlFormattedUrl':
'https://news.google.com/.../CAIiEDCQPCzyU2erjQLyLr_
nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?...', 'pagemap': {'thumbnail':
[{'src': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-
park-video-dog-video-african-american-trnd-screengrab-super-tease.jpg'}],
'metatags': [{'template-top': 'us,news,art-vid-vls-col,col-top-news',
'og:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-
central-park-video-dog-video-african-american-trnd-screengrab-super-
tease.jpg', 'twitter:card': 'summary_large_image', 'og:image:width':
'1100', 'theme-color': '#000000', 'og:site_name': 'CNN', 'section': 'us',
'vr:canonical': 'https://www.cnn.com/2020/05/26/us/central-park-video-dog-
video-african-american-trnd/index.html', 'article:content-tier': 'free',
'og:description': 'The white woman who called police on a black man in
Central Park during an encounter involving her unleashed dog has been
fired from her job, her employer said Tuesday.', 'twitter:image':
'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-
video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
'og:pubdate': '2020-05-26T06:19:40Z', 'lastmod': '2020-05-26T20:21:18Z',
'pubdate': '2020-05-26T06:19:40Z', 'twitter:title': 'White woman who
called police on a black man bird-watching in Central Park has been
fired', 'og:type': 'article', 'thumbnail':
'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-
video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
'author': 'Amir Vera and Laura Ly, CNN', 'og:title': 'White woman who
called police on a black man bird-watching in Central Park has been
fired', 'og:image:height': '619', 'fb:pages': '5550296508,18793419640',
'referrer': 'unsafe-url', 'fb:app_id': '80401312489', 'viewport':
'width=device-width, initial-scale=1.0, minimum-scale=1.0',
'twitter:description': 'The white woman who called police on a black man
in Central Park during an encounter involving her unleashed dog has been
fired from her job, her employer said Tuesday.', 'og:url':
'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-
american-trnd/index.html', 'article:opinion': 'false'}]
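In the first dictionary in the list, aka result['items'][0], we have the key 'pagemap', whose value is another dictionary. Inside that we have the key 'metatags', whose value is a list of dictionaries. The first index of this list holds a dictionary containing the key 'pubdate', whose value I'm looking for (I put a few spaces in the code block so you can find that value easily):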
'metatags': [{'template-top': 'us,news,art-vid-vls-col,col-top-news',
'og:image': 'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-
central-park-video-dog-video-african-american-trnd-screengrab-super-
tease.jpg', 'twitter:card': 'summary_large_image', 'og:image:width':
'1100', 'theme-color': '#000000', 'og:site_name': 'CNN', 'section': 'us',
'vr:canonical': 'https://www.cnn.com/2020/05/26/us/central-park-video-
dog-video-african-american-trnd/index.html', 'article:content-tier':
'free', 'og:description': 'The white woman who called police on a black
man in Central Park during an encounter involving her unleashed dog has
been fired from her job, her employer said Tuesday.', 'twitter:image':
'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-
video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
'og:pubdate': '2020-05-26T06:19:40Z', 'lastmod': '2020-05-26T20:21:18Z',
'pubdate': '2020-05-26T06:19:40Z', 'twitter:title': 'White woman who
called police on a black man bird-watching in Central Park has been
fired', 'og:type': 'article', 'thumbnail':
'https://cdn.cnn.com/cnnnext/dam/assets/200526102231-02-central-park-
video-dog-video-african-american-trnd-screengrab-super-tease.jpg',
'author': 'Amir Vera and Laura Ly, CNN', 'og:title': 'White woman who
called police on a black man bird-watching in Central Park has been
fired', 'og:image:height': '619', 'fb:pages': '5550296508,18793419640',
'referrer': 'unsafe-url', 'fb:app_id': '80401312489', 'viewport':
'width=device-width, initial-scale=1.0, minimum-scale=1.0',
'twitter:description': 'The white woman who called police on a black man
in Central Park during an encounter involving her unleashed dog has been
fired from her job, her employer said Tuesday.', 'og:url':
'https://www.cnn.com/2020/05/26/us/central-park-video-dog-video-african-
american-trnd/index.html', 'article:opinion': 'false'}]
Hopefully you were able to follow me through this rather gnarly nested structure...
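The path described above can be written as a chain of index operations, one step per level of nesting. A minimal sketch, assuming a trimmed-down stand-in for result (the sample dict below is made up to mirror the structure and is not real API output):

```python
# Hypothetical, trimmed-down stand-in for the `result` dict shown above.
sample = {
    'items': [
        {
            'title': 'Story A',
            'link': 'https://example.com/a',
            'pagemap': {
                'metatags': [
                    {'og:type': 'article', 'pubdate': '2020-05-26T06:19:40Z'}
                ]
            },
        }
    ]
}

first_item = sample['items'][0]                # list -> first dict
metatags = first_item['pagemap']['metatags']   # dict -> dict -> list of dicts
first_pubdate = metatags[0]['pubdate']         # list -> first dict -> value
print(first_pubdate)
```

Naming each intermediate level makes it easier to see which step yields a list (index with a number) and which yields a dictionary (index with a key).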
So ideally, what I'm looking for is a loop that gives me back:
Amy Cooper: White woman who called police on a black man in ... https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
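and so on for the next story in the query results.
The closest I've gotten is:
for item in result['items']:
    print(item['title'], item['link'])
    for date in result['items'][0]['pagemap']['metatags']:
        print(date['pubdate'])
This is close, but it only returns the date of the first story, even though the loop moves on to the next story: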
Amy Cooper: White woman who called police on a black man in ... https://news.google.com/articles/CAIiEDCQPCzyU2erjQLyLr_nLqUqGQgEKhAIACoHCAowocv1CjCSptoCMPrTpgU?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
Christian Cooper shouldn't need a Harvard degree to survive birding ... https://news.google.com/articles/CAIiEOCKmxd9S5s5cwM5xs0AivoqGAgEKg8IACoHCAowjtSUCjC30XQwzqe5AQ?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
People called police on this black birdwatcher so many times that he ... https://news.google.com/articles/CAIiEOkNNX95htD_KKDYihI5JcoqGAgEKg8IACoHCAowjtSUCjC30XQwzqe5AQ?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
A black man bird-watching in Central Park asked a white woman to ... https://news.google.com/articles/CAIiENZfU5G5gfmzo2CysHOaY0sqFQgEKg0IACoGCAowuLUIMNFnMLnhAg?hl=en-US&gl=US&ceid=US%3Aen
2020-05-26T06:19:40Z
What's a Tough Call in Bird Watching? Identifying a Gull - WSJ https://news.google.com/articles/CAIiEMKd4gQ1olRNd5T2Ndlpiu8qGAgEKg8IACoHCAow1tzJATDnyxUwuK20AQ
2020-05-26T06:19:40Z
Any advice, tips, help, or words of nested for loop wisdom would be greatly appreciated!!!!
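Answer 0 (score: 1)
You are accessing the first element of the list, result['items'][0], on every iteration instead of the current item. Working code:
for item in result['items']:
    print(item['title'], item['link'])
    # Use the current item, not result['items'][0]
    for date in item['pagemap']['metatags']:
        # .get() avoids a KeyError when a result has no 'pubdate'
        print(date.get('pubdate', 'Pubdate is not specified'))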
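Since the end goal in the question is a CSV file, the same loop can collect rows instead of printing them. This is only a sketch: the helper names extract_rows and write_rows, the tiny sample dict, and the in-memory buffer are assumptions for illustration, and the .get() defaults assume that some results may lack 'pagemap', 'metatags', or 'pubdate'.

```python
import csv
import io

def extract_rows(result):
    """Return one (title, link, pubdate) tuple per search result."""
    rows = []
    for item in result.get('items', []):
        # 'pagemap' or 'metatags' may be absent, so fall back to empty containers.
        metatags = item.get('pagemap', {}).get('metatags', [])
        pubdate = ''
        for tags in metatags:
            if 'pubdate' in tags:
                pubdate = tags['pubdate']
                break  # keep the first pubdate found for this item
        rows.append((item.get('title', ''), item.get('link', ''), pubdate))
    return rows

def write_rows(rows, fileobj):
    """Write a header line plus the rows to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(['title', 'link', 'pubdate'])
    writer.writerows(rows)

# Hypothetical sample data mirroring the structure in the question.
sample = {
    'items': [
        {'title': 'Story A', 'link': 'https://example.com/a',
         'pagemap': {'metatags': [{'pubdate': '2020-05-26T06:19:40Z'}]}},
        {'title': 'Story B', 'link': 'https://example.com/b',
         'pagemap': {'metatags': [{'og:type': 'article'}]}},  # no pubdate
    ]
}

rows = extract_rows(sample)
buffer = io.StringIO()  # in-memory stand-in for a real file
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, 'w', newline='', encoding='utf-8'); the csv module documentation recommends newline='' for files handed to csv.writer.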