Question

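I'm currently working on a project that scrapes data from Google Scholar. I want to scrape each profile's country of residence, but it is not listed explicitly. For example, from this page I want the United Kingdom, because the listed email address is at ucl.ac.uk. As another example, from this page I want the Netherlands, because the email address is at vumc.nl. However, for this profile, the TLD of the email domain does not let us determine the country.

So far I have written this code to capture the domain: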

import urllib.request

from bs4 import BeautifulSoup

url = 'https://scholar.google.com/citations?user=VGoSakQAAAAJ'

# Fetch the profile page and parse it.
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'lxml')

# The element with id "gsc_prf_ivh" holds the verified-email line of the profile.
buttons = soup.find_all("div", {"id": "gsc_prf_ivh"})
for each in buttons:
    s = each.text  # text that contains the email domain
    print(s)
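
To make the idea concrete, here is a minimal sketch of turning the text captured in `s` into a country by looking at the country-code TLD of the email domain. It assumes the text has the form "Verified email at ucl.ac.uk" (possibly followed by other text); the names `CCTLD_TO_COUNTRY`, `extract_domain`, and `country_from_text` are hypothetical, and the mapping is a tiny illustrative subset rather than a complete list. Generic TLDs such as .com or .edu return None, which is exactly the case I cannot resolve.

```python
import re

# Illustrative subset only (an assumption, not a complete ccTLD table).
CCTLD_TO_COUNTRY = {
    "uk": "United Kingdom",
    "nl": "Netherlands",
    "de": "Germany",
    "fr": "France",
}

# Matches a dotted host name such as "ucl.ac.uk" or "vumc.nl".
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


def extract_domain(text):
    """Return the first domain-like token in text, lowercased, or None."""
    match = DOMAIN_RE.search(text)
    return match.group(1).lower() if match else None


def country_from_text(text):
    """Map the domain's last label to a country, or None if it is unknown."""
    domain = extract_domain(text)
    if domain is None:
        return None
    tld = domain.rsplit(".", 1)[-1]
    return CCTLD_TO_COUNTRY.get(tld)
```

Calling `country_from_text(s)` inside the loop above would give a country for profiles with a country-code domain and None for the rest.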

So, how can I determine a user's country from their Google Scholar profile with reasonably high accuracy?

Determining the country from a Google Scholar profile

0 answers: