How to get company information from Google?

Date: 2017-01-24 04:06:18

Tags: google-api

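When we search Google for a company's website, the results include a box with details about the company. I need to get the company information, the year it was founded, and the number of employees. How can I get these? Is there an API available? Can you help me with this? Thanks.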


2 answers:

Answer 0 (score: 2)

You are looking for the Google Knowledge Graph API. The information in the box on the right is pulled from Google's Knowledge Graph for the top result.

You can get the information you need from the Organization entity:

  

"An organization such as a school, NGO, corporation, club, etc."

Example properties of Organization include legalName, logo, and foundingDate.

For example, this is a simple query I used for Facebook:

https://kgsearch.googleapis.com/v1/entities:search?query=Facebook&key=<YOUR_API_KEY_HERE>&indent=True

This is the result I got:

{
  "@type": "EntitySearchResult",
  "result": {
    "@id": "kg:/m/0hmyfsv",
    "name": "Facebook, Inc.",
    "@type": [
      "Corporation",
      "Organization",
      "Thing"
    ],
    "description": "Social network company",
    "image": {
      "contentUrl": "http://t3.gstatic.com/images?q=tbn:ANd9GcTjO7_7_DBuI3EpMBdVTACYT2WDkwKGrBic0JYSGtIt1c_0oMK9",
      "url": "https://commons.wikimedia.org/wiki/File:F_icon.svg"
    },
    "url": "https://www.facebook.com/"
  },
  "resultScore": 32.638672
}
BTW, for some reason Facebook is the second result in the list, after YouTube.
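
Since the entity you want may not be the first result, it can help to select it by name instead of by position. Below is a minimal sketch, assuming the full response wraps each result in an itemListElement list; the function name pick_entity and the trimmed sample data are made up for illustration.

```python
def pick_entity(response, name):
    """Return the first result whose name contains `name` (case-insensitive)."""
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        if name.lower() in result.get("name", "").lower():
            return result
    return None


# Hypothetical, trimmed response shaped like the output above.
sample = {
    "itemListElement": [
        {"result": {"name": "YouTube", "@type": ["Thing"]}},
        {
            "result": {
                "name": "Facebook, Inc.",
                "@type": ["Corporation", "Organization", "Thing"],
                "url": "https://www.facebook.com/",
            },
            "resultScore": 32.638672,
        },
    ]
}

entity = pick_entity(sample, "facebook")
print(entity["name"], entity["url"])
```

A substring match is a deliberately loose choice here; an exact comparison on name or @id would be stricter.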

Update

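At the moment, the API does not seem to offer a way to control which properties are returned in the results, and not all properties are included in the response by default. There is a separate question about how to accomplish this.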

From the API reference, the accepted request parameters are:

  • query (e.g. query=Facebook)
  • ids (e.g. ids=/m/0hmyfsv)
  • languages (e.g. languages=en)
  • types (e.g. types=Corporation)
  • indent (e.g. indent=true)
  • limit (e.g. limit=2)
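
The parameters above can be combined into a request URL along these lines. This is only a sketch: the helper name build_kg_url is made up, the endpoint is the one from the example query earlier, and YOUR_API_KEY is a placeholder.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"
ALLOWED = ("query", "ids", "languages", "types", "indent", "limit")


def build_kg_url(api_key, **params):
    """Build a search URL from the request parameters listed above."""
    unknown = set(params) - set(ALLOWED)
    if unknown:
        raise ValueError("unsupported parameters: %s" % sorted(unknown))
    pairs = []
    for name in ALLOWED:
        value = params.get(name)
        if value is None:
            continue
        if isinstance(value, bool):
            # The API examples use lowercase true/false.
            value = "true" if value else "false"
        pairs.append((name, value))
    pairs.append(("key", api_key))
    return KG_ENDPOINT + "?" + urlencode(pairs)


url = build_kg_url("YOUR_API_KEY", query="Facebook", types="Corporation", limit=2, indent=True)
print(url)
```

Restricting to types=Corporation may also help with the ordering issue mentioned above, though I have not verified that.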

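The response fields are: @id, name, @type, description, image, detailedDescription (if available), and resultScore.

The information you are looking for is actually available on the Wikipedia page whose URL is provided in the detailedDescription property, so you may want to consider using the Wikidata API instead.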
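
As a rough sketch of that route: given a Wikipedia article URL from the response, you could build a Wikidata wbgetentities request that looks the item up by site and title. The function name wikidata_lookup_url and the sample Wikipedia URL are assumptions for illustration; titles containing underscores or special characters may need extra normalization.

```python
from urllib.parse import urlencode, urlparse, unquote

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wikidata_lookup_url(wikipedia_url):
    """Turn a Wikipedia article URL into a Wikidata wbgetentities request URL."""
    parsed = urlparse(wikipedia_url)
    lang = parsed.netloc.split(".")[0]                # "en" from en.wikipedia.org
    title = unquote(parsed.path.rsplit("/", 1)[-1])   # last path segment is the title
    params = {
        "action": "wbgetentities",
        "sites": lang + "wiki",
        "titles": title,
        "props": "claims",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


# Hypothetical Wikipedia URL, as it might appear in a response.
lookup = wikidata_lookup_url("https://en.wikipedia.org/wiki/Facebook")
print(lookup)
```

The claims in the returned entity are keyed by property ID, for example P571 (inception) and P1128 (employees).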

Answer 1 (score: 0)

Run the following query in the Wikidata SPARQL endpoint:
SELECT DISTINCT 
?wdindustryLabel 
?wdcompanyName 
?wdcountryLabel 
(SAMPLE(?wdemployee) AS ?wdemployee) 
(SAMPLE(?wdfounded) AS ?wdfounded) 
(SAMPLE(?wdofficial_website) AS ?wdofficial_website) 
WHERE {
  ?wdcompany wdt:P31 ?type;
    rdfs:label ?wdcompanyName.
  OPTIONAL {
    ?wdcompany wdt:P452 ?wdindustry;
      wdt:P1128 ?wdemployee;
      wdt:P571 ?wdfounded;
      wdt:P17 ?wdcountry;
      wdt:P856 ?wdofficial_website.
  }
  FILTER(LANGMATCHES(LANG(?wdcompanyName), "EN"))
  VALUES ?type {
    wd:Q6881511
    wd:Q43229
  }
  VALUES ?wdcompanyName {
    "Apple Inc."@en
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?wdcompanyName ?wdindustryLabel ?wdcountryLabel
ORDER BY (?wdcompanyName)


Or use the following code:

# pip install sparqlwrapper
# https://rdflib.github.io/sparqlwrapper/

import sys
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint_url = "https://query.wikidata.org/sparql"

query = """SELECT DISTINCT ?wdindustryLabel ?wdcompanyName ?wdcountryLabel (SAMPLE(?wdemployee) AS ?wdemployee) (SAMPLE(?wdfounded) AS ?wdfounded) (SAMPLE(?wdofficial_website) AS ?wdofficial_website) WHERE {
  ?wdcompany wdt:P31 ?type;
    rdfs:label ?wdcompanyName.
  OPTIONAL {
    ?wdcompany wdt:P452 ?wdindustry;
      wdt:P1128 ?wdemployee;
      wdt:P571 ?wdfounded;
      wdt:P17 ?wdcountry;
      wdt:P856 ?wdofficial_website.
  }
  FILTER(LANGMATCHES(LANG(?wdcompanyName), "EN"))
  VALUES ?type {
    wd:Q6881511
    wd:Q43229
  }
  VALUES ?wdcompanyName {
    'Apple Inc.'@en
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?wdcompanyName ?wdindustryLabel ?wdcountryLabel
Order By ?wdcompanyName"""


def get_results(endpoint_url, query):
    user_agent = "WDQS-example Python/%s.%s" % (sys.version_info[0], sys.version_info[1])
    # TODO adjust user agent; see https://w.wiki/CX6
    sparql = SPARQLWrapper(endpoint_url, agent=user_agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()


results = get_results(endpoint_url, query)

for result in results["results"]["bindings"]:
    print(result)
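
Each binding printed above is a dict that maps a variable name to a small object with type and value keys (the standard SPARQL JSON results shape). If you only need the values, a helper like the following may be handy. The name flatten_bindings and the sample row are made up for illustration, not real query output.

```python
def flatten_bindings(results):
    """Convert SPARQL JSON bindings into plain {variable: value} dicts."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hypothetical result with the same shape as sparql.query().convert() returns.
sample_results = {
    "results": {
        "bindings": [
            {
                "wdcompanyName": {"type": "literal", "xml:lang": "en", "value": "Apple Inc."},
                "wdofficial_website": {"type": "uri", "value": "https://www.apple.com/"},
            }
        ]
    }
}

for row in flatten_bindings(sample_results):
    print(row)
```

Variables that are unbound in a row are simply absent from its dict, so use row.get("wdfounded") rather than indexing if a field may be missing.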