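I want to extract the details of all companies from Freebase. I tried using an MQL query, but it never gives me more than 4100 records. I also tried using a cursor, but with the cursor I get the same number of records.
I googled, and some people suggest downloading the dump instead of extracting the information through the API. Is that the only way? If so, how do I get the following information from the dump? Any help is much appreciated.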
[
{
"type": "/business/company",
"name": null,
"parent_company": [{}],
"products": [],
"industry": [],
"founded": null,
"net_income": [
{
"amount": null,
"valid_date": null,
"currency": null
}
],
"company_type": [],
"headquarters": [{}],
"number_of_employees": [{}],
"/base/schemastaging/organization_extra/phone_number": [{}]
}
]
Answer:
First, the obligatory warning: Freebase has been read-only for many months and will be shut down soon. The data there is stale.
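I get a count of 4189 for that query, so it sounds like you're close to the expected result. On the other hand, there are over 400K businesses in Freebase, so perhaps you didn't intend to restrict your query to only the businesses that have net income information. If that's the case, you can modify the query by adding "optional": true to that clause of the query, i.e.: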
"net_income": [{
"amount": null,
"valid_date": null,
"currency": null,
"optional": true
}],
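That said, 400K records is a lot to query through the API. To get the same information from the Freebase data dump, just filter for the same properties that your query includes.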
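Note that this schema has gone through some significant refactoring over the years, so some of the names in your query are not the currently preferred property names but older aliases. For example, the current name for /business/company is /business/business_operation, and /business/company/established is really just an alias for /organization/organization/date_founded, so that is what you'd want to look for in the dump.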
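If you are translating an existing MQL query into dump filters, it may help to rewrite the old names first. This is a minimal sketch, assuming you maintain the mapping yourself: `ALIASES` and `current_name` are hypothetical names, and the table holds only the two aliases mentioned above, not the full set from the schema.

```python
# Hypothetical alias table: only the two renamings mentioned above.
# A complete table would have to be built from the Freebase schema itself.
ALIASES = {
    "/business/company": "/business/business_operation",
    "/business/company/established": "/organization/organization/date_founded",
}


def current_name(mql_id: str) -> str:
    """Return the currently preferred schema ID for an MQL type or property.

    IDs that are not in the (incomplete) alias table are returned unchanged.
    """
    return ALIASES.get(mql_id, mql_id)


if __name__ == "__main__":
    for old in ["/business/company",
                "/business/company/established",
                "/business/business_operation/industry"]:
        print(old, "->", current_name(old))
```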
In the dump, all slashes (/) are replaced by dots (.), so you can filter with a zgrep command like this:
$ zgrep "organization\.organization\.parent" freebase-rdf-2015-04-19-00-00.gz
<http://rdf.freebase.com/ns/m.010b0njl> <http://rdf.freebase.com/ns/organization.organization.parent> <http://rdf.freebase.com/ns/m.010d_x4z> .
<http://rdf.freebase.com/ns/m.010qw9c3> <http://rdf.freebase.com/ns/organization.organization.parent> <http://rdf.freebase.com/ns/m.0110pjfc> .
$ zgrep "business\.business_operation\.industry" freebase-rdf-2015-04-19-00-00.gz
<http://rdf.freebase.com/ns/m.010b2kgs> <http://rdf.freebase.com/ns/business.business_operation.industry> <http://rdf.freebase.com/ns/m.0c5mq> .
<http://rdf.freebase.com/ns/m.010h6tq9> <http://rdf.freebase.com/ns/business.business_operation.industry> <http://rdf.freebase.com/ns/m.02y_9m3> .
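If you would rather get parsed fields than raw lines, or want to match the predicate column exactly instead of anywhere in the line, the same filter can be sketched in Python. This is a sketch under assumptions, not a tested tool: it assumes each line has the form `subject predicate object .` as in the output above (splitting on any whitespace also covers tab-separated lines), and `dump_predicate`, `parse_triple` and `filter_dump` are hypothetical helper names.

```python
import gzip
import sys
from typing import Iterable, Iterator, Tuple

# Namespace prefix seen in the dump lines above.
NS = "http://rdf.freebase.com/ns/"


def dump_predicate(mql_id: str) -> str:
    """Map an MQL property ID to the predicate URI used in the dump.

    /organization/organization/parent ->
    <http://rdf.freebase.com/ns/organization.organization.parent>
    """
    return "<" + NS + mql_id.strip("/").replace("/", ".") + ">"


def parse_triple(line: str) -> Tuple[str, str, str]:
    """Split one dump line into (subject, predicate, object).

    Splitting only on the first two whitespace runs keeps literal objects
    that contain spaces intact; the trailing '.' terminator is removed.
    """
    subject, predicate, rest = line.split(None, 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()
    return subject, predicate, obj


def filter_dump(path: str, mql_ids: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Stream a gzipped dump and yield only triples with a wanted predicate."""
    wanted = {dump_predicate(p) for p in mql_ids}
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            # Cheap substring check first; most lines are skipped here.
            if not any(w in line for w in wanted):
                continue
            triple = parse_triple(line)
            if triple[1] in wanted:  # exact match on the predicate field
                yield triple


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_dump.py freebase-rdf-2015-04-19-00-00.gz
    props = ["/organization/organization/parent",
             "/business/business_operation/industry"]
    for s, p, o in filter_dump(sys.argv[1], props):
        print(s, p, o, sep="\t")
```

Reading the file once with a set of wanted predicates means several properties can be collected in a single pass over the dump.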
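For mediators (CVTs), there is a separate line for each of the mediator's properties, all with the mediator's ID as the subject. So, for example, a name change might look like this: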
<http://rdf.freebase.com/ns/m.0q2g4kt> <http://rdf.freebase.com/ns/business.company_name_change.end_date> "2004"^^<http://www.w3.org/2001/XMLSchema#gYear> .
<http://rdf.freebase.com/ns/m.0q2g4kt> <http://rdf.freebase.com/ns/business.company_name_change.company> <http://rdf.freebase.com/ns/m.06_dbm> .
<http://rdf.freebase.com/ns/m.0q2g4kt> <http://rdf.freebase.com/ns/business.company_name_change.start_date> "1974"^^<http://www.w3.org/2001/XMLSchema#gYear> .
<http://rdf.freebase.com/ns/m.0q2g4kt> <http://rdf.freebase.com/ns/business.company_name_change.new_name> "Cinar"@en .
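To turn those per-property lines back into one record per mediator, you can group them by subject. This is a minimal sketch, assuming the lines have already been narrowed down (for example with zgrep on the business.company_name_change properties), since grouping the whole dump in memory would not be practical. `short` and `group_cvt_lines` are hypothetical helper names. The company each record belongs to is then the value of business.company_name_change.company.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

NS = "http://rdf.freebase.com/ns/"


def short(term: str) -> str:
    """Drop the angle brackets and Freebase namespace from a URI;
    literals such as "Cinar"@en are returned unchanged."""
    if term.startswith("<" + NS) and term.endswith(">"):
        return term[len(NS) + 1:-1]
    return term


def group_cvt_lines(lines: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Reassemble one-line-per-property triples into per-node records:
    {node_id: {property: [value, ...]}}."""
    nodes: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip():
            continue
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):  # drop the N-Triples terminator
            obj = obj[:-1].rstrip()
        nodes[short(subject)][short(predicate)].append(short(obj))
    # Convert to plain dicts so the result prints and compares cleanly.
    return {node: dict(props) for node, props in nodes.items()}


if __name__ == "__main__":
    sample = [
        '<http://rdf.freebase.com/ns/m.0q2g4kt> <http://rdf.freebase.com/ns/business.company_name_change.company> <http://rdf.freebase.com/ns/m.06_dbm> .',
        '<http://rdf.freebase.com/ns/m.0q2g4kt> <http://rdf.freebase.com/ns/business.company_name_change.new_name> "Cinar"@en .',
    ]
    for node, props in group_cvt_lines(sample).items():
        print(node, props)
```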