Getting patent data from BigQuery by application number

Time: 2018-11-27 02:38:42

Tags: google-bigquery

I want to collect data by application_number like this:

SELECT p.application_number AS app, COUNT(c.publication_number) AS Citations
     FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c
     WHERE p.application_number IN ('CN201510747352')
     GROUP BY p.application_number

But it doesn't work. The URL is the patent's page. Can anyone help me?

1 answer:

Answer 0 (score: 1)

Below is for BigQuery Standard SQL:

#standardSQL
SELECT 
  p.application_number AS app, 
  SUM((SELECT COUNT(publication_number) FROM UNNEST(citation))) AS Citations 
FROM `patents-public-data.patents.publications` AS p
WHERE p.application_number IN ('CN-201510747352-A') 
GROUP BY p.application_number   

with result

Row app                 Citations    
1   CN-201510747352-A   14      

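Note: your original query will also work if you use CN-201510747352-A instead of CN201510747352, since that is the format of the values in the application_number column (as the result above shows):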

#standardSQL
SELECT p.application_number AS app, COUNT(c.publication_number) AS Citations 
FROM `patents-public-data.patents.publications` AS p, 
UNNEST(citation) AS c 
WHERE p.application_number IN ('CN-201510747352-A') 
GROUP BY p.application_number    

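However, I recommend using the query I provided. The reason: if a given application has no citations at all, the comma join with UNNEST drops that application from the output entirely, while the suggested query returns it with count = 0.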

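For example, if you comment out the WHERE clause in both queries, the first one (the one I suggested) returns 76,073,734 applications, while the second one returns only 29,489,639.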

In this particular use case it may not matter much, but keep it in mind for your future queries.
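To see locally why the two queries return different numbers of rows, here is a minimal sketch using Python's built-in sqlite3 module. It is an analogy, not BigQuery: SQLite has no ARRAY type or UNNEST, so the repeated citation field is modelled as a separate hypothetical citation table, and an inner JOIN stands in for the comma join `FROM p, UNNEST(citation) AS c`. All table names and data (APP-1, APP-2, PUB-X, PUB-Y) are made up for illustration.

```python
import sqlite3


def citation_counts():
    """Compare a join-style count with a correlated-subquery count.

    Hypothetical toy data: SQLite has no ARRAY/UNNEST, so the repeated
    `citation` field is modelled as a separate child table.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE publications (application_number TEXT);
        CREATE TABLE citation (application_number TEXT, publication_number TEXT);
        INSERT INTO publications VALUES ('APP-1'), ('APP-2');
        -- APP-1 cites two publications, APP-2 cites nothing
        INSERT INTO citation VALUES ('APP-1', 'PUB-X'), ('APP-1', 'PUB-Y');
    """)
    # Join style (analogous to `FROM p, UNNEST(citation) AS c`): APP-2 has
    # no matching citation rows, so it disappears from the output.
    joined = con.execute("""
        SELECT p.application_number, COUNT(c.publication_number)
        FROM publications AS p
        JOIN citation AS c ON c.application_number = p.application_number
        GROUP BY p.application_number
        ORDER BY p.application_number
    """).fetchall()
    # Correlated-subquery style (analogous to the suggested query): every
    # publication row is kept, and APP-2 gets a count of 0.
    subquery = con.execute("""
        SELECT p.application_number,
               SUM((SELECT COUNT(c.publication_number)
                    FROM citation AS c
                    WHERE c.application_number = p.application_number))
        FROM publications AS p
        GROUP BY p.application_number
        ORDER BY p.application_number
    """).fetchall()
    con.close()
    return joined, subquery


joined, subquery = citation_counts()
print(joined)    # [('APP-1', 2)]
print(subquery)  # [('APP-1', 2), ('APP-2', 0)]
```

The application without citations only survives in the subquery version, which mirrors the 76,073,734 vs 29,489,639 difference described above.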

  

Another question: the query returns 14, which differs from the 7 shown on the original website. Is something wrong?

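7 is the correct answer. COUNT counts every unnested citation row, including repeated publication numbers, while COUNT(DISTINCT ...) counts each cited publication only once; see below: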

#standardSQL
SELECT 
  p.application_number AS app, 
  COUNT(DISTINCT c.publication_number) Citations 
FROM `patents-public-data.patents.publications` AS p,
UNNEST(citation) c
WHERE p.application_number IN ('CN-201510747352-A') 
GROUP BY p.application_number     

with result

Row app                 Citations    
1   CN-201510747352-A   7
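A minimal local sketch of the COUNT vs COUNT(DISTINCT) difference, again using Python's sqlite3 module rather than BigQuery. The hypothetical `unnested` table stands in for the rows that `FROM p, UNNEST(citation) AS c` produces, and each made-up publication number is inserted twice. The answer does not say why the real data contains repeats, so this toy data only mimics the effect (14 rows, 7 distinct values).

```python
import sqlite3


def count_vs_distinct():
    """COUNT counts every unnested row; COUNT(DISTINCT) counts unique values.

    Hypothetical toy data standing in for the rows produced by
    `FROM p, UNNEST(citation) AS c`: each cited publication appears twice.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE unnested (app TEXT, publication_number TEXT)")
    con.executemany(
        "INSERT INTO unnested VALUES (?, ?)",
        [("APP-1", pub) for pub in ["PUB-X", "PUB-Y", "PUB-Z"] * 2],
    )
    row = con.execute("""
        SELECT app,
               COUNT(publication_number),          -- every row: 6
               COUNT(DISTINCT publication_number)  -- unique values: 3
        FROM unnested
        GROUP BY app
    """).fetchone()
    con.close()
    return row


print(count_vs_distinct())  # ('APP-1', 6, 3)
```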