Question

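I'm trying to query the output of a Natural Language Processing (NLP) call in BigQuery (BQ), but I'm struggling to get the output into the right format for BQ.

I understand that BQ takes JSON files (newline-delimited), but I'm not sure whether (a) the NLP output is newline-delimited JSON and (b) my schema is correct.

This is the JSON output I'm working with: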

{
  "entities": [
    {
      "name": "Rowling",
      "type": "PERSON",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
      },
      "salience": 0.65751493,
      "mentions": [
        {
          "text": {
            "content": "   J.",
            "beginOffset": -1
          }
        },
        {
          "text": {
            "content": "K. Rowl",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "LONDON",
      "type": "LOCATION",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/London"
      },
      "salience": 0.14284456,
      "mentions": [
        {
          "text": {
            "content": "\ufeffLON",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "Harry Potter",
      "type": "WORK_OF_ART",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
      },
      "salience": 0.0726779,
      "mentions": [
        {
          "text": {
            "content": "th Harry Pot",
            "beginOffset": -1
          }
        },
        {
          "text": {
            "content": "‘Harry Pot",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "Deathly Hallows",
      "type": "WORK_OF_ART",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter_and_the_Deathly_Hallows"
      },
      "salience": 0.022565609,
      "mentions": [
        {
          "text": {
            "content": "he Deathly Hall",
            "beginOffset": -1
          }
        }
      ]
    }
  ],
  "language": "en"
}

Is there a way to send the output straight to BigQuery from the command line in Google Cloud Shell?

Any information is greatly appreciated!

Thanks

Answer 1

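Glad you found my Harry Potter blog post! I'd recommend storing the NL API's JSON response as a string in BigQuery and then querying it with a user-defined function. You should be able to run the following (the table is publicly viewable) to count how often each entity appears in the JSON you posted: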

SELECT 
  COUNT(*) as entity_count, entity
FROM 
  JS(
    (SELECT entities FROM [sara-bigquery:samples.hp_udf]),
    entities,
    "[{ name: 'entity', type: 'string'}]",
    "function(row, emit) { 
      try {
        var x = JSON.parse(row.entities);
        var entities = x['entities'];
        entities.forEach(function(data) {
          emit({ entity: data.name });
        });
      } catch (e) {}
    }" 
  )
GROUP BY entity
ORDER BY entity_count DESC
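
To check what the UDF should emit before running the query, the same logic can be sketched locally. This is a minimal Python sketch, not part of the original answer: it assumes each row holds one NL API response as a JSON string, and the sample rows (including the repeated "Rowling" row and the unparseable row) are made up for illustration.

```python
import json
from collections import Counter


def count_entities(rows):
    """Count entity names across rows of JSON-encoded NL API responses."""
    counts = Counter()
    for raw in rows:
        try:
            response = json.loads(raw)
        except (TypeError, ValueError):
            # Skip rows that do not parse, similar to the UDF's empty catch block.
            continue
        for entity in response.get("entities", []):
            counts[entity.get("name")] += 1
    return counts


# Hypothetical rows: trimmed-down versions of the response in the question.
sample_rows = [
    json.dumps({"entities": [{"name": "Rowling", "type": "PERSON"},
                             {"name": "LONDON", "type": "LOCATION"}],
                "language": "en"}),
    json.dumps({"entities": [{"name": "Rowling", "type": "PERSON"}],
                "language": "en"}),
    "not valid json",
]

# Mirrors GROUP BY entity ORDER BY entity_count DESC.
for entity, entity_count in count_entities(sample_rows).most_common():
    print(entity_count, entity)
```

With a single response like the one in the question, every entity is counted once; the counts only become interesting once the table holds many responses.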

Answer 2

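Send the output straight to BigQuery from the command line in Google Cloud Shell

Take a look at this page and search for "bq load": https://cloud.google.com/bigquery/bq-command-line-tool

There are some examples of JSON schemas here: Schema to load json data to google big query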
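
On the newline-delimited part of the question: the response as posted is pretty-printed across many lines, while BigQuery's JSON loader expects one complete JSON object per line. A minimal Python sketch of that conversion follows. The sample text is a shortened stand-in for the real response, and the dataset, table, and file names in the closing comment are hypothetical.

```python
import json


def to_ndjson_line(pretty_json_text):
    """Re-serialize one JSON document so it fits on a single line."""
    return json.dumps(json.loads(pretty_json_text), ensure_ascii=False)


# Shortened, pretty-printed stand-in for the NL API response in the question.
pretty_response = """
{
  "entities": [
    {
      "name": "Rowling",
      "type": "PERSON",
      "salience": 0.65751493
    }
  ],
  "language": "en"
}
"""

# One response becomes one line; write one such line per response to a file.
print(to_ndjson_line(pretty_response))

# Assuming the lines were saved to entities.ndjson, a load could then look like
# this (mydataset.nl_entities is a hypothetical destination table):
#   bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect \
#       mydataset.nl_entities entities.ndjson
```

Here `--autodetect` asks BigQuery to infer the schema from the data; passing an explicit schema file instead gives more control over field types.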
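
As a starting point for the schema itself, here is a sketch derived only from the fields visible in the JSON posted in the question. It is written as a Python structure that prints a BigQuery JSON schema; real NL API responses may carry fields not shown there, so treat it as an assumption to verify against your own output. The file and table names in the closing comment are hypothetical.

```python
import json

# Nested and repeated parts of the response map to RECORD fields;
# arrays use mode REPEATED.
SCHEMA = [
    {"name": "entities", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "name", "type": "STRING"},
        {"name": "type", "type": "STRING"},
        {"name": "metadata", "type": "RECORD", "fields": [
            {"name": "wikipedia_url", "type": "STRING"},
        ]},
        {"name": "salience", "type": "FLOAT"},
        {"name": "mentions", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "text", "type": "RECORD", "fields": [
                {"name": "content", "type": "STRING"},
                {"name": "beginOffset", "type": "INTEGER"},
            ]},
        ]},
    ]},
    {"name": "language", "type": "STRING"},
]


def field_paths(fields, prefix=""):
    """List dotted paths of all leaf fields, handy for comparing with the data."""
    paths = []
    for field in fields:
        path = prefix + field["name"]
        if field["type"] == "RECORD":
            paths.extend(field_paths(field["fields"], path + "."))
        else:
            paths.append(path)
    return paths


print(json.dumps(SCHEMA, indent=2))  # save this output as schema.json
print(field_paths(SCHEMA))

# With the data in newline-delimited form, the schema file goes last:
#   bq load --source_format=NEWLINE_DELIMITED_JSON \
#       mydataset.nl_entities entities.ndjson schema.json
```

If the actual responses contain keys the schema does not list, the load will reject them unless the schema is extended or `--ignore_unknown_values` is passed to `bq load`.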

How to prepare Google Natural Language Processing output (JSON) for BigQuery
