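I am trying to query the output of a Natural Language Processing (NLP) API call in BigQuery (BQ), but I am struggling to get that output into a format BQ accepts.
I understand that BQ takes JSON files (newline-delimited), but I am not sure (a) whether the NLP output is newline-delimited JSON and (b) whether my schema is correct.
This is the JSON output I am working with: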
{
  "entities": [
    {
      "name": "Rowling",
      "type": "PERSON",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
      },
      "salience": 0.65751493,
      "mentions": [
        {
          "text": {
            "content": " J.",
            "beginOffset": -1
          }
        },
        {
          "text": {
            "content": "K. Rowl",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "LONDON",
      "type": "LOCATION",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/London"
      },
      "salience": 0.14284456,
      "mentions": [
        {
          "text": {
            "content": "\ufeffLON",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "Harry Potter",
      "type": "WORK_OF_ART",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
      },
      "salience": 0.0726779,
      "mentions": [
        {
          "text": {
            "content": "th Harry Pot",
            "beginOffset": -1
          }
        },
        {
          "text": {
            "content": "‘Harry Pot",
            "beginOffset": -1
          }
        }
      ]
    },
    {
      "name": "Deathly Hallows",
      "type": "WORK_OF_ART",
      "metadata": {
        "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter_and_the_Deathly_Hallows"
      },
      "salience": 0.022565609,
      "mentions": [
        {
          "text": {
            "content": "he Deathly Hall",
            "beginOffset": -1
          }
        }
      ]
    }
  ],
  "language": "en"
}
Is there a way to send the output directly to BigQuery from the command line in Google Cloud Shell?
Any information is much appreciated!
Thanks
Answer 0 (score: 2)
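Glad you found my Harry Potter blog post! I'd suggest storing the NL API's JSON response as a string in BigQuery and then querying it with a user-defined function. You should be able to run the following (the table is publicly viewable) to count how often each entity appears in the JSON you posted: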
SELECT
  COUNT(*) as entity_count, entity
FROM
  JS(
    (SELECT entities FROM [sara-bigquery:samples.hp_udf]),
    entities,
    "[{ name: 'entity', type: 'string'}]",
    "function(row, emit) {
      try {
        x = JSON.parse(row.entities);
        entities = x['entities'];
        entities.forEach(function(data) {
          emit({ entity: data.name });
        });
      } catch (e) {}
    }"
  )
GROUP BY entity
ORDER BY entity_count DESC
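The query above uses BigQuery legacy SQL syntax (the bracketed table reference and the JS() table function). To check locally what the UDF should emit before running it, the same parse-and-count logic can be sketched in Python. This is only an illustration: the function name count_entities and the shortened sample string are made up here, not part of the NL API or BigQuery.

```python
import json
from collections import Counter


def count_entities(response_json: str) -> Counter:
    """Count how often each entity name appears in one NL API response string."""
    try:
        parsed = json.loads(response_json)
    except json.JSONDecodeError:
        # Like the UDF's empty catch block: a row that does not parse emits nothing.
        return Counter()
    return Counter(entity["name"] for entity in parsed.get("entities", []))


# Hypothetical, shortened response used only for illustration.
sample = (
    '{"entities": [{"name": "Rowling"}, {"name": "LONDON"}, {"name": "Rowling"}],'
    ' "language": "en"}'
)
counts = count_entities(sample)
print(counts.most_common())  # [('Rowling', 2), ('LONDON', 1)]
```

Storing the raw response as a string, as suggested, means the table schema stays a single STRING column and the parsing happens at query time.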
Answer 1 (score: 1)
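"send the output directly to BigQuery from the command line in Google Cloud Shell"
Take a look at this page and search for "bq load": https://cloud.google.com/bigquery/bq-command-line-tool
There are some examples of JSON schemas here: Schema to load json data to google big query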
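As a rough sketch of the preparation step before bq load: the pretty-printed response in the question spans many lines, so it is not newline-delimited JSON as posted. Each record has to sit on a single line. The Python below collapses a response to one line and writes out a nested schema. The field types are inferred from the sample output only, and the names to_ndjson_line, SCHEMA and the shortened sample are assumptions made for illustration.

```python
import json


def to_ndjson_line(pretty_json: str) -> str:
    """Collapse one pretty-printed JSON document into a single line."""
    return json.dumps(json.loads(pretty_json), ensure_ascii=False)


# Schema in BigQuery's JSON schema-file format. Types are guessed from the
# sample response in the question, so adjust them if the real data differs.
SCHEMA = [
    {"name": "entities", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "name", "type": "STRING"},
        {"name": "type", "type": "STRING"},
        {"name": "metadata", "type": "RECORD", "fields": [
            {"name": "wikipedia_url", "type": "STRING"},
        ]},
        {"name": "salience", "type": "FLOAT"},
        {"name": "mentions", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "text", "type": "RECORD", "fields": [
                {"name": "content", "type": "STRING"},
                {"name": "beginOffset", "type": "INTEGER"},
            ]},
        ]},
    ]},
    {"name": "language", "type": "STRING"},
]

# Shortened stand-in for the multi-line response shown in the question.
sample_pretty = """{
  "entities": [
    {"name": "LONDON", "type": "LOCATION", "salience": 0.14284456}
  ],
  "language": "en"
}"""

line = to_ndjson_line(sample_pretty)
print(line)                          # one record per line, ready to append to a file
print(json.dumps(SCHEMA, indent=2))  # save this as the schema file

# Assumed usage afterwards (dataset, table and file names are placeholders):
#   bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable data.json schema.json
```

If several responses are loaded at once, each one becomes its own line in the data file.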