I have a column in one of my BigQuery tables that looks like this.
{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}
Is there a way to get output like this in GBQ? (Basically, flatten the whole column into separate columns.)
name last_delivered.push_id last_delivered.time session_id source properties.UserId
name1 push_id1 time1 session_id1 SDK u1
Say
a = {"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}
I tried json_normalize(a) in pandas (Python) to get the desired output, but I get an error every time I try.
Does anyone know how to get the desired output? Am I missing something?
Any help would be greatly appreciated!
Answer 0 (score: 2)
The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}' col
)
SELECT
  JSON_EXTRACT_SCALAR(col, '$.name') name,
  STRUCT(
    JSON_EXTRACT_SCALAR(col, '$.last_delivered.push_id') AS push_id,
    JSON_EXTRACT_SCALAR(col, '$.last_delivered.time') AS time
  ) last_delivered,
  JSON_EXTRACT_SCALAR(col, '$.session_id') session_id,
  JSON_EXTRACT_SCALAR(col, '$.source') source,
  STRUCT(
    JSON_EXTRACT_SCALAR(col, '$.properties.UserId') AS UserId
  ) properties
FROM `project.dataset.table`
and it produces the expected result:
Row name last_delivered.push_id last_delivered.time session_id source properties.UserId
1 name1 push_id1 time1 session_id1 SDK u1
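To make the path lookups in the query easier to follow, here is a rough Python sketch of what a scalar lookup such as '$.last_delivered.push_id' does. The helper name json_extract_scalar is hypothetical and only handles simple dotted paths; it is not BigQuery's implementation (for one thing, BigQuery returns scalars as STRING, while this returns the parsed Python value).

```python
import json


def json_extract_scalar(doc, path):
    """Hypothetical helper: look up a simple dotted path such as '$.a.b'.

    Returns None when the path is missing or points at an object/array,
    loosely mirroring how a scalar extraction yields NULL in those cases.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = json.loads(doc)
    for key in path[1:].split("."):
        if not key:
            continue  # skip the empty piece left by the leading '$.'
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if isinstance(node, (dict, list)):
        return None  # objects and arrays are not scalars
    return node


col = ('{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, '
       '"session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}')

# One lookup per output column, using the same paths as the query above.
paths = ["name", "last_delivered.push_id", "last_delivered.time",
         "session_id", "source", "properties.UserId"]
row = {p: json_extract_scalar(col, "$." + p) for p in paths}
print(row)
```

Each output column is just one path lookup against the same JSON string, which is why the query needs one JSON_EXTRACT_SCALAR call per field.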
Answer 1 (score: 2)
My guess as to why it doesn't work is that your JSON data is actually a string:
from pandas.io.json import json_normalize
a = '''{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}'''
df = json_normalize(a)
Output:
AttributeError: 'str' object has no attribute 'values'
Versus, with a dict:
from pandas.io.json import json_normalize
a = {"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}
df = json_normalize(a)
Output:
print(df.to_string())
last_delivered.push_id last_delivered.time name properties.UserId session_id source
0 push_id1 time1 name1 u1 session_id1 SDK
In that case, you can call json.loads() before normalizing:
import json
from pandas.io.json import json_normalize
a = '''{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}'''
data = json.loads(a)
df = json_normalize(data)
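If it helps to see what the flattening amounts to, here is a minimal standard-library sketch that accepts either a JSON string or a dict and produces the same dotted column names. The function names to_record and flatten are made up for illustration, and this is not how pandas implements json_normalize. Separately, note that newer pandas versions expose the function as pandas.json_normalize.

```python
import json


def to_record(raw):
    """Hypothetical helper: parse a JSON string, or pass a dict through."""
    record = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(record, dict):
        raise TypeError("expected a JSON object, got %s" % type(record).__name__)
    return record


def flatten(record, sep=".", prefix=""):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))  # recurse into nested objects
        else:
            flat[name] = value
    return flat


a = '''{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, "session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}'''

flat = flatten(to_record(a))
for column, value in flat.items():
    print(column, value)
```

The isinstance check in to_record is the important part: it handles the string-versus-dict difference that causes the error above, so the same call works for both inputs.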