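I use a standard "sink" to store the access logs of a pixel image in the Cloud Storage bucket dev-access-log-bucket,
so the files look like this: requests/2019/05/08/15:00:00_15:59:59_S1.json
A line looks like this (I formatted the JSON here, but it is normally on a single line):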
{
  "httpRequest": {
    "cacheLookup": true,
    "remoteIp": "93.24.25.190",
    "requestMethod": "GET",
    "requestSize": "224",
    "requestUrl": "https://dev-snowplow.legalstart.fr/one_pixel_image.png?user_id=0&action=purchase&product_id=0&money=10",
    "responseSize": "779",
    "status": 200,
    "userAgent": "python-requests/2.21.0"
  },
  "insertId": "w6wyz1g2jckjn6",
  "jsonPayload": {
    "@type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
    "statusDetails": "response_sent_by_backend"
  },
  "logName": "projects/tracking-pixel-239909/logs/requests",
  "receiveTimestamp": "2019-05-08T15:34:24.126095758Z",
  "resource": {
    "labels": {
      "backend_service_name": "",
      "forwarding_rule_name": "dev-yolaw-pixel-forwarding-rule",
      "project_id": "tracking-pixel-239909",
      "target_proxy_name": "dev-yolaw-pixel-proxy",
      "url_map_name": "dev-urlmap",
      "zone": "global"
    },
    "type": "http_load_balancer"
  },
  "severity": "INFO",
  "spanId": "7d8823509c2dc94f",
  "timestamp": "2019-05-08T15:34:23.140747307Z",
  "trace": "projects/tracking-pixel-239909/traces/bb55577eedd5797db2867931f8de9162"
}
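All of this is standard GCP; I haven't customized anything here.
So now I want to run some queries on it from BigQuery. I created a dataset and an external table configured like this: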
External Data Configuration
Source URI(s) gs://dev-access-log-bucket/requests/*
Auto-detect schema true (note: I don't know why it shows true even though I defined the schema manually)
Ignore unknown values true
Source format NEWLINE_DELIMITED_JSON
Max bad records 0
and the following manual schema:
timestamp DATETIME REQUIRED
httpRequest RECORD REQUIRED
httpRequest.requestUrl STRING REQUIRED
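For reference, the same manual schema can be written in BigQuery's JSON schema format (a list of field objects with name, type, and mode, plus nested fields for a RECORD). The snippet below is only a sketch that builds that JSON in Python; the variable names are mine and are not part of my actual setup.

```python
import json

# Sketch: the three-line manual schema above, expressed as a BigQuery
# JSON schema. `schema` and `schema_json` are illustrative names only.
schema = [
    {"name": "timestamp", "type": "DATETIME", "mode": "REQUIRED"},
    {
        "name": "httpRequest",
        "type": "RECORD",
        "mode": "REQUIRED",
        # Nested fields of a RECORD go under "fields".
        "fields": [
            {"name": "requestUrl", "type": "STRING", "mode": "REQUIRED"},
        ],
    },
]

schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

Note that jsonPayload and its "@type" key are deliberately absent from this schema.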
And when I run the query
SELECT
timestamp
FROM
`path.to.my.table`
LIMIT
1000
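I get
Invalid field name "@type". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.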
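To make the rule in that message concrete, here is a minimal Python sketch that walks a parsed log entry and lists the keys that would break it. It assumes the rule is exactly as the error message states; `find_invalid_field_names` is a hypothetical helper written for illustration, and the sample line is an abbreviated copy of the log entry shown earlier.

```python
import json
import re

# Rule as stated in the error message: letters, digits and underscores only,
# starting with a letter or underscore, at most 128 characters.
VALID_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def find_invalid_field_names(value, path=""):
    """Return dotted paths of keys in a parsed JSON value that break the rule."""
    invalid = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if not VALID_FIELD_NAME.fullmatch(key):
                invalid.append(child_path)
            invalid.extend(find_invalid_field_names(child, child_path))
    elif isinstance(value, list):
        for item in value:
            invalid.extend(find_invalid_field_names(item, path))
    return invalid


# Abbreviated version of the log line from the question.
sample_line = (
    '{"httpRequest": {"status": 200}, '
    '"jsonPayload": {"@type": "type.googleapis.com/google.cloud.'
    'loadbalancing.type.LoadBalancerLogEntry", '
    '"statusDetails": "response_sent_by_backend"}, '
    '"timestamp": "2019-05-08T15:34:23.140747307Z"}'
)

print(find_invalid_field_names(json.loads(sample_line)))
# prints ['jsonPayload.@type']
```

So the only offending key in my logs is jsonPayload.@type, which is not even one of the fields I declared in the manual schema.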
How can I work around this without preprocessing the logs to strip the "@type" field?