Importing log files from Google Cloud Storage into BigQuery

Time: 2014-09-01 16:02:37

Tags: google-app-engine google-bigquery google-cloud-storage logstash

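We are working on a project that uploads log files (example below) from Logstash to Google Cloud Storage, and then has App Engine import the log data into BigQuery. The questions are:

  1. BigQuery does not accept some of the field names in the log file, such as the @timestamp field that Logstash creates. How can I handle this? Can App Engine do anything to fix it?

  2. How do I define a BigQuery schema for nested JSON (geoip)?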

    {"uuid":"8806ceef34123122cdd009063f301a34158252f53b9a7d3147639fb71f68b585","item_id":1234,"member_id":1234,"admin_id":0,"cate_id":131,"listing_status":3,"monitor_status":2,"note":"","txn_type":"edit","ip_address":"13.89.42.18","email":"xxxx@gmail.com","post_name":"","user_agent":"Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0","timestamp":"2014-08-22 06:38:53","http_host":"EA1-ZoneS1","@version":"1","@timestamp":"2014-08-21T23:38:59.737Z","type":"redis","ua.name":"Firefox","ua.os":"Windows 7","ua.os_name":"Windows 7","ua.device":"Other","ua.major":"31","ua.minor":"0","geoip":{"ip":"13.89.42.18","country_code2":"XX","country_code3":"XXX","country_name":"XXXXXXX","continent_code":"AS","region_name":"40","city_name":"XXXX","latitude":123.45,"longitude":123.45,"timezone":"Asia/Bangkok","real_region_name":"XXXXXX","location":[123.45,123.45]}}

Sorry, I'm new here, so I can't add images.

Please advise.

Thanks.

1 answer:

Answer 0 (score: 5)

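1) You can't use the @ or . characters in field names. You need to remove them by running each line of the data through something like this: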

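# Replace characters that BigQuery rejects in field names.
# Note: this is a plain string replace, so "@" inside values is rewritten too.
line = line.replace("@", "_")
line = line.replace("ua.", "ua_")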
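
To rename only the keys and leave values such as the email address intact, each line could be parsed as JSON first. The following is a minimal sketch, assuming one JSON object per line and that field names may contain only letters, digits, and underscores; `sanitize_key`, `sanitize_record`, and `sanitize_line` are hypothetical helper names, not part of any library.

```python
import json
import re


def sanitize_key(key):
    """Replace every character other than a letter, digit, or underscore with "_"."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", key)
    # Assumption: a field name should not start with a digit.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def sanitize_record(value):
    """Recursively rename keys in nested objects and lists; values are left untouched."""
    if isinstance(value, dict):
        return {sanitize_key(k): sanitize_record(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_record(v) for v in value]
    return value


def sanitize_line(line):
    """Clean one newline-delimited JSON log line and return it as a JSON string."""
    return json.dumps(sanitize_record(json.loads(line)))
```

With this approach `"@timestamp"` becomes `"_timestamp"` and `"ua.name"` becomes `"ua_name"`, which matches the field names used in the schema, while `"xxxx@gmail.com"` stays unchanged.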

2) You may need to change some of the types, but I was able to load your sample data (with the modifications above) using this JSON schema:

[{
    "name": "uuid",
    "type": "STRING"
}, {
    "name": "item_id",
    "type": "INTEGER"
}, {
    "name": "member_id",
    "type": "INTEGER"
}, {
    "name": "admin_id",
    "type": "INTEGER"
}, {
    "name": "cate_id",
    "type": "INTEGER"
}, {
    "name": "listing_status",
    "type": "INTEGER"
}, {
    "name": "monitor_status",
    "type": "INTEGER"
}, {
    "name": "note",
    "type": "STRING"
}, {
    "name": "txn_type",
    "type": "STRING"
}, {
    "name": "ip_address",
    "type": "STRING"
}, {
    "name": "email",
    "type": "STRING"
}, {
    "name": "post_name",
    "type": "STRING"
}, {
    "name": "user_agent",
    "type": "STRING"
}, {
    "name": "timestamp",
    "type": "TIMESTAMP"
}, {
    "name": "http_host",
    "type": "STRING"
}, {
    "name": "_version",
    "type": "STRING"
}, {
    "name": "_timestamp",
    "type": "TIMESTAMP"
}, {
    "name": "type",
    "type": "STRING"
}, {
    "name": "ua_name",
    "type": "STRING"
}, {
    "name": "ua_os",
    "type": "STRING"
}, {
    "name": "ua_os_name",
    "type": "STRING"
}, {
    "name": "ua_device",
    "type": "STRING"
}, {
    "name": "ua_major",
    "type": "STRING"
}, {
    "name": "ua_minor",
    "type": "STRING"
}, {
    "name": "geoip",
    "type": "RECORD",
    "fields": [{
        "name": "ip",
        "type": "STRING"
    }, {
        "name": "country_code2",
        "type": "STRING"
    }, {
        "name": "country_code3",
        "type": "STRING"
    }, {
        "name": "country_name",
        "type": "STRING"
    }, {
        "name": "continent_code",
        "type": "STRING"
    }, {
        "name": "region_name",
        "type": "STRING"
    }, {
        "name": "city_name",
        "type": "STRING"
    }, {
        "name": "latitude",
        "type": "FLOAT"
    }, {
        "name": "longitude",
        "type": "FLOAT"
    }, {
        "name": "timezone",
        "type": "STRING"
    }, {
        "name": "real_region_name",
        "type": "STRING"
    }, {
        "name": "location",
        "type": "FLOAT",
        "mode": "REPEATED"
    }]
}]
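
Before starting a load job, it can help to check a cleaned record against a schema like the one above. The following is a rough sketch, not BigQuery's actual validation: `find_mismatches` and `TYPE_CHECKS` are hypothetical names, the type rules are simplified assumptions, and missing fields are not reported. It mainly shows how a RECORD field nests its own `fields` list and how a REPEATED field maps to a JSON array.

```python
# Python types accepted for each schema type in this sketch (an assumption,
# not BigQuery's full coercion rules). bool is excluded because it is an int subclass.
TYPE_CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "INTEGER": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "FLOAT": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "TIMESTAMP": lambda v: isinstance(v, (str, int, float)) and not isinstance(v, bool),
}


def find_mismatches(record, schema, prefix=""):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    fields = {f["name"]: f for f in schema}
    for key, value in record.items():
        path = prefix + key
        field = fields.get(key)
        if field is None:
            problems.append(path + ": not in schema")
            continue
        if value is None:
            continue  # null values are not checked in this sketch
        repeated = field.get("mode") == "REPEATED"
        if repeated and not isinstance(value, list):
            problems.append(path + ": expected a list")
            continue
        for item in (value if repeated else [value]):
            if field["type"] == "RECORD":
                if isinstance(item, dict):
                    # Nested records are checked against their own "fields" list.
                    problems.extend(find_mismatches(item, field["fields"], path + "."))
                else:
                    problems.append(path + ": expected an object")
            else:
                check = TYPE_CHECKS.get(field["type"])
                if check is not None and not check(item):
                    problems.append(path + ": expected " + field["type"])
    return problems


# A shortened version of the schema, for illustration only.
schema = [
    {"name": "item_id", "type": "INTEGER"},
    {"name": "geoip", "type": "RECORD", "fields": [
        {"name": "latitude", "type": "FLOAT"},
        {"name": "location", "type": "FLOAT", "mode": "REPEATED"},
    ]},
]
record = {"item_id": 1234, "geoip": {"latitude": 123.45, "location": [123.45, 123.45]}}
print(find_mismatches(record, schema))  # []
```

A record that still contains a key such as `@timestamp` would be reported as "not in schema", which is a quick way to catch lines that skipped the renaming step.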