bigrquery coerces strings to integers (the schema is STRING)

Asked: 2018-11-23 02:52:19

Tags: r google-bigquery bigrquery

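I'm working with zip codes, which of course have leading zeros. I load my data frame correctly so that the leading zeros are preserved in R, but the upload step seems to fail. Here's what I mean: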

Here is my minimal.csv file:

zip,val
07030,10
10001,100
90210,1000
60602,10000

Here is the R code:

require("bigrquery")
filename <- "minimal.csv"
tablename <- "as_STRING"
ds <- bq_dataset(project='myproject', dataset="zips")

I've also set the types in the schema correctly, so that it expects a string.

# first pass
df <- read.csv(filename, stringsAsFactors=FALSE)
# > df
#     zip   val
# 1  7030    10
# 2 10001   100
# 3 90210  1000
# 4 60602 10000

# uh oh!  Let's fix it!

cols <- unlist(lapply(df, class))
cols[[1]] <- "character" # make zipcode a character

# then reload
df2 <- read.csv(filename, stringsAsFactors=FALSE, colClasses=cols)
# > df2
#     zip   val
# 1 07030    10
# 2 10001   100
# 3 90210  1000
# 4 60602 10000

# much better!  You can see my zips are now strings.
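
The two-pass reload above can probably be collapsed into a single read. This is a minimal sketch, assuming the column is named zip as in minimal.csv. The temporary file is only there to make the example self-contained and is not part of the original workflow:

```r
# Recreate minimal.csv in a temporary file so the sketch runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("zip,val", "07030,10", "10001,100", "90210,1000", "60602,10000"),
           csv_path)

# A named colClasses vector is matched against the header names;
# columns that are not named here are still type-guessed by read.csv.
df2 <- read.csv(csv_path, stringsAsFactors = FALSE,
                colClasses = c(zip = "character"))

# Inspect the resulting column types.
str(df2)
```

Naming the column avoids reading the file once just to discover its classes, and it keeps working if the column order changes.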

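But when I try to upload the strings, the bigrquery interface complains that I'm uploading integers, which I'm not. Here is the schema, which expects strings: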

# create schema
bq_table_create(bq_table(ds, tablename), fields=df2) # using df2, which has strings

# now prove it got the strings right:
    > bq_table_meta(bq_table(ds, tablename))$schema$fields
    [[1]]
    [[1]]$name
    [1] "zip"

    [[1]]$type
    [1] "STRING"                # GOOD, ZIP IS A STRING!

    [[1]]$mode
    [1] "NULLABLE"


    [[2]]
    [[2]]$name
    [1] "val"

    [[2]]$type
    [1] "INTEGER"

    [[2]]$mode
    [1] "NULLABLE"

Now it's time to upload...

bq_table_upload(bq_table(ds, tablename), df2) # using df2, with STRINGS
Error: Invalid schema update. Field zip has changed type from STRING to INTEGER [invalid]

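Huh? What is an invalid schema update, and how do I stop it from trying to change my strings (which the data contains and the schema specifies) into integers (which my data does not contain and the schema does not specify)?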

Is there some JavaScript serialization happening that turns my strings back into integers?

2 Answers:

Answer 0 (score: 2)

This happens because BigQuery auto-detects the schema when none is specified. You can fix it by passing the fields argument (see this similar question for an example):

bq_table_upload(bq_table(ds, tablename), df2,
                fields = list(bq_field("zip", "string"),
                              bq_field("val", "integer")))

Update:

In the code, bq_table_upload calls bq_perform_upload, which takes the fields argument as the schema. Finally, it serializes the data frame to a JSON file and uploads that file to BigQuery.
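
To see what a JSON step does to the zip column, here is a minimal sketch. It assumes the serialization resembles newline-delimited JSON as written by jsonlite::stream_out. The exact call bigrquery makes internally is not confirmed here, so treat this as an illustration, not as the package's actual code path:

```r
library(jsonlite)

# A small stand-in for df2, with zip stored as character.
df2 <- data.frame(zip = c("07030", "10001"),
                  val = c(10L, 100L),
                  stringsAsFactors = FALSE)

# Write one JSON object per row (newline-delimited JSON) to a temp file.
json_path <- tempfile(fileext = ".json")
con <- file(json_path, open = "w")
stream_out(df2, con, verbose = FALSE)
close(con)

# Read the lines back; the first one should look like
# {"zip":"07030","val":10}
ndjson <- readLines(json_path)
print(ndjson)
```

In this sketch the leading zero survives as a quoted JSON string. That is consistent with the explanation above: the INTEGER type appears to come from schema auto-detection on the BigQuery side, not from R's serialization.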

Answer 1 (score: 2)

Simply changing:

bq_table_upload(tab, df)

to:

bq_table_upload(tab, df, fields=df)

works.
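
A plausible reason this works is that the fields argument accepts a data frame and derives the schema from its column classes. This is a minimal sketch, assuming bigrquery's exported as_bq_fields() helper does that conversion. Building the field list needs no BigQuery connection. upload_with_schema is a hypothetical wrapper name, not part of bigrquery:

```r
library(bigrquery)

# Stand-in for df2, with zip stored as character.
df2 <- data.frame(zip = c("07030", "10001", "90210", "60602"),
                  val = c(10L, 100L, 1000L, 10000L),
                  stringsAsFactors = FALSE)

# Derive BigQuery field definitions from the column classes (runs offline).
fields <- as_bq_fields(df2)
print(fields)

# Hypothetical wrapper: always send the schema derived from the data frame,
# so the load job does not have to guess the column types.
upload_with_schema <- function(tab, df) {
  bq_table_upload(tab, df, fields = as_bq_fields(df))
}
```

Printing the derived fields before uploading is a cheap way to confirm that zip will be sent as a string column.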