How do I use col_types with readr's read_delim_chunked?

Time: 2019-09-05 03:25:52

Tags: r tidyverse readr

I am trying to read a file in chunks while specifying col_types. Here is an MWE:

write.csv(cars, "cars.csv")


library(readr)
readr::read_delim_chunked("cars.csv", function(x, i) {
  x  # return the chunk unchanged
}, delim = ",", col_types = cols(
  speed = col_character()
), chunk_size = 10)

But I get the wrong output:

NULL

The non-chunked version, however, works fine:

library(readr)
readr::read_delim("cars.csv", delim = ",", col_types = cols(
  speed = col_character()
))

2 answers:

Answer 0: (score: 1)

The problem is that write.csv includes the row names as an extra column by default. Write the file without them:

write.csv(cars, "cars.csv", row.names = FALSE, quote = FALSE)
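To see what the row.names argument changes, the header lines of the two files can be compared. This is a minimal sketch using base R only; the temporary file paths and the variable names are illustrative and not part of the original answer.

```r
# Write the built-in cars data both ways and compare the header lines.
with_names <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

write.csv(cars, with_names)                        # default: row.names = TRUE
write.csv(cars, without_names, row.names = FALSE)  # no row-name column

# Only the first line (the header) is needed for the comparison.
header_with <- readLines(with_names, n = 1)
header_without <- readLines(without_names, n = 1)

print(header_with)     # starts with an empty, quoted column name
print(header_without)  # starts directly with "speed"
```

The file written with the default settings has three columns, the first of which has an empty name.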

In addition, we need col_character() (a call) rather than col_character, and the callback is wrapped in DataFrameCallback$new:

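library(readr)  # attaches DataFrameCallback, cols() and col_character()
readr::read_delim_chunked("cars.csv", DataFrameCallback$new(function(x, i) {
  x  # return each chunk unchanged; DataFrameCallback row-binds the results
}), col_types = cols(
  speed = col_character()
), delim = ",", chunk_size = 10)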
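The difference between the two callback styles can be checked side by side. As far as the readr documentation describes it, a bare function is treated as a side-effect callback whose return values are discarded, while DataFrameCallback combines the per-chunk results at the end. The sketch below assumes readr is installed; the temporary path and the names spec, side_effect and combined are illustrative.

```r
library(readr)

# Write the data to a temporary file without the row-name column.
path <- tempfile(fileext = ".csv")
write.csv(cars, path, row.names = FALSE)

spec <- cols(speed = col_character())

# A bare function is used for side effects only, so nothing is collected.
side_effect <- read_delim_chunked(
  path, function(x, pos) x,
  delim = ",", col_types = spec, chunk_size = 10
)

# DataFrameCallback row-binds whatever each call to the function returns.
combined <- read_delim_chunked(
  path, DataFrameCallback$new(function(x, pos) x),
  delim = ",", col_types = spec, chunk_size = 10
)

print(is.null(side_effect))
print(dim(combined))
print(class(combined$speed))
```

If the bare-function behaviour is what you want (for example, writing each chunk to a database), the NULL result is harmless; to get the data back, use one of the collecting callbacks.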

Answer 1: (score: 0)

For reasons I don't understand, you need to wrap the function in DataFrameCallback$new.

write.csv(cars, "cars.csv")

This works:

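library(readr)  # attaches DataFrameCallback, cols() and col_character()
readr::read_delim_chunked("cars.csv", DataFrameCallback$new(function(x, i) {
  x  # return each chunk unchanged
}), col_types = cols(
  speed = col_character()
), delim = ",", chunk_size = 10)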

This fails to return the data:

readr::read_delim_chunked("cars.csv", function(x, i) {
  x
}, col_types = cols(
  speed = col_character()
), delim = ",", chunk_size = 10)
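Note that this answer keeps the row-name column in the file. A quick way to confirm that col_types still applies to speed in that case is sketched below. It assumes readr is installed and deliberately avoids referring to the unnamed first column by name, since the name readr assigns to it may differ between versions; the temporary path and the name result are illustrative.

```r
library(readr)

# Keep the row-name column, as write.csv does by default.
path <- tempfile(fileext = ".csv")
write.csv(cars, path)

result <- read_delim_chunked(
  path, DataFrameCallback$new(function(x, pos) x),
  delim = ",",
  col_types = cols(speed = col_character()),  # other columns are guessed
  chunk_size = 10
)

print(ncol(result))          # row-name column plus speed and dist
print(class(result$speed))
```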