Convert column types to read_csv() column types in R

Time: 2017-11-30 05:28:12

Tags: r data-manipulation readr

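My favorite thing about library(readr) and its read_csv() function in R is that it almost always sets the column types of my data to the correct class. However, I am currently working with an API in R that returns data as a data frame in which every column is character, even when the data is clearly numeric. Take this data frame, which contains some sports data, as an example: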

dput(mydf)
structure(list(isUnplayed = c("false", "false", "false"), isInProgress = 
c("false", "false", "false"), isCompleted = c("true", "true", "true"), awayScore = c("106", 
"95", "95"), homeScore = c("94", "97", "111"), game.ID = c("31176", 
"31177", "31178"), game.date = c("2015-10-27", "2015-10-27", 
"2015-10-27"), game.time = c("8:00PM", "8:00PM", "10:30PM"), 
    game.location = c("Philips Arena", "United Center", "Oracle Arena"
    ), game.awayTeam.ID = c("88", "86", "110"), game.awayTeam.City = c("Detroit", 
    "Cleveland", "New Orleans"), game.awayTeam.Name = c("Pistons", 
    "Cavaliers", "Pelicans"), game.awayTeam.Abbreviation = c("DET", 
    "CLE", "NOP"), game.homeTeam.ID = c("91", "89", "101"), game.homeTeam.City = c("Atlanta", 
    "Chicago", "Golden State"), game.homeTeam.Name = c("Hawks", 
    "Bulls", "Warriors"), game.homeTeam.Abbreviation = c("ATL", 
    "CHI", "GSW"), quarterSummary.quarter = list(structure(list(
        `@number` = c("1", "2", "3", "4"), awayScore = c("25", 
        "23", "34", "24"), homeScore = c("25", "18", "23", "28"
        )), .Names = c("@number", "awayScore", "homeScore"), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(`@number` = c("1", "2", "3", "4"), awayScore = c("17", 
    "23", "28", "27"), homeScore = c("26", "20", "25", "26")), .Names = c("@number", 
    "awayScore", "homeScore"), class = "data.frame", row.names = c(NA, 
    4L)), structure(list(`@number` = c("1", "2", "3", "4"), awayScore = c("35", 
    "14", "26", "20"), homeScore = c("39", "20", "35", "17")), .Names = c("@number", 
    "awayScore", "homeScore"), class = "data.frame", row.names = c(NA, 
    4L)))), .Names = c("isUnplayed", "isInProgress", "isCompleted", 
"awayScore", "homeScore", "game.ID", "game.date", "game.time", 
"game.location", "game.awayTeam.ID", "game.awayTeam.City", "game.awayTeam.Name", 
"game.awayTeam.Abbreviation", "game.homeTeam.ID", "game.homeTeam.City", 
"game.homeTeam.Name", "game.homeTeam.Abbreviation", "quarterSummary.quarter"
), class = "data.frame", row.names = c(NA, 3L))

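With these column classes, the data frame is cumbersome to work with once the API returns it. I came up with a way to update the column classes, as follows: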

write_csv(mydf, 'mydf.csv')
mydf <- read_csv('mydf.csv')

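Writing to a CSV and then reading it back in with read_csv() updates the data frame's columns. Unfortunately, this leaves a CSV file in my directory that I don't want. Is there a way to update the columns of an R data frame to the classes read_csv() would choose, without actually writing a CSV?

Thanks for any help!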

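2 Answers:

Answer 0 (score: 3)

If you just want readr to guess the column types, there is no need to write the data out and read it back in. You can use readr::type_convert(), which re-guesses the type of every character column in a data frame: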

library(magrittr)  # provides the %>% pipe

iris %>% 
  dplyr::mutate(Sepal.Width = as.character(Sepal.Width)) %>% 
  readr::type_convert() %>% 
  str()

For comparison, without type_convert() the column stays character:

iris %>% 
  dplyr::mutate(Sepal.Width = as.character(Sepal.Width)) %>% 
  str()
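
Applied to data shaped like the question's, the same call could look like the sketch below. The `games` data frame is a hypothetical, cut-down stand-in for `mydf` (the names `games`, `guessed`, and `typed` are illustrative, not from the original post). It also assumes that type_convert() only touches character columns, so the nested data frames in the list column get their own pass.

```r
library(readr)

# Hypothetical stand-in for the API result: every column arrives as character.
games <- data.frame(
  isCompleted   = c("true", "true", "true"),
  awayScore     = c("106", "95", "95"),
  game.ID       = c("31176", "31177", "31178"),
  game.date     = c("2015-10-27", "2015-10-27", "2015-10-27"),
  game.location = c("Philips Arena", "United Center", "Oracle Arena"),
  stringsAsFactors = FALSE
)

# A list column of nested, all-character data frames, as in the question.
quarter <- data.frame(awayScore = c("25", "23"), homeScore = c("25", "18"),
                      stringsAsFactors = FALSE)
games$quarterSummary.quarter <- list(quarter, quarter, quarter)

# Let readr guess every top-level character column, as read_csv() would.
# suppressMessages() hides the column specification that readr prints.
guessed <- suppressMessages(type_convert(games))

# Override the guess for one column; the remaining columns are still guessed.
typed <- type_convert(games, col_types = cols(game.ID = col_integer()))

# Convert each nested data frame separately.
guessed$quarterSummary.quarter <- lapply(
  guessed$quarterSummary.quarter,
  function(q) suppressMessages(type_convert(q))
)

sapply(guessed, function(x) class(x)[1])
```

Numeric-looking strings are guessed as double by default, which is why the second call spells out col_integer() for the ID column.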

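Answer 1 (score: 1)

Try this code. Base R's type.convert() converts a character vector to logical, integer, numeric, or complex, or (depending on its `as.is` argument) to factor when none of those fit.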

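# Positions of the character columns (list columns are skipped).
indx <- which(sapply(df, is.character))
# df[indx] keeps a data frame even when only one column matches; as.is = TRUE
# leaves non-convertible strings as character, so no factor round-trip is needed.
df[indx] <- lapply(df[indx], type.convert, as.is = TRUE)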
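
As a rough illustration of the base R route, here is a minimal sketch on a hypothetical, cut-down version of the question's data (the name `games` is illustrative). It passes `as.is = TRUE` so that strings which cannot be converted stay character rather than becoming factors.

```r
# Hypothetical stand-in for the API result: every column arrives as character.
games <- data.frame(
  isCompleted   = c("true", "true", "true"),
  awayScore     = c("106", "95", "95"),
  game.date     = c("2015-10-27", "2015-10-27", "2015-10-27"),
  game.location = c("Philips Arena", "United Center", "Oracle Arena"),
  stringsAsFactors = FALSE
)

# Convert only the character columns, leaving any other columns untouched.
is_chr <- vapply(games, is.character, logical(1))
games[is_chr] <- lapply(games[is_chr], type.convert, as.is = TRUE)

str(games)
```

Unlike readr, type.convert() has no date type, so `game.date` remains character here and would need a separate as.Date() call.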