R - Convert chr values to num across multiple columns?

Asked: 2017-05-30 04:11:11

Tags: r dataframe char numbers

I have this data frame, and I want to change the chr values to num:

> dput(Df)
structure(list(`@MeasurementDateGMT` = c("2016-09-01 00:00:00", 
"2016-09-01 01:00:00", "2016-09-01 02:00:00", "2016-09-01 03:00:00", 
"2016-09-01 04:00:00", "2016-09-01 05:00:00", "2016-09-01 06:00:00", 
"2016-09-01 07:00:00", "2016-09-01 08:00:00", "2016-09-01 09:00:00", 
"2016-09-01 10:00:00", "2016-09-01 11:00:00", "2016-09-01 12:00:00", 
"2016-09-01 13:00:00", "2016-09-01 14:00:00", "2016-09-01 15:00:00", 
"2016-09-01 16:00:00", "2016-09-01 17:00:00", "2016-09-01 18:00:00", 
"2016-09-01 19:00:00", "2016-09-01 20:00:00", "2016-09-01 21:00:00", 
"2016-09-01 22:00:00", "2016-09-01 23:00:00"), `@Value` = c("10.9", 
"9.8", "9.9", "14.1", "13.6", "16.5", "15", "18.5", "18", "17", 
"16.6", "12", "12.1", "18.1", "15.9", "15.9", "16.9", "21.6", 
"23.5", "40.7", "16.6", "12.7", "12.4", "12.2")), .Names = c("@MeasurementDateGMT", 
"@Value"), class = "data.frame", row.names = c(NA, 24L))

The code I used to convert:

columns <- sapply(Df, is.factor)
Df[, columns] <- lapply(Df[, columns, drop = FALSE], function(x) as.numeric(as.character(x)))

Result:

> str(Df)
'data.frame':   24 obs. of  2 variables:
 $ @MeasurementDateGMT: chr  "2016-09-01 00:00:00" "2016-09-01 01:00:00" "2016-09-01 02:00:00" "2016-09-01 03:00:00" ...
 $ @Value             : chr  "10.9" "9.8" "9.9" "14.1" ...

They are still chr. What am I missing? Any ideas?

2 answers:

Answer 0 (score: 2)

We can use type.convert:

Df[] <- lapply(Df, function(x) type.convert(x, as.is = TRUE))
str(Df)
#'data.frame':   24 obs. of  2 variables:
#$ @MeasurementDateGMT: chr  "2016-09-01 00:00:00" "2016-09-01 01:00:00" "2016-09-01 02:00:00" "2016-09-01 03:00:00" ...
#$ @Value             : num  10.9 9.8 9.9 14.1 13.6 16.5 15 18.5 18 17 ...
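As a minimal sketch of why the attempt in the question changed nothing, and of what type.convert does to each column: the two-row data frame below is a stand-in built from the first rows of the question's data, and the names `is_fac` and `is_chr` are only for illustration.

```r
# Stand-in for the question's data frame (first two rows only).
Df <- data.frame(
  `@MeasurementDateGMT` = c("2016-09-01 00:00:00", "2016-09-01 01:00:00"),
  `@Value` = c("10.9", "9.8"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The selector from the question: no column is a factor, so nothing is selected
# and the assignment that follows it has nothing to convert.
is_fac <- sapply(Df, is.factor)
print(is_fac)

# Selecting on is.character instead picks up both columns.
is_chr <- sapply(Df, is.character)
print(is_chr)

# type.convert() turns "10.9" into a number and, with as.is = TRUE,
# leaves the timestamp strings as character rather than making them factors.
Df[is_chr] <- lapply(Df[is_chr], type.convert, as.is = TRUE)
str(Df)
```

The point of as.is = TRUE is that columns which cannot be read as numbers stay character, so the timestamps are left untouched.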

If we need to convert the 'datetime' column:

# Column 1 holds the timestamps; tz = "GMT" follows the column name
Df[[1]] <- as.POSIXct(Df[[1]], tz = "GMT")
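A small sketch of the datetime conversion on its own, using two timestamps copied from the question. Both the explicit `format` string and `tz = "GMT"` are assumptions here: the time zone is inferred from the column name, and without it as.POSIXct would use the session's local time zone.

```r
# Two timestamps in the same layout as the question's first column.
stamps <- c("2016-09-01 00:00:00", "2016-09-01 01:00:00")

# Parse with an explicit format and time zone (both assumed, see above).
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")

class(parsed)             # a date-time class, no longer character
diff(parsed)              # the gap between consecutive readings
format(parsed, "%H:%M")   # format back to text when needed
```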

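Because the columns in the OP's post are all character, we don't need to convert them to character before applying type.convert; otherwise (for example, with factor columns), use type.convert(as.character(x), ..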
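To illustrate the factor case, here is a minimal sketch on a made-up factor (the vector `f` is not from the question). Calling as.numeric directly on a factor returns the internal level codes, which is why the as.character step matters.

```r
# A hypothetical factor column holding numbers as labels.
f <- factor(c("10.9", "9.8", "15"))

# as.numeric() on a factor gives the level codes, not the values.
codes <- as.numeric(f)
print(codes)

# Going through as.character() first recovers the actual numbers.
vals <- type.convert(as.character(f), as.is = TRUE)
print(vals)
```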

OK, if we need dplyr to do this:

library(dplyr)

# funs() is deprecated in recent dplyr releases; a formula lambda does the same job
res <- Df %>%
    mutate_all(~ type.convert(as.character(.), as.is = TRUE))
str(res)
#'data.frame':   24 obs. of  2 variables:
#$ @MeasurementDateGMT: chr  "2016-09-01 00:00:00" "2016-09-01 01:00:00" "2016-09-01 02:00:00" "2016-09-01 03:00:00" ...
#$ @Value             : num  10.9 9.8 9.9 14.1 13.6 16.5 15 18.5 18 17 ...

Another option is data.table.

Answer 1 (score: 1)

You can use dplyr::mutate_if to apply a function (here, as.numeric) to every column that satisfies a predicate function (here, is.character).

library(dplyr)

Df %>% 
  janitor::clean_names() %>% # removes the "@" from names since that messes up mutate_if
  tibble::as_tibble() %>% # just for the nice printing
  mutate_if(is.character, as.numeric)

#> Warning in eval(substitute(expr), envir, enclos): NAs introduced by
#> coercion

#> # A tibble: 24 x 2
#>    x_measurementdategmt x_value
#>                   <dbl>   <dbl>
#>  1                   NA    10.9
#>  2                   NA     9.8
#>  3                   NA     9.9
#>  4                   NA    14.1
#>  5                   NA    13.6
#>  6                   NA    16.5
#>  7                   NA    15.0
#>  8                   NA    18.5
#>  9                   NA    18.0
#> 10                   NA    17.0
#> # ... with 14 more rows

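But this is not appropriate for the first column above, because it is a datetime. as.numeric simply set it to NA because it contains non-numeric characters. Instead, you should probably change it to a datetime variable.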

Df %>% 
  janitor::clean_names() %>%
  tibble::as_tibble() %>% 
  mutate(x_measurementdategmt = lubridate::as_datetime(x_measurementdategmt)) %>% 
  mutate_if(is.character, as.numeric)
#> # A tibble: 24 x 2
#>    x_measurementdategmt x_value
#>                  <dttm>   <dbl>
#>  1  2016-09-01 04:00:00    10.9
#>  2  2016-09-01 05:00:00     9.8
#>  3  2016-09-01 06:00:00     9.9
#>  4  2016-09-01 07:00:00    14.1
#>  5  2016-09-01 08:00:00    13.6
#>  6  2016-09-01 09:00:00    16.5
#>  7  2016-09-01 10:00:00    15.0
#>  8  2016-09-01 11:00:00    18.5
#>  9  2016-09-01 12:00:00    18.0
#> 10  2016-09-01 13:00:00    17.0
#> # ... with 14 more rows
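The NA problem described above can also be avoided without extra packages. The following is a base R sketch, not the approach of this answer: it converts a character column only if every non-missing entry parses as a number. The helper name `all_numeric_like` is hypothetical, and the two-row data frame is a stand-in for the question's data.

```r
# Stand-in for the question's data frame (first two rows only).
Df <- data.frame(
  `@MeasurementDateGMT` = c("2016-09-01 00:00:00", "2016-09-01 01:00:00"),
  `@Value` = c("10.9", "9.8"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when x is character and every non-missing
# entry can be parsed as a number.
all_numeric_like <- function(x) {
  if (!is.character(x)) return(FALSE)
  parsed <- suppressWarnings(as.numeric(x))
  !any(is.na(parsed) & !is.na(x))
}

# Convert only the columns that pass the check; the timestamps are skipped.
to_convert <- vapply(Df, all_numeric_like, logical(1))
Df[to_convert] <- lapply(Df[to_convert], as.numeric)
str(Df)
```

The design choice is to test before converting, so no column is silently replaced by NA values.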