R: Convert character columns to numeric after applying a calculation based on the column contents

Time: 2018-07-30 13:14:06

Tags: r dataframe dplyr tidyverse

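I have a data frame with several character columns. I want to convert each column to numeric, where each value is derived from the string it contains.

Condition 1: when two values are separated by a hyphen (-), take the mean of the two values.

Condition 2: when the value ends with a +, add 3 to the number.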
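To make the two conditions concrete, here is a minimal base R sketch that applies them to a single string. The helper name `parse_value` is hypothetical and not part of the original post, and the sketch assumes all numbers are non-negative, so that a hyphen always means a range.

```r
# Hypothetical helper: convert one string according to the two conditions.
# Assumes non-negative numbers, so "-" always separates two values.
parse_value <- function(s) {
  s <- trimws(s)
  if (grepl("-", s, fixed = TRUE)) {
    # Condition 1: "a - b" becomes the mean of a and b
    mean(as.numeric(strsplit(s, "-", fixed = TRUE)[[1]]))
  } else if (grepl("+", s, fixed = TRUE)) {
    # Condition 2: "a+" becomes a + 3
    as.numeric(sub("+", "", s, fixed = TRUE)) + 3
  } else {
    # Plain number: convert as-is
    as.numeric(s)
  }
}

parse_value("10 - 15")  # 12.5
parse_value("11+")      # 14
parse_value("10")       # 10
```

Because `if`/`else` only evaluates the branch that matches, `as.numeric()` is never called on a string it cannot parse.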

See the example below:

Input:

ColA      ColB     ColC
10 - 15   10       20 - 30
5 - 4     40-60    10+
11+       5 - 15   7 - 10

df <- data.frame(matrix(data = c("10 - 15", "10",     "20 - 30",
                                 "5 - 4",   "40-60",  "10+",
                                 "11+",     "5 - 15", "7 - 10"),
                        nrow = 3, ncol = 3, byrow = TRUE,
                        dimnames = list(NULL, c("ColA", "ColB", "ColC"))),
                 stringsAsFactors = FALSE)

Expected output:

ColA      ColB     ColC
12.5      10       25
4.5       50       13
14        10       8.5

Here is the code I tried for computing the mean. The dimensions get messed up.

getAverage = function(x){
  x[is.na(x)] = 0
  rowMeans(do.call(rbind.data.frame, strsplit(gsub("[^0-9|-]", "", x),
                                              split = "-")) %>%
             mutate_all(as.character) %>%
             mutate_all(as.numeric))
}

test = sapply(reqCols, function(x) getAverage(x))
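A quick check, which is not part of the original post, hints at one possible source of the dimension problem: after the `gsub()` cleanup, `strsplit()` returns pieces of different lengths, so the rows passed to `rbind.data.frame` are ragged. The sketch below uses the ColA values from the example.

```r
# ColA values from the example, cleaned the same way as in getAverage()
x <- c("10 - 15", "5 - 4", "11+")
cleaned <- gsub("[^0-9|-]", "", x)
cleaned
# "10-15" "5-4" "11"

# Ranges split into two pieces, the "+" value into only one
pieces <- strsplit(cleaned, split = "-")
lengths(pieces)
# 2 2 1
```

Note that the cleanup also strips the `+`, so the "add 3" condition can no longer be detected afterwards.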

2 answers:

Answer 0: (score: 1)

You can do the following:

oshan_upd <- function(x) {
  # This function takes a character vector...
  # Mean elements
  meanr <- grepl("-", x)
  # Calculate new value (nv)
  mr_nv <- strsplit(x[meanr], "\\s*-\\s*")
  # vapply always returns a numeric vector, even when nothing matches
  mr_nv <- vapply(mr_nv, function(p) mean(as.numeric(p)), numeric(1))
  # Replace corresponding values with the new value
  x[meanr] <- mr_nv

  # Plus elements... same process
  plus3r <- grepl("\\+", x)
  pr_nv <- as.numeric(gsub("\\s*\\+\\s*", "", x[plus3r])) + 3
  x[plus3r] <- pr_nv
  as.numeric(x)
}

df[] <- lapply(df, oshan_upd)
df
  ColA ColB ColC
1 12.5   10 25.0
2  4.5   50 13.0
3 14.0   10  8.5

Where:

df <- data.frame(
  ColA = c("10 - 15", "5 - 4", "11+"),
  ColB = c("10", "40-60", "5 - 15"),
  ColC = c("20 - 30", "10+", "7 - 10"),
  stringsAsFactors = FALSE
)
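To see how the masking approach in this answer works, the steps can be traced on a single column. This is only an illustrative sketch; the variable names `is_range`, `is_plus`, and `parts` are made up for the trace and do not appear in the answer.

```r
x <- c("10 - 15", "5 - 4", "11+")

# Step 1: find the range elements and replace them with their means
is_range <- grepl("-", x)
is_range
# TRUE TRUE FALSE
parts <- strsplit(x[is_range], "\\s*-\\s*")
x[is_range] <- vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
x  # numbers are coerced to character on assignment
# "12.5" "4.5" "11+"

# Step 2: find the "+" elements, strip the sign, and add 3
is_plus <- grepl("\\+", x)
x[is_plus] <- as.numeric(gsub("\\s*\\+\\s*", "", x[is_plus])) + 3
x
# "12.5" "4.5" "14"

# Step 3: every element is now a plain number stored as text
as.numeric(x)
# 12.5 4.5 14
```

The vector stays of type character until the final `as.numeric()` call, which is why the intermediate assignments do not produce NAs.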

Answer 1: (score: 1)

# example data
df <- data.frame(ColA = c("10 - 15", "5 - 4", "11+"), 
                 ColB = c("10", "40-60", "5 - 15"), 
                 ColC = c("20 - 30", "10+", "7 - 10"), stringsAsFactors = FALSE)
library(dplyr)

# create function and vectorize it
f = function(x){
  ifelse(grepl("[-]", x), mean(as.numeric(unlist(strsplit(x, "[-]")))), 
         ifelse(grepl("[+]", x), as.numeric(unlist(strsplit(x, "[+]")))+3, as.numeric(x)))
}
f = Vectorize(f)

# apply function to all columns
df %>% mutate_all(f)

#   ColA ColB ColC
# 1 12.5   10 25.0
# 2  4.5   50 13.0
# 3 14.0   10  8.5

You can also use case_when instead of ifelse, like this:

f = function(x){
  case_when(grepl("[-]", x) ~ mean(as.numeric(unlist(strsplit(x, "[-]")))),
            grepl("[+]", x) ~ as.numeric(unlist(strsplit(x, "[+]")))+3,
            TRUE ~ as.numeric(x))
}
f = Vectorize(f)

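This gives you the same output, but with some warnings: case_when evaluates every right-hand side for each input, so as.numeric() is also called on strings such as "10 - 15" that it cannot parse.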
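The warnings most likely come from `as.numeric()` being applied to a string it cannot parse. A small base R sketch, not taken from the answer, captures such a warning to show where it comes from; the names `caught` and `value` are only illustrative.

```r
# Collect any warnings raised while converting an unparsed range string
caught <- character(0)
value <- withCallingHandlers(
  as.numeric("10 - 15"),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")  # keep the warning off the console
  }
)

value           # NA: the whole string is not a valid number
length(caught)  # 1: a single coercion warning was raised
```

Since the NA from the non-matching branch is discarded anyway, wrapping the call in `suppressWarnings()` is one option, at the cost of also hiding warnings from values that genuinely fail to parse.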