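I have a data frame with several character columns. I want to convert each column to numeric, where the value depends on the string in the column.
Condition 1: when two values are separated by a hyphen (-), take their mean.
Condition 2: when there is a +, add 3 to the number.
See the example below:
Input: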
ColA      ColB     ColC
10 - 15   10       20 - 30
5 - 4     40-60    10+
11+       5 - 15   7 - 10
df <- data.frame(matrix(data = c("10 - 15", "10", "20 - 30",
                                 "5 - 4", "40-60", "10+",
                                 "11+", "5 - 15 ", " 7 - 10"),
                        nrow = 3, ncol = 3, byrow = TRUE))
Expected output:
ColA   ColB   ColC
12.5   10     25
4.5    50     13
14     10     8.5
Here is the code I tried for getting the mean. The dimensions get messed up.
getAverage = function(x){
  x[is.na(x)] = 0
  rowMeans(do.call(rbind.data.frame, strsplit(gsub("[^0-9|-]", "", x),
                                              split = "-")) %>%
             mutate_all(as.character) %>%
             mutate_all(as.numeric))
}
test = sapply(reqCols, function(x) getAverage(x))
Answer 0 (score: 1)
You can do the following:
oshan_upd <- function(x) {
  # This function takes a character vector and returns a numeric vector
  # Mean elements: entries that contain a hyphen
  meanr <- grepl("-", x)
  # Calculate new value (nv): split on the hyphen and average the parts
  mr_nv <- strsplit(x[meanr], "\\s*-\\s*")
  mr_nv <- sapply(mr_nv, function(x) mean(as.numeric(x)))
  # Replace corresponding values with the new value
  x[meanr] <- mr_nv
  # Plus elements... same process: strip the "+" and add 3
  plus3r <- grepl("\\+", x)
  pr_nv <- as.numeric(gsub("\\s*\\+\\s*", "", x[plus3r])) + 3
  x[plus3r] <- pr_nv
  as.numeric(x)
}
df[] <- lapply(df, oshan_upd)
df
  ColA ColB ColC
1 12.5   10 25.0
2  4.5   50 13.0
3 14.0   10  8.5
where df is:
df <- data.frame(
ColA = c("10 - 15", "5 - 4", "11+"),
ColB = c("10", "40-60", "5 - 15"),
ColC = c("20 - 30", "10+", "7 - 10")
)
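For comparison, here is a minimal sketch of the same two rules that builds the numeric vector directly instead of writing numbers back into the character vector. The helper name parse_range is hypothetical (it does not appear in the answer above), and the sketch assumes every entry is a non-negative plain number, a hyphen-separated pair, or a number followed by +.

```r
# Hypothetical helper (not from the answer above): converts strings such as
# "10 - 15", "40-60", "11+" or "10" to numbers.
parse_range <- function(x) {
  x <- trimws(as.character(x))
  # Rule 2: remember which entries carry a "+", then drop it
  has_plus <- grepl("+", x, fixed = TRUE)
  parts <- strsplit(sub("+", "", x, fixed = TRUE), "-", fixed = TRUE)
  # Rule 1: mean of the pieces (a single piece is its own mean)
  vals <- vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
  vals + ifelse(has_plus, 3, 0)
}

df <- data.frame(
  ColA = c("10 - 15", "5 - 4", "11+"),
  ColB = c("10", "40-60", "5 - 15"),
  ColC = c("20 - 30", "10+", "7 - 10"),
  stringsAsFactors = FALSE
)
df[] <- lapply(df, parse_range)
```

Because the hyphen is used as the separator, this sketch would misread negative numbers; that case is outside the example data.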
Answer 1 (score: 1)
# example data
df <- data.frame(ColA = c("10 - 15", "5 - 4", "11+"),
                 ColB = c("10", "40-60", "5 - 15"),
                 ColC = c("20 - 30", "10+", "7 - 10"),
                 stringsAsFactors = FALSE)
library(dplyr)
# create function and vectorize it
f = function(x){
  ifelse(grepl("[-]", x), mean(as.numeric(unlist(strsplit(x, "[-]")))),
         ifelse(grepl("[+]", x), as.numeric(unlist(strsplit(x, "[+]"))) + 3,
                as.numeric(x)))
}
f = Vectorize(f)
# apply function to all columns
df %>% mutate_all(f)
#   ColA ColB ColC
# 1 12.5   10 25.0
# 2  4.5   50 13.0
# 3 14.0   10  8.5
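The Vectorize step matters because mean() collapses whatever it receives into a single number. The following base R sketch illustrates this with a simplified, hypothetical function f_raw that only handles the hyphen rule: called on a whole vector it pools all the numbers, while the vectorized version works entry by entry.

```r
# f_raw is a simplified stand-in for illustration, not the answer's f
f_raw <- function(x) {
  ifelse(grepl("-", x, fixed = TRUE),
         mean(as.numeric(unlist(strsplit(x, "-", fixed = TRUE)))),
         as.numeric(x))
}

x <- c("10 - 15", "5 - 4")

# Whole vector at once: mean(c(10, 15, 5, 4)) is recycled for every entry
pooled <- f_raw(x)                            # 8.5 8.5

# One entry at a time: each range gets its own mean
per_element <- unname(Vectorize(f_raw)(x))    # 12.5 4.5
```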
You can also use case_when instead of ifelse, like this:
f = function(x){
  case_when(grepl("[-]", x) ~ mean(as.numeric(unlist(strsplit(x, "[-]")))),
            grepl("[+]", x) ~ as.numeric(unlist(strsplit(x, "[+]"))) + 3,
            TRUE ~ as.numeric(x))
}
f = Vectorize(f)
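This gives you the same output, but with some warnings: case_when evaluates every right-hand side, so as.numeric() is also called on strings such as "10 - 15" and warns that NAs were introduced by coercion.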
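Since the function is applied one entry at a time anyway, one way to avoid those warnings is a plain if / else chain, which only evaluates the branch that matches. This is a sketch in base R rather than the answer's own code; the names f_scalar and f_quiet are hypothetical.

```r
# Hypothetical scalar helper: expects a single string
f_scalar <- function(x) {
  if (grepl("-", x, fixed = TRUE)) {
    # range: average the two ends
    mean(as.numeric(strsplit(x, "-", fixed = TRUE)[[1]]))
  } else if (grepl("+", x, fixed = TRUE)) {
    # trailing plus: strip it and add 3
    as.numeric(sub("+", "", x, fixed = TRUE)) + 3
  } else {
    as.numeric(x)
  }
}

# Apply the scalar helper to every entry of a character vector
f_quiet <- function(x) vapply(x, f_scalar, numeric(1), USE.NAMES = FALSE)

res <- f_quiet(c("10 - 15", "10+", "10"))   # 12.5 13.0 10.0
```

It can be applied to a data frame of character columns with df[] <- lapply(df, f_quiet).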