You can copy the following code into an R script file and run it:
library(dplyr)  # provides %>% and mutate()

preprocess_brand_version = function(dataset) {
  # Keep only the major.minor part of the version, e.g. "1.4.3" -> "1.4"
  dataset$brand_version = gsub("^([0-9]+)(\\.[0-9]+)?.*$", "\\1\\2", dataset$brand_version, perl = TRUE)
  dataset = dataset %>% mutate(
    brand_version = ifelse(!(is.na(brand) || is.na(brand_version)), paste(substr(brand, 1, 3), ", ", brand_version, sep = ""), NA)
  )
  dataset$brand_version = as.factor(dataset$brand_version)
  return (dataset)
}
a = data.frame(brand = c("Samsung", "Motorola"), brand_version = c("1.4.3", "6.3"))
b = a
b[1,2] = NA
a
b
preprocess_brand_version(b)
My problem is that when I run it, I get:
> a
brand brand_version
1 Samsung 1.4.3
2 Motorola 6.3
> b
brand brand_version
1 Samsung <NA>
2 Motorola 6.3
> preprocess_brand_version(b)
brand brand_version
1 Samsung <NA>
2 Motorola <NA>
I expected to get "Mot, 6.3" as the new brand_version value in the Motorola row.
Does anyone know why `ifelse` isn't working as expected?
Thanks!
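Answer 0 (score: 1)
You are using the double form of "or", `||`, which only looks at the first element of each vector instead of comparing them element by element, so `ifelse` gets a single TRUE/FALSE and returns a single value for the whole column. (In recent versions of R, 4.3.0 and later, `||` on vectors longer than one raises an error instead.) Switching to the single form `|` should fix the problem.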
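A quick way to see the difference is a small sketch using two hand-written vectors, `x` and `y` (names chosen just for illustration), that stand in for `is.na(brand)` and `is.na(brand_version)` on the data frame `b`:

```r
# Stand-ins for is.na(brand) and is.na(brand_version) on data frame b
x <- c(FALSE, FALSE)
y <- c(TRUE, FALSE)

# Element-wise OR: one TRUE/FALSE per row
row_has_na <- x | y
print(row_has_na)      # [1]  TRUE FALSE

# Keep only rows where both values are present
keep <- !row_has_na
print(keep)            # [1] FALSE  TRUE

# ifelse() returns a result as long as its test, so a per-row test
# gives one value per row
out <- ifelse(keep, "Mot, 6.3", NA)
print(out)             # -> NA for row 1, "Mot, 6.3" for row 2

# With a single TRUE/FALSE test, ifelse() returns a single value,
# which mutate() then recycles down the whole column
single <- ifelse(FALSE, "Mot, 6.3", NA)
print(length(single))  # [1] 1
```

The last two lines mirror what happened in the question: a length-one test produced one `NA`, and `mutate()` copied it into every row.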
Answer 1 (score: 1)
Use a single vertical bar for "or":
library(dplyr)  # provides %>% and mutate()

preprocess_brand_version = function(dataset) {
  # Keep only the major.minor part of the version, e.g. "1.4.3" -> "1.4"
  dataset$brand_version = gsub("^([0-9]+)(\\.[0-9]+)?.*$", "\\1\\2", dataset$brand_version, perl = TRUE)
  dataset = dataset %>% mutate(
    # single | compares row by row, so ifelse() gets one test per row
    brand_version = ifelse(!(is.na(brand) | is.na(brand_version)), paste(substr(brand, 1, 3), ", ", brand_version, sep = ""), NA)
  )
  dataset$brand_version = as.factor(dataset$brand_version)
  return (dataset)
}
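For comparison, here is a base-R sketch of the same fix that does not depend on dplyr. The function name `preprocess_brand_version_base` is hypothetical, and `stringsAsFactors = FALSE` is added only so the example behaves the same on older R versions:

```r
# Hypothetical base-R variant of the fixed function (no dplyr needed)
preprocess_brand_version_base <- function(dataset) {
  # Keep only the major.minor part of the version, e.g. "1.4.3" -> "1.4"
  dataset$brand_version <- gsub("^([0-9]+)(\\.[0-9]+)?.*$", "\\1\\2",
                                dataset$brand_version, perl = TRUE)
  # Element-wise test: TRUE for rows where both brand and version are present
  complete <- !(is.na(dataset$brand) | is.na(dataset$brand_version))
  dataset$brand_version <- ifelse(complete,
                                  paste0(substr(dataset$brand, 1, 3), ", ",
                                         dataset$brand_version),
                                  NA)
  dataset$brand_version <- as.factor(dataset$brand_version)
  dataset
}

a <- data.frame(brand = c("Samsung", "Motorola"),
                brand_version = c("1.4.3", "6.3"),
                stringsAsFactors = FALSE)
b <- a
b[1, 2] <- NA

print(preprocess_brand_version_base(a))
print(preprocess_brand_version_base(b))
```

On `a` this gives "Sam, 1.4" and "Mot, 6.3"; on `b` the Samsung row stays `NA` and the Motorola row becomes "Mot, 6.3".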
If it helps, I have a short tutorial on regular expressions on YouTube: https://www.youtube.com/watch?v=YeMC1aNNu-4
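To see what the `gsub()` pattern used in the function does, here is a small sketch on made-up version strings: the first capture group keeps the leading digits, the optional second group keeps one `.digits` part, and anything after that is dropped.

```r
# Made-up version strings for illustration
versions <- c("1.4.3", "6.3", "10", NA)

# Group 1: leading digits; group 2 (optional): "." plus the next digits.
# The rest of the string is matched by .* and discarded.
short <- gsub("^([0-9]+)(\\.[0-9]+)?.*$", "\\1\\2", versions, perl = TRUE)
print(short)
```

This should give "1.4", "6.3", "10" and `NA`: a version with no minor part keeps just its major number, and `NA` inputs pass through unchanged.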