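I am using grep and grepl to search a character variable and create simplified levels.
I tried saving the results in data frames. I also tried using if and else if statements and assigning the variable directly. I have attached that code; the for statement does not work.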
for(i in 1:length(D$ID)){
if(grepl("Bachelor", D$NDEGREE)[i]){D$NDegree[i] <- "Bachelors"}
else if(grepl("BS", D$NDEGREE)[i]){D$NDegree[i] <- "Bachelors"}
else if (grepl("Master", D$NDEGREE)[i]){D$NDegree[i] <- "Masters"}
else if(grepl("Doctor", D$NDEGREE)[i]){D$NDegree[i] <- "Doctors"}
else(D$NDegree[i] <- D$NDEGREE[i])}
Bachelors <- D[grep("Bachelor", D$NDEGREE),]
BS <- D[grep("BS", D$NDEGREE),]
Masters <- D[grep("Master", D$NDEGREE),]
Doctors <- D[grep("Doctor", D$NDEGREE),]
Edit: I also tried
D$NDEGREE <- gsub("Bachelor", "Bachelors", D$NDEGREE)
D$NDEGREE <- gsub("BS", "Bachelors", D$NDEGREE)
D$NDEGREE <- gsub("Master", "Masters", D$NDEGREE)
D$NDEGREE <- gsub("Doctor", "Doctors", D$NDEGREE)
This runs, but nothing changes. The for/if version does not work either; it just runs indefinitely.
Answer 0 (score: 1)
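You don't have to loop over a column in R. Just use vectorized operations, that is, operations that apply a function to the whole vector at once. Use the gsub function to recode the values.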
# Small example data frame
df <- data.frame(
  NDEGREE = c("Bachelor", "Master", "Doctor", "BS"),
  Value = c(1, 1, 1, 1)
)
# gsub is vectorized: each call recodes the whole column at once
df$NDEGREE <- gsub("Bachelor", "Bachelors", df$NDEGREE)
df$NDEGREE <- gsub("BS", "Bachelors", df$NDEGREE)
df$NDEGREE <- gsub("Master", "Masters", df$NDEGREE)
df$NDEGREE <- gsub("Doctor", "Doctors", df$NDEGREE)
# Subset the rows belonging to each simplified level
Bachelors <- df[grep("Bachelors", df$NDEGREE),]
Doctors <- df[grep("Doctors", df$NDEGREE),]
Masters <- df[grep("Masters", df$NDEGREE),]
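Note that gsub replaces only the matched substring, so a value such as "Bachelor of Science" would become "Bachelors of Science" rather than a single simplified level. A minimal sketch of collapsing whole values with grepl and logical indexing follows; the sample values are hypothetical, since the real contents of D$NDEGREE are not shown in the question.

```r
# Hypothetical sample data; the real D$NDEGREE values are not shown in the question.
D <- data.frame(
  NDEGREE = c("Bachelor of Arts", "BS Nursing", "Master of Science",
              "Doctor of Nursing Practice", "Diploma"),
  stringsAsFactors = FALSE
)

# Start from the original values so unmatched rows are kept as they are.
D$NDegree <- as.character(D$NDEGREE)

# Each grepl call is evaluated once over the whole column (no row loop).
# Rules are applied in reverse priority, so when several patterns match,
# the earlier branch of the original if/else chain still wins.
D$NDegree[grepl("Doctor", D$NDEGREE)] <- "Doctors"
D$NDegree[grepl("Master", D$NDEGREE)] <- "Masters"
D$NDegree[grepl("Bachelor|BS", D$NDEGREE)] <- "Bachelors"

print(table(D$NDegree))
```

Because every pattern is checked against the full column in one call, this avoids re-running grepl on the whole vector in each iteration, which is likely why the original loop appeared to run indefinitely on a large data frame.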
Answer 1 (score: 1)
A simpler option (if there are many values) would be to create a key/value dataset and then do a fuzzy join
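library(fuzzyjoin)
# Define the key/value table first: regex patterns and the level each maps to.
keyval <- data.frame(NDegree = c("Bachelor", "BS", "Master", "Doctor"),
                     val = c("Bachelors", "Bachelors", "Masters", "Doctors"),
                     stringsAsFactors = FALSE)
# Match the text in D$NDEGREE against the patterns in keyval$NDegree.
regex_left_join(D, keyval, by = c("NDEGREE" = "NDegree"))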
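If installing fuzzyjoin is not an option, the same key/value idea can be sketched in base R. This is only an illustration: the column name pattern and the sample values are assumptions, not part of the original data. The loop runs over the handful of keys, not over the rows of D.

```r
# Hypothetical sample data standing in for D.
D <- data.frame(
  NDEGREE = c("Bachelor of Arts", "BS Nursing", "Master of Science",
              "Doctor of Philosophy", "Diploma"),
  stringsAsFactors = FALSE
)

# Key/value table: a regex pattern and the simplified level it maps to.
keyval <- data.frame(
  pattern = c("Bachelor", "BS", "Master", "Doctor"),
  val     = c("Bachelors", "Bachelors", "Masters", "Doctors"),
  stringsAsFactors = FALSE
)

# Unmatched values keep their original text.
D$NDegree <- D$NDEGREE
matched <- rep(FALSE, nrow(D))

for (k in seq_len(nrow(keyval))) {
  # Only rows not claimed by an earlier key are updated (first key wins).
  hit <- grepl(keyval$pattern[k], D$NDEGREE) & !matched
  D$NDegree[hit] <- keyval$val[k]
  matched <- matched | hit
}

print(D)
```

Adding a new level then only requires a new row in keyval rather than another line of recoding logic.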