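I'm new to R and I've written some code that I believe could be shortened with a for loop. The problem is that I don't know how to write the loop.
I have a data frame with a "TestGrade" column whose values look like "Grade 1" or "Kindergarten". I'm trying to change that column to just a numeric value: for example, "Kindergarten" becomes 0 and "Grade 1" becomes 1. Below I provide code for a sample data frame, along with how I currently solve the problem without a loop.
Any guidance would be greatly appreciated!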
##Sample Data
FirstInitial <- c("A", "D", "M", "C", "J", "S", "K", "L", "M", "K", "G", "B", "F")
LastInitial <- c("S", "M", "T", "M", "A", "B", "H", "M", "S", "W", "L", "Z", "P")
TestGrade <- c('Kindergarten', 'Grade 1','Grade 2', 'Grade 3','Grade 4', 'Grade 5', 'Grade 6','Grade 7','Grade 8', 'Grade 9', 'Grade 10', 'Grade 11','Grade 12')
df <- data.frame(FirstInitial, LastInitial, TestGrade)
##The code's current function
if(any(df$TestGrade == 'Kindergarten')){
df$TestGrade <- gsub('Kindergarten', '0', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 1')){
df$TestGrade <- gsub('Grade 1', '1', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 2')){
df$TestGrade <- gsub('Grade 2', '2', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 3')){
df$TestGrade <- gsub('Grade 3', '3', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 4')){
df$TestGrade <- gsub('Grade 4', '4', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 5')){
df$TestGrade <- gsub('Grade 5', '5', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 6')){
df$TestGrade <- gsub('Grade 6', '6', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 7')){
df$TestGrade <- gsub('Grade 7', '7', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 8')){
df$TestGrade <- gsub('Grade 8', '8', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 9')){
df$TestGrade <- gsub('Grade 9', '9', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 10')){
df$TestGrade <- gsub('Grade 10', '10', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 11')){
df$TestGrade <- gsub('Grade 11', '11', df$TestGrade)
}
if(any(df$TestGrade == 'Grade 12')){
df$TestGrade <- gsub('Grade 12', '12', df$TestGrade)
}
Answer 0: (score: 7)
We can use ifelse to assign 0 to "Kindergarten" and remove "Grade " from the other values
as.numeric(ifelse(df$TestGrade == "Kindergarten", 0,
sub("Grade ", "", df$TestGrade)))
#[1] 0 1 2 3 4 5 6 7 8 9 10 11 12
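The call above only prints the converted values. A minimal sketch of writing them back into the column follows; it rebuilds a one-column `df` so it runs on its own, and `stringsAsFactors = FALSE` is an assumption made here so that the column is plain character:

```r
# Rebuild a small version of the question's data frame (TestGrade only).
TestGrade <- c("Kindergarten", paste("Grade", 1:12))
df <- data.frame(TestGrade, stringsAsFactors = FALSE)

# ifelse() returns a character vector here, so as.numeric() does the final conversion.
df$TestGrade <- as.numeric(ifelse(df$TestGrade == "Kindergarten", 0,
                                  sub("Grade ", "", df$TestGrade)))

str(df$TestGrade)
```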
Answer 1: (score: 5)
We can use case_when
library(dplyr)
library(readr)
df %>%
  mutate(TestGrade = case_when(as.character(TestGrade) == "Kindergarten" ~ 0,
                               # parse_number() expects character input, so convert in case the column is a factor
                               TRUE ~ parse_number(as.character(TestGrade))))
# FirstInitial LastInitial TestGrade
#1 A S 0
#2 D M 1
#3 M T 2
#4 C M 3
#5 J A 4
#6 S B 5
#7 K H 6
#8 L M 7
#9 M S 8
#10 K W 9
#11 G L 10
#12 B Z 11
#13 F P 12
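Answer 2: (score: 4)
Start by simplifying: you don't need any of the if(any(...)) checks. gsub is smart; it works like find/replace. The command gsub('Grade 9', '9', df$TestGrade) replaces 'Grade 9' with '9' and doesn't touch anything else. So, removing all the if statements, we get: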
df$TestGrade <- gsub('Kindergarten', '0', df$TestGrade)
df$TestGrade <- gsub('Grade 1', '1', df$TestGrade)
df$TestGrade <- gsub('Grade 2', '2', df$TestGrade)
df$TestGrade <- gsub('Grade 3', '3', df$TestGrade)
df$TestGrade <- gsub('Grade 4', '4', df$TestGrade)
df$TestGrade <- gsub('Grade 5', '5', df$TestGrade)
df$TestGrade <- gsub('Grade 6', '6', df$TestGrade)
df$TestGrade <- gsub('Grade 7', '7', df$TestGrade)
df$TestGrade <- gsub('Grade 8', '8', df$TestGrade)
df$TestGrade <- gsub('Grade 9', '9', df$TestGrade)
df$TestGrade <- gsub('Grade 10', '10', df$TestGrade)
df$TestGrade <- gsub('Grade 11', '11', df$TestGrade)
df$TestGrade <- gsub('Grade 12', '12', df$TestGrade)
As a next improvement, we can use a loop. This is exactly equivalent to the code above; it just takes less typing.
pattern = c("Kindergarten", paste("Grade", 1:12))
replacement = as.character(0:12)
for (i in seq_along(pattern)) {
df$TestGrade <- gsub(pattern[i], replacement[i], df$TestGrade)
}
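One subtlety of this loop: the pattern 'Grade 1' is also a prefix of 'Grade 10', 'Grade 11' and 'Grade 12'. The results still come out right because the replacement happens to be the same digit, but that is specific to this mapping. The sketch below illustrates the behaviour; the anchored pattern is an optional variation added for illustration and is not part of the original answer:

```r
x <- c("Grade 1", "Grade 10", "Grade 12")

# Unanchored: "Grade 1" also matches the start of "Grade 10" and "Grade 12".
prefix_result <- gsub("Grade 1", "1", x)

# Anchored with ^ and $: only the exact string "Grade 1" is replaced.
anchored_result <- gsub("^Grade 1$", "1", x)

print(prefix_result)
print(anchored_result)
```

If the replacements were not the grade digits themselves (for example, words), anchoring the pattern would be needed to keep the loop correct.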
Better still, we can be smarter: treat Kindergarten as a special case and strip "Grade " from everything else, as in Juian's and Ronak's answers. Another variation:
df$TestGrade = as.character(df$TestGrade) # needed only if it is a factor
df$TestGrade[df$TestGrade == "Kindergarten"] = 0
df$TestGrade = sub("Grade ", "", df$TestGrade)
df$TestGrade = as.numeric(df$TestGrade) # if needed
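If we want to get really fancy, we can set fixed = TRUE inside sub(). This tells sub to look for an exact literal match rather than interpreting the pattern as a regular expression. That makes the code run faster, but unless you have a lot of data you won't notice any difference. If you have 100,000+ rows, this version will be very fast: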
# optimized
df$TestGrade = as.character(df$TestGrade) # needed only if it is a factor
df$TestGrade[df$TestGrade == "Kindergarten"] = 0
df$TestGrade = as.integer(sub("Grade ", "", df$TestGrade, fixed = TRUE))
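To check the speed claim on your own machine, a rough timing sketch is shown below. The vector `grades`, the helper `convert()`, and the size of 200,000 rows are all made up for illustration; actual timings will vary and are not shown here:

```r
set.seed(1)
# Hypothetical test data: 200,000 randomly sampled grade labels.
grades <- sample(c("Kindergarten", paste("Grade", 1:12)), 200000, replace = TRUE)

# Hypothetical helper wrapping the approach above; `fixed` is passed through to sub().
convert <- function(x, fixed) {
  x[x == "Kindergarten"] <- "0"
  as.integer(sub("Grade ", "", x, fixed = fixed))
}

regex_time <- system.time(regex_result <- convert(grades, fixed = FALSE))
fixed_time <- system.time(fixed_result <- convert(grades, fixed = TRUE))

# Both variants should give the same integers; only the elapsed time differs.
identical(regex_result, fixed_result)
c(regex = regex_time[["elapsed"]], fixed = fixed_time[["elapsed"]])
```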
Answer 3: (score: 3)
You can do this in two lines of code without a for loop. I'd also suggest adding stringsAsFactors = F to the data.frame command before running these lines
df$TestGrade[df$TestGrade == "Kindergarten"] = 0
df$TestGrade <- gsub("Grade ", "", df$TestGrade)
> df
FirstInitial LastInitial TestGrade
1 A S 0
2 D M 1
3 M T 2
4 C M 3
5 J A 4
6 S B 5
7 K H 6
8 L M 7
9 M S 8
10 K W 9
11 G L 10
12 B Z 11
13 F P 12
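Why stringsAsFactors matters here, as a small sketch: if the column is a factor, assigning 0 to it produces NA, because 0 is not an existing level. The data frames below are shortened stand-ins for the one in the question. Note that since R 4.0.0 the default of stringsAsFactors is already FALSE, so the argument mainly matters on older versions:

```r
TestGrade <- c("Kindergarten", "Grade 1", "Grade 2")

as_factor_df <- data.frame(TestGrade, stringsAsFactors = TRUE)
as_char_df   <- data.frame(TestGrade, stringsAsFactors = FALSE)

class(as_factor_df$TestGrade)  # "factor"
class(as_char_df$TestGrade)    # "character"

# On a factor, 0 is not a valid level, so the assignment yields NA (with a warning).
f <- as_factor_df$TestGrade
suppressWarnings(f[f == "Kindergarten"] <- 0)
is.na(f[1])

# On a character column the two lines from the answer work as intended.
as_char_df$TestGrade[as_char_df$TestGrade == "Kindergarten"] <- 0
as_char_df$TestGrade <- gsub("Grade ", "", as_char_df$TestGrade)
as_char_df$TestGrade
```

The column is still character after these two lines; wrap it in as.numeric() if you need actual numbers.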
Answer 4: (score: 3)
You can write a key and convert the grades to a factor with those levels. This will still work even if the grade format changes.
key <- c('Kindergarten',
'Grade 1',
'Grade 2',
'Grade 3',
'Grade 4',
'Grade 5',
'Grade 6',
'Grade 7',
'Grade 8',
'Grade 9',
'Grade 10',
'Grade 11',
'Grade 12')
dat <- c('Grade 3', 'Grade 5', 'Grade 2')
dat <- factor(dat, levels = key)
dat <- as.numeric(dat) - 1
dat
We subtract 1 at the end because factor codes start at 1 and you want Kindergarten to be 0.
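A short sketch of how the key approach behaves on labels that are not in the key: they become NA rather than being silently converted. The "Pre-K" label is made up for illustration, and the match() line is an alternative written here, not part of the original answer:

```r
key <- c("Kindergarten", paste("Grade", 1:12))
dat <- c("Grade 3", "Kindergarten", "Grade 12", "Pre-K")  # "Pre-K" is not in the key

# Factor codes are positions in `key`; subtract 1 so Kindergarten maps to 0.
grade_num <- as.numeric(factor(dat, levels = key)) - 1
grade_num

# match() looks up the same positions directly, without building a factor.
grade_num_match <- match(dat, key) - 1
identical(grade_num, grade_num_match)
```

Getting NA for unknown labels makes it easy to spot unexpected values with is.na() before going further.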
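Answer 5: (score: 2)
This solves your problem:
df$TestGrade <- sapply(df$TestGrade, function(el)
{
  # Kindergarten is the special case; every other value just loses its "Grade " prefix
  if (el == "Kindergarten") return(0)
  else return(as.numeric(sub("Grade ", "", el)))
})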