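I have a data frame with more than 400,000 observations, and I am trying to add a column whose value depends on another column, and sometimes on several columns.
Here is a simpler example of what I am trying to do: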
# Creating a data frame
M <- data.frame(c("A","B","C"),c(5,100,60))
names(M) <- c("Letter","Number")
#adding a column
M$Size <- NA
# if Number <= 50 Size is small,
# if Number is between 50 and 70, Size is Medium
# if Number is Bigger than 70, Size is Big
ifelse (M$Number <=50, M$Size <-"Small",
ifelse(M$Number <= 70,
M$Size <- "Medium",
M$Size <- "Big"
))
When I run the code, the output I get is:
[1] "Small" "Big" "Medium"
But the "Size" column in M always ends up holding the value from the last condition in the ifelse call:
> print (M)
Letter Number Size
1 A 5 Big
2 B 100 Big
3 C 60 Big
The result I want:
> print (M)
Letter Number Size
1 A 5 Small
2 B 100 Big
3 C 60 Medium
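I could work around this by taking a subset() for each condition and combining the pieces with rbind(), but the code would be very long, and because the original data frame is large it would also take longer to run. So I would like to know how to solve this properly.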
Answer 0 (score: 4)
Use cut:
M$Size <- cut(M$Number, breaks = c(-Inf, 50, 70, Inf),
labels = c("small", "medium", "large"))
#  Letter Number Size
#1 A 5 small
#2 B 100 large
#3 C 60 medium
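Two details of cut are worth checking before relying on it: it returns a factor rather than a character vector, and by default its intervals are closed on the right, so a value of exactly 50 falls in the first bin and exactly 70 in the second. A minimal sketch of both points, reusing the example data from the question (the `boundary` variable is only for illustration):

```r
# Rebuild the example data frame from the question
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

size_labels <- c("small", "medium", "large")
M$Size <- cut(M$Number, breaks = c(-Inf, 50, 70, Inf), labels = size_labels)

# cut() returns a factor; convert if a plain character column is preferred
is.factor(M$Size)
M$Size <- as.character(M$Size)

# Default right = TRUE means the bins are (-Inf, 50], (50, 70], (70, Inf)
boundary <- as.character(
  cut(c(50, 70), breaks = c(-Inf, 50, 70, Inf), labels = size_labels)
)
print(boundary)
```

If the upper edge of each bin should be excluded instead, cut accepts right = FALSE.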
Answer 1 (score: 3)
This should help:
# Creating a data frame
M <- data.frame(c("A","B","C"),c(5,100,60))
names(M) <- c("Letter","Number")
#adding a column
# if Number <= 50 Size is small,
# if Number is between 50 and 70, Size is Medium
# if Number is Bigger than 70, Size is Big
# M$Size[M$Number <= 50] <- "Small"
# Edit: No need to subset "Small"
M$Size <- "Small"
M$Size[M$Number > 50 & M$Number <= 70] <- "Medium"
M$Size[M$Number > 70] <- "Big"
# Letter Number Size
# 1 A 5 Small
# 2 B 100 Big
# 3 C 60 Medium
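The question also mentions that the new value sometimes depends on several columns. The same logical-indexing pattern extends to that case by combining conditions with & or |. The sketch below is only an illustration: the `Group` column and its rules are hypothetical and do not come from the question.

```r
# Rebuild the example data frame from the question
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

# Hypothetical rules (assumed for illustration only):
#   Letter A or B with Number above 50 -> "AB over 50"
#   Letter C with Number at most 70    -> "C up to 70"
#   everything else                    -> "Other"
M$Group <- "Other"                                   # default for all rows
M$Group[M$Letter %in% c("A", "B") & M$Number > 50] <- "AB over 50"
M$Group[M$Letter == "C" & M$Number <= 70] <- "C up to 70"

print(M)
```

Each assignment touches only the rows where its condition is TRUE, so the order matters only when conditions overlap.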
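Answer 2 (score: 1)
Same idea as in the question, but assign the result of ifelse to the column instead of assigning inside it. No packages are needed.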
M$Size <- ifelse(M$Number <= 50, 'Small', ifelse(M$Number <= 70, 'Medium', 'Big'))
Result:
Letter Number Size
1 A 5 Small
2 B 100 Big
3 C 60 Medium
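As for why the original attempt fails, my understanding is that each `M$Size <- "..."` inside ifelse is an ordinary assignment that overwrites the whole column with a single string as a side effect when that argument is evaluated. The vector that ifelse returns is still correct, but it is never stored, and the last assignment evaluated ("Big") is what remains in the column. A small sketch reproducing this with the question's data (`res` and `overwritten` are names introduced here for illustration):

```r
# Rebuild the example data frame from the question
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

# Same structure as the code in the question, but the return value is kept
res <- ifelse(M$Number <= 50, M$Size <- "Small",
       ifelse(M$Number <= 70, M$Size <- "Medium", M$Size <- "Big"))

print(res)             # element-wise result returned by ifelse()
overwritten <- M$Size  # column as left behind by the side-effect assignments
print(overwritten)

# Storing the returned vector gives the intended column
M$Size <- res
print(M)
```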