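I have a data frame with a column of numbers.
In another column, I want to print whether each number is "less than 10", "between 10 and 20" or "between 20 and 30".
I have come up with the code below, but it does not work. Can anyone suggest how to modify it so that it does?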
#create some data
data<-data.frame(number=(1:40))
#ifelse statement
data$words<-
ifelse(data[,"number"]>=0&&<=9,"less than 10",
ifelse(data[,"number"]>=10&&<=20,"between 10 and 20",
ifelse(data[,"number"]>=20&&<=30,"between 20 and 30", "other")))
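Answer 0 (score: 2)
You can use cut from base R, but note that it turns the words variable into a factor. You only need to set appropriate intervals (that is why I used 30.5 and so on, for readability). By the way, in your example the value 20 would have to be coded as both "between 10 and 20" and "between 20 and 30", which cannot work.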
data$words <- cut(data$number, c(0,9.5,20.5,30.5,40), c("less than 10", "between 10 and 20", "between 20 and 30", "other"))
data
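Since cut() returns a factor, an extra step is needed if a plain character column is preferred. The following is a minimal sketch that reuses the breaks from this answer; the small vector of sample values and the names words and words_chr are assumptions chosen for illustration.

```r
# Sample values (assumed for illustration)
number <- c(5, 10, 20, 25, 35)

# cut() assigns each value to a right-closed interval and returns a factor
words <- cut(number,
             breaks = c(0, 9.5, 20.5, 30.5, 40),
             labels = c("less than 10", "between 10 and 20",
                        "between 20 and 30", "other"))
is.factor(words)

# Convert to character if a factor is not wanted
words_chr <- as.character(words)
words_chr
```

With these breaks, 20 lands in "between 10 and 20" because the interval (9.5, 20.5] contains it.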
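Answer 1 (score: 2)
The main problem is that you need to reference the variable in each inequality test. To make this more readable, I wrapped everything in a with(data, ...) call. The other problem with your code is the use of && instead of &. The former is meant for single values only, whereas the latter compares two vectors element by element.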
data$words<-
with(data,
ifelse(number >= 0 & number <= 9, "less than 10",
ifelse(number >= 10 & number <= 20, "between 10 and 20",
ifelse(number >= 20 & number <= 30, "between 20 and 30", "other"))))
I also think this is more readable than the tidyverse version, and it does not introduce any new syntax. It is easier to debug as well.
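To illustrate the difference between & and && mentioned above, here is a minimal sketch. The vector x and the variable names are assumptions made up for this example. Note that recent versions of R raise an error when && is given a vector longer than one, so the sketch only applies && to a single value.

```r
# Sample values (assumed for illustration)
x <- c(5, 15, 25)

# & is vectorised: it yields one TRUE/FALSE per element,
# which is what ifelse() needs as its test argument
in_range <- x >= 10 & x <= 20
in_range

# && is meant for a single TRUE/FALSE, for example inside if ()
first <- x[1]
if (first >= 0 && first <= 9) {
  label <- "less than 10"
} else {
  label <- "other"
}
label
```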
Answer 2 (score: 1)
library(tidyverse)
data<-data.frame(number=(1:40))
data %>%
mutate(word = case_when(
number>=0 & number<10~"less than 10",
number>=10 & number<20~"between 10 and 20",
number>=20 & number<30~"between 20 and 30",
TRUE~"Other"
))
number word
1 1 less than 10
2 2 less than 10
3 3 less than 10
4 4 less than 10
5 5 less than 10
6 6 less than 10
7 7 less than 10
8 8 less than 10
9 9 less than 10
10 10 between 10 and 20
11 11 between 10 and 20
12 12 between 10 and 20
13 13 between 10 and 20
14 14 between 10 and 20
15 15 between 10 and 20
16 16 between 10 and 20
17 17 between 10 and 20
18 18 between 10 and 20
19 19 between 10 and 20
20 20 between 20 and 30
21 21 between 20 and 30
22 22 between 20 and 30
23 23 between 20 and 30
24 24 between 20 and 30
25 25 between 20 and 30
26 26 between 20 and 30
27 27 between 20 and 30
28 28 between 20 and 30
29 29 between 20 and 30
30 30 Other
31 31 Other
32 32 Other
33 33 Other
34 34 Other
35 35 Other
36 36 Other
37 37 Other
38 38 Other
39 39 Other
40 40 Other
Answer 3 (score: 1)
Do you need this as a one-liner?
There are some syntax errors in your code, but a possible solution is to do something like this:
data$text <- "other"
data$text[data$number >=0 & data$number < 10] <- "less than 10"
data$text[data$number >=10 & data$number < 20] <- "between 10 and 20"
data$text[data$number >=20 & data$number < 30] <- "between 20 and 30"
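I created a new column because, if I replaced the values in the "number" column with text, the whole column would be coerced to character type, which could cause unexpected behavior with the inequality operators.
There is also some overlap in your categories. Consider changing the upper bounds to strictly less than (for example, 20 is both >= 20 and <= 20, so it falls into both the "between 10 and 20" and the "between 20 and 30" categories).
If you need a one-liner, you can use the cut() function: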
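# right=FALSE makes the intervals [0,10), [10,20), ... so that they match the < comparisons above
cut(data$number, breaks=c(0,10,20,30,Inf), right=FALSE,
labels=c("less than 10", "between 10 and 20", "between 20 and 30", "other"))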
This turns the numeric vector into a factor.
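One detail worth checking with cut() is which bin the boundary values fall into: by default the intervals are closed on the right, and the right argument switches that. The sketch below compares both settings on a few boundary values; the vector x and the names brks, labs, right_closed and left_closed are assumptions chosen for illustration.

```r
# Boundary values (assumed for illustration)
x <- c(9, 10, 20, 30)
brks <- c(0, 10, 20, 30, Inf)
labs <- c("less than 10", "between 10 and 20", "between 20 and 30", "other")

# Default: intervals are (0,10], (10,20], (20,30], (30,Inf]
right_closed <- as.character(cut(x, breaks = brks, labels = labs))

# right = FALSE: intervals are [0,10), [10,20), [20,30), [30,Inf)
left_closed <- as.character(cut(x, breaks = brks, labels = labs, right = FALSE))

# Side-by-side comparison of the two settings
data.frame(x, right_closed, left_closed)
```

With the default setting, 10 is labelled "less than 10", while right = FALSE puts it in "between 10 and 20", matching the strict < comparisons used in the subsetting approach.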