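Transforming a column in R using ifelse

Time: 2018-11-21 14:17:15

Tags: r

I have a data frame with a column of numbers.

In another column, I want to print, based on the number, whether it is "less than 10", "between 10 and 20", or "between 20 and 30".

So far I have come up with the code below, but it does not work. Can anyone suggest how to modify it so that it does?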

#create some data
data<-data.frame(number=(1:40))

#ifelse statement
data$words<-
ifelse(data[,"number"]>=0&&<=9,"less than 10",
ifelse(data[,"number"]>=10&&<=20,"between 10 and 20",
ifelse(data[,"number"]>=20&&<=30,"between 20 and 30", "other")))  

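4 answers:

Answer 0: (score: 2)

You can use cut from base R, but note that it makes the words variable a factor. You only need to set the appropriate breaks (that is why I use values such as 30.5, for readability). By the way, in your example 20 is coded to fall into both "between 10 and 20" and "between 20 and 30", which cannot work.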

data$words <- cut(data$number, c(0,9.5,20.5,30.5,40), c("less than 10", "between 10 and 20", "between 20 and 30", "other"))
data
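Because cut returns a factor, the new column may need converting if plain text is wanted. The following is a minimal sketch, assuming the same 1:40 data as in the question; the as.character step is an addition for illustration and not part of the answer above.

```r
# Same data and breaks as in the answer above.
data <- data.frame(number = 1:40)
data$words <- cut(
  data$number,
  c(0, 9.5, 20.5, 30.5, 40),
  c("less than 10", "between 10 and 20", "between 20 and 30", "other")
)

# cut() produces a factor, not a character vector.
words_class <- class(data$words)

# Convert to character if plain strings are needed downstream.
data$words_chr <- as.character(data$words)
```

With these breaks, 20 lands in "between 10 and 20" because the intervals are closed on the right by default.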

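Answer 1: (score: 2)

The main problem is that you need to reference the variable in every inequality test. To make this more readable, I wrapped everything in a with(data, ...) call. The other problem with your code is the use of && instead of &. The former is meant for single values only, while the latter compares two vectors element by element.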

data$words<-
  with(data,
       ifelse(number >= 0 & number <= 9, "less than 10",
       ifelse(number >= 10 & number <= 20, "between 10 and 20",
       ifelse(number >= 20 & number <= 30, "between 20 and 30", "other"))))

I also think this is more readable than the tidyverse version, since it introduces no new syntax. It is easier to debug as well.
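To illustrate the difference between & and && described in this answer, here is a small sketch. The vector `number` and the variable names are made up for the example. Note that recent R versions signal an error when && is given an operand longer than one element, so it is only applied to a single value here.

```r
number <- c(5, 15, 25)

# `&` is vectorised: it returns one TRUE/FALSE per element,
# which is what ifelse() needs as its test argument.
in_range <- number >= 10 & number <= 20

# `&&` is intended for single (length-one) values, e.g. inside if().
first <- number[1]
if (first >= 0 && first <= 9) {
  label_first <- "less than 10"
} else {
  label_first <- "other"
}
```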

Answer 2: (score: 1)

library(tidyverse)
data <- data.frame(number = 1:40)
data %>%
  mutate(word = case_when(
    number >= 0 & number < 10 ~ "less than 10",
    number >= 10 & number < 20 ~ "between 10 and 20",
    number >= 20 & number < 30 ~ "between 20 and 30",
    TRUE ~ "Other"  # fallback for everything else
  ))
   number              word
1       1      less than 10
2       2      less than 10
3       3      less than 10
4       4      less than 10
5       5      less than 10
6       6      less than 10
7       7      less than 10
8       8      less than 10
9       9      less than 10
10     10 between 10 and 20
11     11 between 10 and 20
12     12 between 10 and 20
13     13 between 10 and 20
14     14 between 10 and 20
15     15 between 10 and 20
16     16 between 10 and 20
17     17 between 10 and 20
18     18 between 10 and 20
19     19 between 10 and 20
20     20 between 20 and 30
21     21 between 20 and 30
22     22 between 20 and 30
23     23 between 20 and 30
24     24 between 20 and 30
25     25 between 20 and 30
26     26 between 20 and 30
27     27 between 20 and 30
28     28 between 20 and 30
29     29 between 20 and 30
30     30             Other
31     31             Other
32     32             Other
33     33             Other
34     34             Other
35     35             Other
36     36             Other
37     37             Other
38     38             Other
39     39             Other
40     40             Other

Answer 3: (score: 1)

Does it need to be a one-liner?

There are some syntax errors in your code, but one possible solution is to do something like this:

data$text <- "other"
data$text[data$number >=0 & data$number < 10] <- "less than 10"
data$text[data$number >=10 & data$number < 20] <- "between 10 and 20"
data$text[data$number >=20 & data$number < 30] <- "between 20 and 30"

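I created a new column because if I replaced the values in the "number" column with text, the whole column would be coerced to character type, which could lead to unexpected behavior from the inequality operators.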
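The coercion problem mentioned above can be seen in a short sketch. The values below are chosen only for illustration: once the numbers are stored as character strings, < compares them as text rather than by numeric value.

```r
number <- c(5, 10, 20)

# Numeric comparison behaves as expected.
num_cmp <- number < 9

# After coercion to character, the 9 is also converted to "9" and the
# comparison is done as text, so "10" and "20" sort before "9".
chr_cmp <- as.character(number) < 9
```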

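There is also some overlap in your categories. Consider changing the upper bounds to strictly less than (for example, 20 satisfies both <= 20 and >= 20, so it falls into both the "between 10 and 20" and "between 20 and 30" categories).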
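A quick way to check for such overlaps is to test both conditions on the same vector. This is only a sketch using the bounds from the question; the variable names are made up for the example.

```r
number <- 1:40

# Conditions as written in the question (inclusive on both ends).
in_10_20 <- number >= 10 & number <= 20
in_20_30 <- number >= 20 & number <= 30

# Values that satisfy both conditions at once.
overlap <- number[in_10_20 & in_20_30]

# With strict upper bounds the categories no longer overlap.
strict_overlap <- number[(number >= 10 & number < 20) &
                         (number >= 20 & number < 30)]
```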

If you do need a one-liner, you can use the cut() function:

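# right = FALSE makes the intervals [0,10), [10,20), ... so 10 is not "less than 10"
cut(data$number, breaks = c(0, 10, 20, 30, Inf), right = FALSE,
    labels = c("less than 10", "between 10 and 20", "between 20 and 30", "other"))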

This turns the numeric vector into a factor.