Question

Possible duplicate:
Assigning values to a df$column based on another column in the same df

Suppose I have the data frame:

table<- data.frame(population=c(100, 300, 5000, 2000, 900, 2500), habitat=c(1,2,3,4,5,6))

Now I want to add a new column table$size, whose value is 1 if population < 500, 2 if 500 <= population < 1000, 3 if 1000 <= population < 2000, 4 if 2000 <= population < 3000, and 5 if 3000 <= population <= 5000.

I only know how to create a column with a binary TRUE/FALSE result based on the values in another column, e.g.

table$size <- (table$population<1000)

But I am not sure how to get a different number for each condition. Can anyone help?

Answer 1

First, don't name your data.frame table, because table is a base function.

You can use findInterval:

df <- data.frame(population=c(100, 300, 5000, 2000, 900, 2500), 
                 habitat=c(1,2,3,4,5,6))
v <- c(-Inf,500,1000,2000,3000,5000)
df$size <- findInterval(df$population,v,all.inside = TRUE)
  population habitat size
1        100       1    1
2        300       2    1
3       5000       3    5
4       2000       4    4
5        900       5    2
6       2500       6    4

I use all.inside = TRUE because you want 5000 to map to 5, and I assume values cannot be greater than 5000. If they can, you could use something like

v <- c(-Inf,500,1000,2000,3000,5001,Inf)
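
To make the difference between the two break vectors concrete, here is a minimal sketch. The value 7000 and the names pop, v1, v2, size_clamped and size_extended are assumptions for illustration and are not part of the question's data:

```r
pop <- c(100, 5000, 7000)  # 7000 is a made-up out-of-range value

# Original breaks with all.inside = TRUE:
# results are clamped to 1..5, so 5000 and anything above it map to 5
v1 <- c(-Inf, 500, 1000, 2000, 3000, 5000)
size_clamped <- findInterval(pop, v1, all.inside = TRUE)

# Extended breaks: values of 5001 and above fall into a separate sixth interval
v2 <- c(-Inf, 500, 1000, 2000, 3000, 5001, Inf)
size_extended <- findInterval(pop, v2)

print(size_clamped)
print(size_extended)
```

With the extended breaks, out-of-range populations get their own code (6) instead of being silently folded into class 5, which may make them easier to spot.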

Answer 2

You can define a function for the mapping that encodes your different bins:

# Map a single population value to its size class (NA if above 5000)
mysize <- function(x){
  if(x < 500)
    return(1)
  if(500 <= x && x < 1000)
    return(2)
  if(1000 <= x && x < 2000)
    return(3)
  if(2000 <= x && x < 3000)
    return(4)
  if(3000 <= x && x <= 5000)
    return(5)
  else
    return(NA)
}

Then you can apply this function to the population column and add the new column you want:

table$population.bin <- sapply(table$population, mysize)
table
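
As a self-contained sketch of the same approach, the block below restates the mapping compactly and applies it to the question's data. The data frame is named df here (an assumption, to avoid masking base table), and 7000 is a hypothetical out-of-range value:

```r
# Compact restatement of the mapping so this block runs on its own
mysize <- function(x) {
  if (x < 500) return(1)
  if (x < 1000) return(2)
  if (x < 2000) return(3)
  if (x < 3000) return(4)
  if (x <= 5000) return(5)
  NA
}

df <- data.frame(population = c(100, 300, 5000, 2000, 900, 2500),
                 habitat = 1:6)

# sapply calls mysize once per element, since if() only handles one value at a time
df$population.bin <- sapply(df$population, mysize)

# A value above 5000 falls through every branch and yields NA
out_of_range <- mysize(7000)

print(df)
```

Because mysize works on one value at a time, it has to be wrapped in sapply; calling it directly on the whole column would not work as intended.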

Answer 3

As long as you are fine with 5 covering any number < 5001 rather than <= 5000, you may just want the cut function with labels.

# look at the help window
?cut

# initiate your table
table <- 
    data.frame(
        population = c( 100 , 300, 5000, 2000, 900, 2500) , 
        habitat = 1:6
    )

# create a new column with the desired cutpoints
table$size <- 
    cut( 
        # input data
        table$population , 
        # cut points
        c( -Inf , 500 , 1000 , 2000 , 3000 , 5001 ) , 
        # label values (character strings work too)
        labels = 1:5 ,
        # interval closed on the right?
        right = FALSE
    )
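
One detail worth keeping in mind: cut returns a factor, even when the labels look like numbers. If numeric values are needed later, a minimal sketch of the conversion follows (df and size_num are illustrative names, not from the answer):

```r
df <- data.frame(population = c(100, 300, 5000, 2000, 900, 2500),
                 habitat = 1:6)

df$size <- cut(df$population,
               c(-Inf, 500, 1000, 2000, 3000, 5001),
               labels = 1:5,
               right = FALSE)

# The new column is a factor, not a numeric vector
print(is.factor(df$size))

# Convert via character so the label text is recovered,
# rather than the factor's internal integer codes
df$size_num <- as.integer(as.character(df$size))
print(df)
```

Here the labels happen to match the internal codes, so as.integer(df$size) would give the same numbers, but going through as.character stays correct if the labels are ever changed.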

Add a new column to an R data frame whose values are conditional on another column's values
