Question

I have a data frame and I'm trying to add a new column whose values are computed from a few simple decisions that I have put into a function.

calculateNewValue <- function(a, b)
{
    if(a == b)
        result <- 4
    if(a >= b * 2)
        result <- 2;
    if(a > b)
        result <- 3;
    if(a < b)
        result <- 5;
    if(a * 2 <= b)
        result <- 6;
    return(result);    
}
data.set$newCol <- calculateNewValue(data.set$colA, data.set$colB);

Here is my sample data:

Name    colA    colB
S1       4       4
S2       4       3
S3       4       5
S4       4       8

Based on my function, the results I expect to see in newCol are:

However, the results I actually get are:

What am I missing here?

Answer 1

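I don't think there is anything wrong with your function itself. It works on one pair of values at a time, so you need to apply it to the data frame row by row.

Using the Map or mapply function, you can do the following: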

# using Map function
df$newCol <- unlist(Map(calculateNewValue, df$colA, df$colB))
print(df)

  Name colA colB newCol
1   S1    4    4      4
2   S2    4    3      3
3   S3    4    5      5
4   S4    4    8      6

# another one using mapply
df$newCol <- mapply(calculateNewValue, df$colA, df$colB)
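
For a self-contained run, here is a minimal sketch that rebuilds the question's sample data and applies the asker's function with mapply. The data frame name `df` follows the snippets above; the column values are copied from the question.

```r
# Scalar version of the asker's function: each `if` expects a single TRUE/FALSE.
calculateNewValue <- function(a, b) {
  if (a == b) result <- 4
  if (a >= b * 2) result <- 2
  if (a > b) result <- 3
  if (a < b) result <- 5
  if (a * 2 <= b) result <- 6
  result
}

# Sample data from the question.
df <- data.frame(
  Name = c("S1", "S2", "S3", "S4"),
  colA = c(4, 4, 4, 4),
  colB = c(4, 3, 5, 8)
)

# mapply() calls the function once per pair (colA[i], colB[i])
# and simplifies the results to a numeric vector.
df$newCol <- mapply(calculateNewValue, df$colA, df$colB)
print(df)
```

Because mapply already simplifies its result to a vector, no unlist is needed, unlike with Map, which returns a list.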

Answer 2

ifelse is vectorized:

calculateNewValue <- function(a, b)
{
    ifelse(a == b, 4,
      ifelse(a >= b * 2, 2,
        ifelse(a > b, 3,
          ifelse(a * 2 <= b, 6,       # `<=` as in the question, so a = 4, b = 8 gives 6
            ifelse(a < b, 5, NA)))))  # NA fallback: ifelse() needs a `no` value
}

# now this should work fine:
data.set$newCol <- calculateNewValue(data.set$colA, data.set$colB)

I swapped the order of the last two conditions so that the stricter one comes first.
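
As a quick check, the sketch below runs a nested ifelse version on the question's sample data. The function name `calculateNewValueVec` is hypothetical. Two details are assumptions of this sketch: it uses `a * 2 <= b`, as in the question's function, and it supplies `NA_real_` as the final fallback, because ifelse needs a `no` value whenever some element of the test is FALSE.

```r
# Hypothetical name: a vectorized variant built from nested ifelse() calls.
# The first condition that matches decides the result for each element.
calculateNewValueVec <- function(a, b) {
  ifelse(a == b, 4,
    ifelse(a >= b * 2, 2,
      ifelse(a > b, 3,
        ifelse(a * 2 <= b, 6,
          ifelse(a < b, 5, NA_real_)))))  # fallback value is an assumption
}

# Sample data from the question.
data.set <- data.frame(
  Name = c("S1", "S2", "S3", "S4"),
  colA = c(4, 4, 4, 4),
  colB = c(4, 3, 5, 8)
)

# Whole columns go in, a whole column comes out.
data.set$newCol <- calculateNewValueVec(data.set$colA, data.set$colB)
print(data.set)
```

Note that the first matching branch wins here, whereas in the question's function the last matching `if` wins. The two agree on the sample data but can differ elsewhere: for a = 8, b = 4 this version returns 2, while the question's function returns 3.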

Answer 3

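If you try to run the function with vectors, you will see the problem. The way you wrote it, it only compares a single element with another single element.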

calculateNewValue(c(4,4), c(4,3))
[1] 4
Warning messages:
1: In if (a == b) result <- 4 :
  the condition has length > 1 and only the first element will be used
2: In if (a >= b * 2) result <- 2 :
  the condition has length > 1 and only the first element will be used
3: In if (a > b) result <- 3 :
  the condition has length > 1 and only the first element will be used
4: In if (a < b) result <- 5 :
  the condition has length > 1 and only the first element will be used
5: In if (a * 2 <= b) result <- 6 :
  the condition has length > 1 and only the first element will be used

You have to apply the function to each row separately to get the desired output. Using @Manish's answer:

df$newCol <- unlist(Map(calculateNewValue, df$colA, df$colB))
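
Another option, offered here as a sketch rather than as part of the original answers, is base R's Vectorize, which wraps a scalar function in mapply so that it accepts whole vectors. The wrapper name `calculateNewValueV` is hypothetical. This matters more on recent versions of R: since R 4.2.0, a condition of length greater than one in `if` is an error rather than the warning shown above.

```r
# The asker's scalar function, unchanged in logic.
calculateNewValue <- function(a, b) {
  if (a == b) result <- 4
  if (a >= b * 2) result <- 2
  if (a > b) result <- 3
  if (a < b) result <- 5
  if (a * 2 <= b) result <- 6
  result
}

# Vectorize() returns a new function that calls the original
# once per element pair via mapply(). The name is hypothetical.
calculateNewValueV <- Vectorize(calculateNewValue)

# Sample columns from the question.
colA <- c(4, 4, 4, 4)
colB <- c(4, 3, 5, 8)

newCol <- calculateNewValueV(colA, colB)
print(newCol)
```

The wrapper still loops element by element under the hood, so it is a convenience rather than a speed-up over mapply.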

How can I initialize a new column in a data frame using data from two other columns and some decisions?
