Working in a data frame, I want to manipulate column values based on values in another column

Time: 2015-12-16 18:32:50

Tags: r

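Working in a data frame, I want to manipulate the values in one column based on the values in another column. Here is my reproducible code: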

# four items
items <- c("coke", "tea", "shampoo","aspirin")

# scores for each item
score <- as.numeric(c(65,30,45,20))

# making a data frame of the two vectors created
df <- as.data.frame(cbind(items,score))

# score for coke is 65 and for tea it is 30.  I want to
# double score for tea OR coke if the score is below 50

ifelse(df$score[df$items %in% c("coke", "tea")] < 50, df$score*2, df$score)

#the above return NULL values with warning

#the statement df$score[df$items %in% c("coke", "tea")] does pull coke and tea scores

df$score[df$items %in% c("coke", "tea")]

Thank you very much for your help.

4 answers:

Answer 0: (score: 1)

This should solve the problem:

items <- c("coke", "tea", "shampoo","aspirin")

# scores for each item
score <- as.numeric(c(65,30,45,20))

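Try using data.frame instead of as.data.frame(cbind(...)). cbind() coerces both vectors into a single character matrix, so the scores are no longer numeric by the time as.data.frame() receives them; they end up as factors (or as character strings in R 4.0.0 and later).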
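To see where the conversion comes from, here is a minimal sketch using the question's vectors. The names m, df_bad and df_good are only illustrative and do not appear in the original post.

```r
items <- c("coke", "tea", "shampoo", "aspirin")
score <- c(65, 30, 45, 20)

# cbind() must store everything in one type, so the numbers become strings
m <- cbind(items, score)
is.character(m)            # TRUE

# as.data.frame() on that matrix cannot recover the numeric type
df_bad <- as.data.frame(m)
is.numeric(df_bad$score)   # FALSE

# data.frame() keeps each vector's own type
df_good <- data.frame(items, score)
is.numeric(df_good$score)  # TRUE
```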

# making a data frame of the two vectors created
df <- data.frame(items, score)

df
    items score
1    coke    65
2     tea    30
3 shampoo    45
4 aspirin    20


# score for coke is 65 and for tea it is 30.  I want to
# double score for tea OR coke if the score is below 50

df$score[df$items %in% c("coke", "tea")] = ifelse(df$score[df$items %in% c("coke", "tea")] < 50, df$score*2, df$score)

df
    items score
1    coke    65
2     tea    60
3 shampoo    45
4 aspirin    20

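If your items end up containing duplicate entries, however, this approach does not work. The reason is that ifelse() returns values by position: the test covers only the coke and tea rows, while df$score*2 and df$score are the full columns, so the results line up only when the matching rows happen to be the first rows of the data frame.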

# New data with an added entry for item = coke and score = 15:
items <- c("coke", "tea", "shampoo","aspirin","coke")
# scores for each item
score <- c(65,30,45,20,15)

# making a data frame of the two vectors created
df <- data.frame(items, score)


# using the method from above the last entry get converted to a value of 90
# instead of 30
df$score[df$items %in% c("coke", "tea")] = ifelse(df$score[df$items %in% c("coke", "tea")] < 50, df$score*2, df$score)

df
    items score
1    coke    65
2     tea    60
3 shampoo    45
4 aspirin    20
5    coke    90

So if there is any chance of duplicate entries, you should use this approach instead:

df <- data.frame(items, score)

df$score[df$items %in% c("coke", "tea") & df$score < 50] <- 2* df$score[df$items %in% c("coke", "tea") & df$score < 50]

df
    items score
1    coke    65
2     tea    60
3 shampoo    45
4 aspirin    20
5    coke    30
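
If you would rather keep ifelse(), one possible variant (a sketch, not the method shown in this answer) is to give it three vectors of the same length: test the full columns and assign the whole column back. It uses the same five-row data as above.

```r
items <- c("coke", "tea", "shampoo", "aspirin", "coke")
score <- c(65, 30, 45, 20, 15)
df <- data.frame(items, score)

# test, yes and no all have one element per row, so positions line up
df$score <- ifelse(df$items %in% c("coke", "tea") & df$score < 50,
                   df$score * 2,
                   df$score)

df$score  # 65 60 45 20 30
```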

Answer 1: (score: 0)

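Your problem doesn't need an if statement. You can combine two logical conditions.

Condition 1: df$items %in% c("coke", "tea")

Condition 2: df$score < 50

By subsetting the data frame with both conditions, you can multiply the matching scores. AND is &, OR is |. Here both conditions must hold (the item is coke or tea, and the score is below 50), so they are joined with &:

df$score[df$items %in% c("coke", "tea") & df$score < 50] <- 2* df$score[df$items %in% c("coke", "tea") & df$score < 50]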
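
A quick way to check which operator you need is to look at the rows each mask selects. The sketch below uses the question's four-row data; is_drink and is_low are names made up for the example.

```r
items <- c("coke", "tea", "shampoo", "aspirin")
score <- c(65, 30, 45, 20)
df <- data.frame(items, score)

is_drink <- df$items %in% c("coke", "tea")
is_low   <- df$score < 50

# AND: both conditions must hold
as.character(df$items[is_drink & is_low])  # "tea"

# OR: either condition is enough, which also catches coke, shampoo and aspirin
as.character(df$items[is_drink | is_low])  # "coke" "tea" "shampoo" "aspirin"
```

Since the question asks to double coke or tea only when the score is below 50, the & mask is the one that matches the request.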

Answer 2: (score: 0)

items <- c("coke", "tea", "shampoo","aspirin")
score <- as.numeric(c(65,30,45,20))   

If you call data.frame() as follows, you avoid the score column being converted to a factor.

df <- data.frame(items=items,score=score)

You don't need an if statement. You can simply extract the values you are interested in using two logical conditions:

df[df$score<50 & df$items %in% c("coke", "tea"), "score"] <- 2 * df[df$score<50 & df$items %in% c("coke", "tea"), "score"]
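  • df$score<50 & df$items %in% c("coke", "tea") selects the rows that meet both conditions, i.e. the item is coke or tea and the score is below 50.

  • "score" selects only the score column.

  • The expression on the right-hand side of <- extracts the same values and multiplies them by 2.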
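
Because the same condition appears on both sides of the assignment, one option is to store it in a variable first. This is only a sketch; sel is a name chosen for the example.

```r
items <- c("coke", "tea", "shampoo", "aspirin")
score <- c(65, 30, 45, 20)
df <- data.frame(items = items, score = score)

# logical vector: TRUE for rows that are coke or tea AND have a score below 50
sel <- df$score < 50 & df$items %in% c("coke", "tea")

# double the score only in the selected rows
df[sel, "score"] <- 2 * df[sel, "score"]

df$score  # 65 60 45 20
```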

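Answer 3: (score: 0)

The syntax of your if statement isn't quite right; it looks like you are trying to call it the way you would use it in MS Excel. Unfortunately, it doesn't work that way.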
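
For readers coming from Excel, the distinction that usually matters is that R's if works on a single TRUE/FALSE value, while ifelse() is applied element by element across a vector, much like filling an IF formula down a column. A small sketch with the question's scores (doubled and second are illustrative names):

```r
score <- c(65, 30, 45, 20)

# ifelse() tests every element and returns a vector of the same length
doubled <- ifelse(score < 50, score * 2, score)
doubled  # 65 60 90 40

# if () takes one condition, so it is applied to a single element here
second <- if (score[2] < 50) score[2] * 2 else score[2]
second   # 60
```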

I suggest taking an introductory R course (many are available online for free), for example:

https://campus.datacamp.com/courses/free-introduction-to-r/chapter-1-intro-to-basics-1?ex=1

As for your question, here is a solution (if I have understood it correctly).

item <- c("coke", "tea", "shampoo", "aspirin")
score <- as.numeric(c(65, 30, 45, 20))

df <- data.frame(item, score)

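# loop over the rows by index; seq_len() is safe even for an empty data frame
for (i in seq_len(nrow(df))) {
  # || and && are the scalar operators intended for use inside if ()
  if ((df$item[i] == "coke" || df$item[i] == "tea") && df$score[i] < 50) {
    df$score[i] <- df$score[i] * 2
  }
}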

View(df)

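Note that if you now look at the updated data frame ("df"), only the score for the item "tea" has been doubled, because it is the only row that meets both criteria (i.e. item = coke OR tea; AND its score is below 50).

Hope this helps, and good luck.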