Optimization: replacing values in a data frame under multiple conditions

Asked: 2017-12-27 19:07:34

Tags: r optimization

I have a data frame similar to this example:

df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L))

Based on the information in the two columns, I want to classify the items by size and colour. The output should look like this:

structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100), Class = c("small red ball", "small red ball", "small blue ball", "medium red ball", "medium blue ball", "big red ball")), row.names = c(NA, -6L), .Names = c("Ball", "size", "Class"), class = "data.frame")

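I have working code, but it is long and messy, and I am sure there is a more concise way to get the output I want.

So what did I do?

I started by selecting the items of the first class and renaming the selected df$Class values: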

df["Class"] <- NA #add new column

df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"

Because my grepl selection is sometimes empty, I added an if (length() > 0) condition:

if (length(df[grepl("red", df$Ball) & df$size <10, ]$Class) > 0) {df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"}
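The guard is needed because df[mask, ]$Class <- value first takes a (possibly zero-row) subset of the data frame and then assigns into it, which fails when no row matches. A minimal sketch of an alternative, assuming the same column names as above: index the Class column directly with the logical mask, so an empty selection becomes a no-op. The small data frame and the "green" pattern are made up for illustration.

```r
# Illustrative data only; character column instead of a factor for simplicity.
df <- data.frame(
  Ball = c("red ball", "red", "blue"),
  size = c(1.2, 2, 12),
  stringsAsFactors = FALSE
)
df$Class <- NA_character_

# No row matches "green": the mask is all FALSE and nothing is assigned,
# so no if (length(...) > 0) guard is required.
mask_green <- grepl("green", df$Ball) & df$size < 10
df$Class[mask_green] <- "small green ball"

# Rows that do match are filled in as usual.
mask_red <- grepl("red", df$Ball) & df$size < 10
df$Class[mask_red] <- "small red ball"

df$Class
```

The same indexing form could replace each guarded line inside the loop.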

Finally, I combined all the selections into one loop:

df["Class"] <- NA #add new column
z <- c("red", "blue")

for (i in z){
  if (length(df[grepl(i, df$Ball) & df$size <10, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size <10, ]$Class <- paste("small", i, "ball", sep=" ")}
  if (length(df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class <- paste("medium", i, "ball", sep=" ")}
  if (length(df[grepl(i, df$Ball) & df$size >=100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=100, ]$Class <- paste("big", i, "ball", sep=" ")}
}

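It works for two colours and three size categories, but my original data frame is much bigger. Because the code looks messy, my question is: how can I simplify it?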

3 answers:

Answer 0: (score: 2)

We can use cut to create the grouping based on 'size', and paste it with the 'Ball' value extracted using str_extract:

library(stringr)
df$Class <- with(df, paste(as.character(cut(size, breaks = c(1, 9, 99, Inf), 
   labels = c('small', 'medium', 'big'))),  str_extract(Ball, 'red|blue'), 'ball'))
df$Class
#[1] "small red ball"   "small red ball"   "small blue ball"
#[4] "medium red ball"  "medium blue ball" "big red ball"    
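One thing worth checking with cut is how the breaks treat boundary values. By default the intervals are closed on the right, so breaks of c(1, 9, 99, Inf) reproduce the sample output but would not match the question's rules (< 10, >= 10 and < 100, >= 100) for sizes such as 9.5 or 0.5. The sketch below compares the two choices on a few made-up sizes; the second set of breaks is an assumption about what the question intends, not part of the answer above.

```r
sizes  <- c(0.5, 9.5, 10, 99.5, 100)   # made-up boundary cases
labels <- c("small", "medium", "big")

# Breaks from the answer: intervals (1, 9], (9, 99], (99, Inf]
a <- cut(sizes, breaks = c(1, 9, 99, Inf), labels = labels)

# Left-closed breaks: [-Inf, 10), [10, 100), [100, Inf)
b <- cut(sizes, breaks = c(-Inf, 10, 100, Inf), labels = labels, right = FALSE)

data.frame(sizes, a = as.character(a), b = as.character(b))
# a: NA, medium, medium, big, big
# b: small, small, medium, medium, big
```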

Answer 1: (score: 2)

This answer is very similar to @akrun's, but you can include more colours (here I use the colors() palette, but you could use others as well). I also slightly changed the arguments of the cut function.

size<- cut(df$size, c(0, 10, 100, Inf), labels = c("small", "medium", "big"), right=FALSE)
colors<- str_extract(df$Ball, paste(colors(), collapse="|"))
df$Class<- paste(size, colors, "ball", sep = " ")

> df
                  Ball  size            Class
1             red ball   1.2   small red ball
2                  red   2.0   small red ball
3 blue is my favourite   3.0  small blue ball
4                 red   10.0  medium red ball
5                 blue  12.0 medium blue ball
6                  red 100.0     big red ball

In addition, to make it more general, you can use the following to allow for uppercase letters:

colors<- str_extract(df$Ball, regex(paste(colors(), collapse="|"), ignore_case=TRUE))

So if df$Ball[1] = "Red ball", use the line above.
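A case-insensitive match returns the text as it appears in the input, so "Red ball" yields "Red" rather than "red". If the labels should be uniform, the extracted colour can be lower-cased before pasting. A small sketch, using a made-up vector and a shortened pattern instead of the full colors() palette:

```r
library(stringr)

balls   <- c("Red ball", "red", "BLUE is my favourite")   # illustrative input
pattern <- regex("red|blue", ignore_case = TRUE)

matched <- str_extract(balls, pattern)   # keeps the input's capitalisation
colour  <- tolower(matched)              # normalise before building the label

paste("small", colour, "ball")
```

Note also that Ball is a factor in the example data, so assigning a string that is not one of its levels gives NA with a warning; converting the column with as.character() first avoids that.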

Answer 2: (score: 1)

This seems like a good case for using the dplyr and stringr packages:

library(stringr)
library(dplyr)

df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L))


df %>%
  mutate(
    color = str_extract(`Ball`, "(red)|(blue)"),
    size_category = case_when(
      size < 10 ~ "small",
      size >= 10 & size < 100 ~ "medium",
      size >= 100 ~ "large"
    ),
    category = str_c(size_category, color, "ball", sep = " ")
  )
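The pipeline above labels the largest group "large" and stores the result in a column named category, whereas the desired output in the question uses "big" and a column named Class. A hedged variant that matches the question's output is sketched below. It relies on case_when evaluating its conditions in order, so the second condition only needs the upper bound. The data frame is rebuilt with a character column to keep the example short.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  Ball = c("red ball", "red", "blue is my favourite", "red ", "blue", "red"),
  size = c(1.2, 2, 3, 10, 12, 100),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    Class = str_c(
      case_when(
        size < 10   ~ "small",
        size < 100  ~ "medium",   # only reached when size >= 10
        size >= 100 ~ "big"       # a missing size stays NA
      ),
      str_extract(Ball, "red|blue"),
      "ball",
      sep = " "
    )
  )

result$Class
```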