In R, splitting a data frame column when the pieces have different lengths

Asked: 2016-10-09 20:06:57

Tags: r apply sapply strsplit

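I am trying to split the Awards column of a data frame, but the split returns a different number of pieces for each row. How can I bind the result back to the original data frame?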

Sample DF:

        Name   Value     Awards
1       A1      NA      3 wins.
2       A2      1000    NA
3       A3      NA      2 wins.
4       A4      1999    1 win
5       A5      8178569 5 wins & 4 nominations.

Expected result:

        Name   Value     Awards                 AwardsNum  Cat
1       A1      NA      3 wins.                 3          A
2       A2      1000    NA                      NA         NA
3       A3      NA      2 wins.                 2          A
4       A4      1999    1 win                   1          A
5       A5      8178569 5 wins & 4 nominations. 9          C

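So basically I need to split Awards, pick out each number that precedes "wins" and "nominations", add them up with a function, and then assign a category (Cat) based on that sum and a set of value ranges.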

I have the following:

  strsplit(DF$Awards," ")
  cbind(DF, strsplit(DF$Awards, " "))

Error in data.frame(c("3", "wins."), "N/A", c("2", "wins."), c("1", "win." : 
arguments imply differing number of rows: 2, 1, 5

Update: the categories are assigned as follows:

    Cat <- A    if NA, i.e. no awards and nominations
    Cat <- B    if between 1 and 5
    Cat <- C    otherwise

I need to be able to adjust the boundary between B and C, because I have to make sure the ratio between B and C is no more than 5:1.

2 Answers:

Answer 0: (score: 0)

One solution is to use a regular expression to match all the numbers. You can then sum them and assign the category.

library(stringr)

df_new <- sapply(DF$Awards, function(x){
    # get all numbers (as.character guards against Awards being a factor)
    nums <- unlist(str_match_all(as.character(x), "[0-9]+"))
    # calculate sum; an NA input yields an NA sum
    AwardsNum <- sum(as.numeric(nums))
    # assign category based on the sum
    if (is.na(AwardsNum)){
        Cat <- NA
    }else if(AwardsNum == 0){
        Cat <- "A"
    }else if(AwardsNum < 5){
        Cat <- "B"
    }else{
        Cat <- "C"
    }
    return(c(AwardsNum, Cat))
})

# add the new columns to DF
DF$AwardsNum <- as.numeric(df_new[1, ])
DF$Cat <- df_new[2, ]
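
The same extract-and-sum step can also be sketched in base R, without stringr. This is only an illustrative variant: the helper name sum_numbers is made up here, and the sample data frame is rebuilt from the question with Awards assumed to be a character column.

```r
# Sample data mirroring the question (Awards assumed to be character, not factor)
DF <- data.frame(
    Name = c("A1", "A2", "A3", "A4", "A5"),
    Value = c(NA, 1000, NA, 1999, 8178569),
    Awards = c("3 wins.", NA, "2 wins.", "1 win", "5 wins & 4 nominations."),
    stringsAsFactors = FALSE
)

# Hypothetical helper: sum every run of digits found in one string.
# NA is handled up front so that missing awards stay NA in the result.
sum_numbers <- function(x) {
    if (is.na(x)) return(NA_real_)
    digits <- regmatches(x, gregexpr("[0-9]+", x))[[1]]
    sum(as.numeric(digits))
}

# vapply always returns exactly one number per row, so the result
# can be assigned straight back as a new column
DF$AwardsNum <- vapply(DF$Awards, sum_numbers, numeric(1), USE.NAMES = FALSE)
```

Because each row is reduced to a single number before it is attached, the "differing number of rows" error from cbind does not arise.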

Answer 1: (score: 0)

I just realized that @Istrel had already posted an answer while I was working on this. I'll post mine anyway, since it is a bit different.

df <- data.frame(
    Name = c("A1", "A2", "A3", "A4", "A5"),
    Value = c(NA, 1000, NA, 1999, 8178569),
    Awards = c("3 wins", NA, "2 wins", "1 win", "5 wins & 4 nominations")
)

library(magrittr)
n.awards <- sapply(df$Awards, function(x){
    ifelse(is.na(x), 0,{
        x %>% as.character %>%
            strsplit("[^0-9]+") %>%
            unlist %>%
            as.numeric %>%
            sum
    })
})
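# bin edges: 0 -> A, 1 to 4 -> B, 5 to 100 -> C
brks <- c(-0.1, 0.9, 4.9, 100)
cc <- cut(n.awards, brks)
# named cats rather than cat to avoid masking base::cat()
cats <- c("A", "B", "C")
df.final <- cbind(df, AwardsNum = n.awards, Cat = cats[cc])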

With cut, you can group a vector into bins without chaining multiple if statements.
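
As a small sketch of that idea, cut can also attach the category labels directly through its labels argument, which skips the separate lookup vector. The counts below are made-up example values, and the break points (0, 1, 5) are an assumption chosen to reproduce the same grouping as above (0 is A, 1 to 4 is B, 5 or more is C).

```r
# Example award counts (illustrative values only)
n.awards <- c(3, 0, 2, 1, 9)

# right = FALSE makes the intervals [0,1), [1,5), [5,Inf),
# so whole-number break points can be used directly
Cat <- cut(n.awards,
           breaks = c(0, 1, 5, Inf),
           labels = c("A", "B", "C"),
           right = FALSE)

# cut returns a factor; convert it if plain strings are wanted
Cat <- as.character(Cat)

# Count how many rows fall into each category
counts <- table(Cat)
```

Moving the boundary between B and C then only means changing one number in breaks, and table(Cat) shows the resulting split between the two groups.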