Convert a data frame of "missed" numbers into a data frame of "hit" numbers

Time: 2019-01-03 07:40:43

Tags: r dataframe

I have a very specific question that should be easy to solve, but I just can't think of how...

I have a simple data frame like this:

mydf <- data.frame(Shooter=1:3, Targets.missed=c(paste(sample(1:10,4),collapse=";"), paste(sample(1:10,5),collapse=";"), paste(sample(1:10,8),collapse=";")))
mydf
  Shooter   Targets.missed
1       1          3;8;4;7
2       2       10;1;5;7;4
3       3 5;9;4;10;8;1;6;7

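This data frame tells me which Targets (numbered 1 to 10) each Shooter missed.

I want to obtain a different data frame that tells me, for each Target, which Shooters hit it.

The result would be: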

Target   hit.by.Shooters
1        1
2        1;2;3
3        2;3
4        NA
5        1
6        1;2
7        NA
8        2
9        1;2
10       1
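
One way to see how the two tables relate: for a single shooter, the targets hit are the complement of the missed targets within 1:10. A minimal base R sketch, assuming the targets are always numbered 1 to 10 and using the first row of the printed data frame as a literal string (the names missed and hit are only illustrative):

```r
# Targets.missed for Shooter 1, copied from the printed data frame above
missed <- as.integer(strsplit("3;8;4;7", ";", fixed = TRUE)[[1]])

# Targets hit = all targets (assumed to be 1:10) minus the missed ones
hit <- setdiff(1:10, missed)
hit
```

Here hit contains 1, 2, 5, 6, 9 and 10, which are exactly the targets where Shooter 1 appears in the desired result.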

3 answers:

Answer 0 (score: 4)

We expand the data into 'long' format by splitting 'Targets.missed' on ; with separate_rows. Then we group by 'Shooter' and summarise into a list the numbers of 1:10 that are not in 'Targets.missed', unnest the list column, group by 'Target', summarise by pasting the unique 'Shooter' elements into a single string, and use complete to fill the elements of 1:10 that are missing with NA.

library(tidyverse)
mydf %>%
   separate_rows(Targets.missed) %>%
   group_by(Shooter) %>%
   summarise(Target = list(setdiff(1:10, Targets.missed))) %>%
   unnest(Target) %>%
   group_by(Target) %>%
   summarise(hit.by.Shooters = paste(unique(Shooter), collapse=";")) %>%
   complete(Target = 1:10)
# A tibble: 10 x 2
#   Target hit.by.Shooters
#    <int> <chr>
# 1      1 1
# 2      2 1;2;3
# 3      3 2;3
# 4      4 <NA>
# 5      5 1
# 6      6 1;2
# 7      7 <NA>
# 8      8 2
# 9      9 1;2
#10     10 1

Another option is base R: split 'Targets.missed' (assuming it is of class character) into a list of vectors, loop through the list to get the values of 1:10 that are not in each vector (with setdiff), set the names of the list from the 'Shooter' column, stack the key/value list into a two-column data.frame, take the unique rows, aggregate the 'ind' column grouped by 'values' by pasting the elements together, and merge with a complete 'values' dataset from 1:10.

out <- aggregate(ind ~ values,
  unique(stack(setNames(lapply(strsplit(mydf$Targets.missed, ';'),
    setdiff, x = 1:10), mydf$Shooter))), FUN = paste, collapse = ";")
out1 <- merge(data.frame(values = 1:10), out, all.x = TRUE)

Then change the column names if necessary.
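
The base R steps described in this answer can be unrolled one call at a time, which makes the intermediate objects easier to inspect. The following is a minimal sketch, not the answerer's own code: the data is retyped from the printed output in the question, the intermediate names hits and long are illustrative, and the final names() call is just one possible way to do the renaming step.

```r
# Data copied from the question's printed output; stringsAsFactors = FALSE
# keeps Targets.missed as character, which strsplit() requires
mydf <- data.frame(Shooter = 1:3,
                   Targets.missed = c("3;8;4;7", "10;1;5;7;4", "5;9;4;10;8;1;6;7"),
                   stringsAsFactors = FALSE)

# Step 1: per shooter, the targets in 1:10 that were not missed
hits <- lapply(strsplit(mydf$Targets.missed, ";"), setdiff, x = 1:10)
names(hits) <- mydf$Shooter

# Step 2: long two-column data.frame (values = target, ind = shooter)
long <- unique(stack(hits))

# Step 3: one row per target, with its shooters pasted together
out <- aggregate(ind ~ values, long, FUN = paste, collapse = ";")

# Step 4: add the targets nobody hit (as NA) and rename the columns
out1 <- merge(data.frame(values = 1:10), out, all.x = TRUE)
names(out1) <- c("Target", "hit.by.Shooters")
out1
```

With this data, targets 4 and 7 end up as NA because all three shooters missed them.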

Answer 1 (score: 1)

A data.table approach.
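
The code for this answer is not shown here, so the following is only a sketch of how a data.table version of the same idea might look, not the answerer's code. It assumes the data.table package is installed, that targets are numbered 1:10, and it retypes the data from the printed output in the question. The names dt, hits and res are illustrative.

```r
library(data.table)

# Data copied from the question's printed output
mydf <- data.frame(Shooter = 1:3,
                   Targets.missed = c("3;8;4;7", "10;1;5;7;4", "5;9;4;10;8;1;6;7"),
                   stringsAsFactors = FALSE)
dt <- as.data.table(mydf)

# One row per (Shooter, Target hit): targets in 1:10 not listed as missed
hits <- dt[, .(Target = setdiff(1:10,
                                as.integer(strsplit(Targets.missed, ";", fixed = TRUE)[[1]]))),
           by = Shooter]

# Collapse the shooters for each target into a single string
res <- hits[, .(hit.by.Shooters = paste(Shooter, collapse = ";")), by = Target]

# Join onto the full set 1:10 so targets nobody hit appear as NA
res <- res[data.table(Target = 1:10), on = "Target"]
res
```

The final join keeps every row of the 1:10 table, so targets with no matching shooter get NA in hit.by.Shooters.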

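Answer 2 (score: 1)

Another tidyverse possibility. We first create a data frame with all possible combinations of Shooter and Targets using crossing, then use anti_join to remove the rows that are present in mydf, fill in the missing Targets with complete (they are added with NA), and finally summarise by Targets to get the Shooters who actually hit each target.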

library(tidyverse)

crossing(Shooter = unique(mydf$Shooter), Targets.missed = 1:10) %>%
        anti_join(mydf %>% separate_rows(Targets.missed) %>% mutate_all(as.numeric)) %>%
        complete(Targets.missed = 1:10) %>%
        group_by(Targets.missed) %>%
        summarise(hit.by.Shooters = paste0(Shooter, collapse = ";"))


# Targets.missed hit.by.Shooters
#            <int> <chr>          
# 1              1 1;2            
# 2              2 1;2            
# 3              3 1              
# 4              4 1              
# 5              5 2              
# 6              6 1;3            
# 7              7 1;2            
# 8              8 2              
# 9              9 NA             
#10             10 3           
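
One detail in the output above: target 9 shows NA rather than <NA>, because paste0() turns a missing Shooter into the literal string "NA". If a real missing value is wanted, the collapse step could be wrapped in a small helper. This is a sketch only; collapse_or_na is a hypothetical name, not a tidyverse function.

```r
# paste0() converts NA to the string "NA" before collapsing
paste0(NA_integer_, collapse = ";")

# Hypothetical helper: return a real NA when a group contains only NA,
# otherwise collapse the values as before
collapse_or_na <- function(x, sep = ";") {
  if (all(is.na(x))) NA_character_ else paste0(x, collapse = sep)
}

collapse_or_na(NA_integer_)  # missing value, not the string "NA"
collapse_or_na(c(1L, 2L))    # shooters collapsed as usual
```

Using summarise(hit.by.Shooters = collapse_or_na(Shooter)) in the pipeline above should then report target 9 as a genuine missing value.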

Data

set.seed(987)
mydf <- data.frame(Shooter=1:3, 
        Targets.missed=c(paste(sample(1:10,4),collapse=";"), 
        paste(sample(1:10,5),collapse=";"), paste(sample(1:10,8),collapse=";")))