Using dplyr to fill a new df from the contents of another df, based on conditions

Time: 2020-06-03 14:36:05

Tags: r dataframe dplyr

Hello, I need help creating a new df from this one:

  Groups COL1                  COL2 COL3     COL4 COL5
0 G1     DJJE):Canis_lupus     ABFC Canidae     4    3
1 G1     JUUI):Canis_canis     YH   Canidae    10   12
2 G1     KI):Lupus_lupus       ZA   canidae     2   12
3 G2     IOZ):Felis_sylvestris OP   Falidae     0    2
4 G2     KI):Felis_felis       UI   Falidae     6    8
5 G3     YY):Canis_lupus       ER   Raninae     7    9
6 G3     SD):Canis_lupus       GH   Raninae     2    3
7 G3     DZ:)Lupus_lupus       EZ   Raninae     6    8
8 G4     KUU):O_outan          LO   Babounae    4    8
9 G4     OK:)Felis_sylvestris  IO   Babounae    4    8
9 G4     LK:)Felis_sylvestris  IU   Babounae    8    9

The idea is to create a new df whose first column holds each COL3 value of the Groups, and then to fill it by adding new columns and letters.

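Here is an example:

G1 is composed of 3 different names (the part after the "):" or ":)" pattern in COL1):
- Canis_lupus
- Canis_canis
- Lupus_lupus

Then, if COL4 and COL5 are both > 5, I assign the value A in the new df. If COL4 and COL5 are both < 5, I assign the value B in the new df.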

For example, DJJE):Canis_lupus has both COL4 and COL5 < 5, so Canis_lupus in Canidae will have the value B.

YY):Canis_lupus in G3 has COL4 and COL5 > 5, so Canis_lupus in Raninae will have the value A.

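If both cases occur together, for example rows 5 and 6 (both Canis_lupus in Raninae), where one row has COL4 and COL5 > 5 and the other has COL4 and COL5 < 5, then A > B, so I give the letter A.

If COL4 > 5 and COL5 < 5, then I give the letter B.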

If COL4 < 5 and COL5 > 5, I give the letter B.

Here is the expected output:

COL3     Canis_lupus Canis_canis Lupus_lupus Felis_sylvestris O_outan 
Canidae  B           A           A           NA               NA   
Falidae  A           NA          NA          B                A    
Raninae  A           NA          A           NA               NA  
Babounae NA          NA          NA          A                B    
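The letter rules described above can be sketched in base R as two small helpers. This is only an illustration: the names row_letter and group_letter are hypothetical, and treating a value of exactly 5 as "high" (>= 5) is an assumption, since the rules only mention > 5 and < 5.

```r
# Letter for individual rows: "A" only when both values reach the threshold.
# Assumption: a value of exactly 5 counts as high (>= 5).
row_letter <- function(col4, col5, threshold = 5) {
  ifelse(col4 >= threshold & col5 >= threshold, "A", "B")
}

# Letter for all rows sharing the same COL3 and name: "A" wins over "B".
group_letter <- function(col4, col5, threshold = 5) {
  if (any(row_letter(col4, col5, threshold) == "A")) "A" else "B"
}

single <- row_letter(4, 3)                   # DJJE):Canis_lupus in Canidae
combined <- group_letter(c(7, 2), c(9, 3))   # both Canis_lupus rows in Raninae
print(c(single, combined))
```

With these inputs the single Canidae row gets B, and the two Raninae rows together get A.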

Does anyone have an idea?

Here is the data, as dput() output:

structure(list(Groups = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 
3L, 4L, 4L, 4L), .Label = c("G1", "G2", "G3", "G4"), class = "factor"), 
    COL1 = structure(c(1L, 4L, 6L, 3L, 5L, 11L, 10L, 2L, 7L, 
    9L, 8L), .Label = c("DJJE):Canis_lupus", "DZ:)Lupus_lupus", 
    "IOZ):Felis_sylvestris", "JUUI):Canis_canis", "KI):Felis_felis", 
    "KI):Lupus_lupus", "KUU):O_outan", "LK:)Felis_sylvestris", 
    "OK:)Felis_sylvestris", "SD):Canis_lupus", "YY):Canis_lupus"
    ), class = "factor"), COL2 = structure(c(1L, 10L, 11L, 8L, 
    9L, 2L, 4L, 3L, 7L, 5L, 6L), .Label = c("ABFC", "ER", "EZ", 
    "GH", "IO", "IU", "LO", "OP", "UI", "YH", "ZA"), class = "factor"), 
    COL3 = structure(c(3L, 3L, 2L, 4L, 4L, 5L, 5L, 5L, 1L, 1L, 
    1L), .Label = c("Babounae", "canidae", "Canidae", "Falidae", 
    "Raninae"), class = "factor"), COL4 = c(4L, 10L, 2L, 0L, 
    6L, 7L, 2L, 6L, 4L, 4L, 8L), COL5 = c(3L, 12L, 12L, 2L, 8L, 
    9L, 3L, 8L, 8L, 8L, 9L)), class = "data.frame", row.names = c(NA, 
-11L))

Could something like the following case_when() work here?

case_when(any(COL4>=5 & COL5>= 5) ~ "A",
          any(COL4<5 & COL5>= 5) ~ "B",
          any(COL4>=5 & COL5< 5) ~ "B",
          any(COL4<5 & COL5< 5) ~ "B")

1 answer:

Answer 0: (score: 2)

Using the tidyverse (tidyr, stringr, dplyr), on a dataset named df:

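library(dplyr)
library(tidyr)
library(stringr)

df %>%
  # "A" only when both COL4 and COL5 are >= 5, otherwise "B"
  mutate(value = case_when(COL4>=5 & COL5>= 5 ~ "A",
                           COL4<5  & COL5>= 5 ~ "B",
                           COL4>=5 & COL5< 5  ~ "B",
                           COL4<5  & COL5< 5  ~ "B"),
         # keep only the part of COL1 after "):" or ":)"
         COL1 = str_extract(COL1, "(?<=\\):|:\\)).*"),
         # normalise case, e.g. "canidae" becomes "Canidae"
         COL3 = str_to_title(as.character(COL3))) %>%
  select(-c(Groups, COL2, COL4, COL5)) %>%
  group_by(COL3, COL1) %>% 
  # "A" sorts before "B", so slice(1) keeps "A" when a group has both
  arrange(value, .by_group=TRUE) %>%
  slice(1) %>%
  pivot_wider(names_from = "COL1", values_from="value")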

This gives:

# A tibble: 4 x 7
# Groups:   COL3 [4]
  COL3     Felis_sylvestris O_outan Canis_canis Canis_lupus Lupus_lupus Felis_felis
  <chr>    <chr>            <chr>   <chr>       <chr>       <chr>       <chr>      
1 Babounae A                B       NA          NA          NA          NA         
2 Canidae  NA               NA      A           B           B           NA         
3 Falidae  B                NA      NA          NA          NA          A          
4 Raninae  NA               NA      NA          A           A           NA  
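To see what the lookbehind pattern passed to str_extract() does, here is a small sketch that applies the same pattern to three COL1 values from the data. It uses base R's regexpr() and regmatches() with perl = TRUE instead of stringr, purely to keep the example dependency-free; the variable names are illustrative.

```r
# Three COL1 values from the question, covering both separators.
x <- c("DJJE):Canis_lupus", "DZ:)Lupus_lupus", "OK:)Felis_sylvestris")

# Match everything that follows either "):" or ":)".
pattern <- "(?<=\\):|:\\)).*"

# regexpr() finds the first match in each string; regmatches() extracts it.
names_only <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(names_only)
```

Each element of names_only is the name without its prefix: Canis_lupus, Lupus_lupus and Felis_sylvestris.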

Notes:

  • I split COL1 on both "):" and ":)", since both occur in the data.
  • The solution seems a bit complicated. I bet there is a simpler way.
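One possibly simpler variant, sketched here under a few assumptions, replaces the arrange() + slice(1) step with a single summarise() that uses any(), along the lines of what the question suggested. The data frame below retypes only the columns the pipeline needs, the column name "name" and the object "result" are illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)
library(stringr)

# The question's rows, restricted to the columns used below.
df <- data.frame(
  COL1 = c("DJJE):Canis_lupus", "JUUI):Canis_canis", "KI):Lupus_lupus",
           "IOZ):Felis_sylvestris", "KI):Felis_felis", "YY):Canis_lupus",
           "SD):Canis_lupus", "DZ:)Lupus_lupus", "KUU):O_outan",
           "OK:)Felis_sylvestris", "LK:)Felis_sylvestris"),
  COL3 = c("Canidae", "Canidae", "canidae", "Falidae", "Falidae", "Raninae",
           "Raninae", "Raninae", "Babounae", "Babounae", "Babounae"),
  COL4 = c(4L, 10L, 2L, 0L, 6L, 7L, 2L, 6L, 4L, 4L, 8L),
  COL5 = c(3L, 12L, 12L, 2L, 8L, 9L, 3L, 8L, 8L, 8L, 9L),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(name = str_extract(COL1, "(?<=\\):|:\\)).*"),  # part after "):" or ":)"
         COL3 = str_to_title(COL3)) %>%                  # "canidae" -> "Canidae"
  group_by(COL3, name) %>%
  # One letter per (COL3, name): "A" if any row has both values >= 5, else "B".
  summarise(value = if (any(COL4 >= 5 & COL5 >= 5)) "A" else "B",
            .groups = "drop") %>%
  pivot_wider(names_from = name, values_from = value)

print(result)
```

Because any() already encodes "A wins over B", no sorting is needed, and the result comes back ungrouped.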