R:整洁的数据格式转换为可读格式

时间:2019-04-09 02:35:02

标签: r dplyr

我拥有整齐的数据格式的物种数据。为了包含在报告中,我需要减小表格的宽度,方法是对每个组仅列出一次较高的顺序(王国,门,阶级等)。

我目前有:

enter image description here

...,需要进行如下操作: enter image description here

...或类似的东西

enter image description here

...每个高阶仅给出一次,该高阶内的每个物种都列在下面。

此列表很长,因此需要基于脚本。我已经看过使用dplyr的情况,但是看不到实现这一目标的方法。

如果需要,下面是可复制的示例数据。

exampledata <- structure(list(KINGDOM = c("Animalia", "Animalia", "Animalia", 
                                   "Animalia", "Animalia", "Animalia", "Animalia", "Animalia", "Animalia", 
                                   "Animalia", "Animalia", "Animalia"), PHYLYM = c("Chordata", "Chordata", 
                                                                                   "Chordata", "Chordata", "Chordata", "Chordata", "Chordata", "Chordata", 
                                                                                   "Chordata", "Chordata", "Chordata", "Chordata"), CLASS = c("Amphibia", 
                                                                                                                                              "Amphibia", "Amphibia", "Amphibia", "Amphibia", "Aves", "Aves", 
                                                                                                                                              "Aves", "Aves", "Aves", "Aves", "Aves"), ORDER = c("Anura", "Anura", 
                                                                                                                                                                                                 "Anura", "Anura", "Anura", "Accipitriformes", "Ciconiiformes", 
                                                                                                                                                                                                 "Gruiformes", "Passeriformes", "Passeriformes", "Pelecaniformes", 
                                                                                                                                                                                                 "Pelecaniformes"), FAMILY = c("Ranidae", "Ranidae", "Rhacophoridae", 
                                                                                                                                                                                                                               "Rhacophoridae", "Rhacophoridae", "Accipitridae", "Ciconiidae", 
                                                                                                                                                                                                                               "Gruidae", "Muscicapidae", "Muscicapidae", "Threskiornithidae", 
                                                                                                                                                                                                                               "Threskiornithidae"), SCIENTIFICNAME = c("Hylarana attigua", 
                                                                                                                                                                                                                                                                        "Hylarana taipehensis", "Philautus", "Polypedates leucomystax", 
                                                                                                                                                                                                                                                                        "Theloderma asperum", "Aviceda jerdoni", "Leptoptilos javanicus", 
                                                                                                                                                                                                                                                                        "Antigone antigone", "Cyanoptila cyanomelana", "Cyornis hainanus", 
                                                                                                                                                                                                                                                                        "Pseudibis davisoni", "Thaumatibis gigantea"), OTHERDATA = c("XYZ", 
                                                                                                                                                                                                                                                                                                                                     "ABC", "XYZ", "ABC", "XYZ", "XYZ", "ABC", "XYZ", "ABC", "ABC", 
                                                                                                                                                                                                                                                                                                                                     "XYZ", "XYZ")), row.names = c(NA, 12L), class = "data.frame")

3 个答案:

答案 0 :(得分:3)

删除数据通常不是一个好主意,但是我看到了用例。

只要您的数据顺序正确,就可以执行以下操作:

iris %>% 
  mutate(Species = if_else(duplicated(Species),"", as.character(Species)))

请注意,as.character()仅是必需的,因为Species是此数据集中的一个因素。


编辑示例数据:

exampledata %>% 
  mutate_at(vars("KINGDOM", "PHYLYM", "CLASS","ORDER", "FAMILY", "SCIENTIFICNAME"), ~ if_else(duplicated(.x),"", as.character(.x)) )

提供如下表格:

    KINGDOM   PHYLYM    CLASS           ORDER            FAMILY          SCIENTIFICNAME OTHERDATA
1  Animalia Chordata Amphibia           Anura           Ranidae        Hylarana attigua       XYZ
2                                                                  Hylarana taipehensis       ABC
3                                                 Rhacophoridae               Philautus       XYZ
4                                                               Polypedates leucomystax       ABC
5                                                                    Theloderma asperum       XYZ
6                        Aves Accipitriformes      Accipitridae         Aviceda jerdoni       XYZ
7                               Ciconiiformes        Ciconiidae   Leptoptilos javanicus       ABC
8                                  Gruiformes           Gruidae       Antigone antigone       XYZ
9                               Passeriformes      Muscicapidae  Cyanoptila cyanomelana       ABC
10                                                                     Cyornis hainanus       ABC
11                             Pelecaniformes Threskiornithidae      Pseudibis davisoni       XYZ
12                                                                 Thaumatibis gigantea       XYZ

答案 1 :(得分:2)

如果要减少数据,我建议不要group_by较高的顺序,而将其他详细信息另存为逗号分隔的字符串。

library(dplyr)

exampledata %>%
   group_by(KINGDOM, PHYLYM, CLASS, ORDER, FAMILY) %>%
   summarise_at(vars(SCIENTIFICNAME, OTHERDATA), toString)


#   KINGDOM  PHYLYM   CLASS    ORDER         FAMILY         SCIENTIFICNAME                                 OTHERDATA   
#  <chr>    <chr>    <chr>    <chr>         <chr>          <chr>                                          <chr>       
#1 Animalia Chordata Amphibia Anura         Ranidae        Hylarana attigua, Hylarana taipehensis         XYZ, ABC    
#2 Animalia Chordata Amphibia Anura         Rhacophoridae  Philautus, Polypedates leucomystax, Theloderm… XYZ, ABC, X…
#3 Animalia Chordata Aves     Accipitrifor… Accipitridae   Aviceda jerdoni                                XYZ         
#4 Animalia Chordata Aves     Ciconiiformes Ciconiidae     Leptoptilos javanicus                          ABC         
#5 Animalia Chordata Aves     Gruiformes    Gruidae        Antigone antigone                              XYZ         
#6 Animalia Chordata Aves     Passeriformes Muscicapidae   Cyanoptila cyanomelana, Cyornis hainanus       ABC, ABC    
#7 Animalia Chordata Aves     Pelecaniform… Threskiornith… Pseudibis davisoni, Thaumatibis gigantea       XYZ, XYZ

使用此方法,您不会丢失任何信息,也可以减少数据框中的行数。您可以根据自己的喜好在group_bysummarise_at中添加/删除列。

答案 2 :(得分:0)

尽管最初的问题是有关在R中执行此操作的,但我意识到在Excel中使用数据透视表更快,更简单,将每个较高的分类从最高到最低添加到行中,然后使用VLOOKUP添加所需的附加数据

enter image description here