Extracting data in R and pasting it into a data frame with a different format

Time: 2017-05-18 10:03:46

Tags: r if-statement

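I have a large data frame of plant species and several locations. Three columns always belong to one location: the first gives the presence of a species (1 = present, empty = absent), the second the altitude at which the species was found, and the third a note (see below).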

Location         A          A         A         B            B        B
Index         Presence    Altitude    Note    Presence    Altitude    Note
Species A                                       1          2560    Something
Species B        1        3100       Some
Species C
Species D        1        2899       Some

Now I want to extract all species that occur at a location (Presence = 1) and put them into a new data frame in the following format:

Location         Species         Altitude       Note
   A                B              3100         Some
   A                D              2899         Some
   B                A              2560         Something

I have tried several things, but nothing worked or even came close. I appreciate any input. Thanks.

Added: I uploaded an example of what the data looks like here.

1 answer:

Answer 0 (score: 0)

Assuming your data frame looks like this:

dat <- data.frame(
  Species    = c("Species A", "Species B", "Species C", "Species D"),
  A.Presence = c("", "1", "", "1"),
  A.Altitude = c("", "3100", "", "2899"),
  A.Note     = c("", "Some", "", "Some"),
  B.Presence = c("1", "", "", ""),
  B.Altitude = c("2560", "", "", ""),
  B.Note     = c("Something", "", "", ""),
  stringsAsFactors = FALSE  # keep text as character (the default only since R 4.0)
)

> dat
#     Species A.Presence A.Altitude A.Note B.Presence B.Altitude    B.Note
# 1 Species A                                       1       2560 Something
# 2 Species B          1       3100   Some                                
# 3 Species C                                                             
# 4 Species D          1       2899   Some  
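If you would rather not depend on extra packages, the same reshaping can also be sketched in base R by looping over the location prefixes. This is an alternative technique, not the tidyr approach described next. It assumes every location has exactly the three columns `<Location>.Presence`, `<Location>.Altitude` and `<Location>.Note`, that location names contain no dots, and that "present" is encoded as the string "1"; the names `locations`, `per_location` and `result` are illustrative only.

```r
# Same example data as above (wide layout: <Location>.<Field> columns)
dat <- data.frame(
  Species    = c("Species A", "Species B", "Species C", "Species D"),
  A.Presence = c("", "1", "", "1"),
  A.Altitude = c("", "3100", "", "2899"),
  A.Note     = c("", "Some", "", "Some"),
  B.Presence = c("1", "", "", ""),
  B.Altitude = c("2560", "", "", ""),
  B.Note     = c("Something", "", "", ""),
  stringsAsFactors = FALSE
)

# Location prefix of every field column, e.g. "A" from "A.Presence".
# Assumes location names contain no dots.
locations <- unique(sub("\\..*$", "", names(dat)[-1]))

# Build one small long-format data frame per location,
# keeping only the species marked as present ("1") there
per_location <- lapply(locations, function(loc) {
  present <- dat[[paste0(loc, ".Presence")]] %in% "1"  # NA-safe comparison
  data.frame(
    Location = rep(loc, sum(present)),
    Species  = sub("^Species\\s+", "", dat$Species[present]),
    Altitude = as.numeric(dat[[paste0(loc, ".Altitude")]][present]),
    Note     = dat[[paste0(loc, ".Note")]][present],
    stringsAsFactors = FALSE
  )
})

# Stack the per-location pieces into a single data frame
result <- do.call(rbind, per_location)
result
```

Converting `Altitude` to numeric is a design choice here; drop `as.numeric()` if you want to keep the original text values.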

You can do this with a combination of tidyr's gather() and spread() plus a few dplyr operations:

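library(tidyr)
library(dplyr)

dat2 <- dat %>%
  # Stack all "<Location>.<Field>" columns into key/value pairs
  tidyr::gather(key = key, value = value, -Species) %>%
  # Split the key into location ("A") and field ("Presence", ...),
  # and strip the "Species " prefix from the species name
  dplyr::mutate(Location  = sub("\\..*$", "", key),
                parameter = sub("^.*\\.", "", key),
                Species   = sub("^.*\\s", "", Species)) %>%
  dplyr::select(Location, Species, parameter, value) %>%
  # Back to one column per field: Altitude, Note, Presence
  tidyr::spread(key = parameter, value = value) %>%
  # Keep only species present at a location, then drop the flag
  dplyr::filter(Presence == "1") %>%
  dplyr::select(-Presence)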

> dat2
#   Location Species Altitude      Note
# 1        A       B     3100      Some
# 2        A       D     2899      Some
# 3        B       A     2560 Something
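In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below assumes tidyr >= 1.0.0 and column names of the form `<Location>.<Field>`. Because the special `.value` entry in `names_to` sends the field part of each column name (Presence, Altitude, Note) to its own column, a single pivot_longer() call replaces the gather/spread round trip.

```r
library(tidyr)
library(dplyr)

# Same example data as in the answer (wide layout: <Location>.<Field> columns)
dat <- data.frame(
  Species    = c("Species A", "Species B", "Species C", "Species D"),
  A.Presence = c("", "1", "", "1"),
  A.Altitude = c("", "3100", "", "2899"),
  A.Note     = c("", "Some", "", "Some"),
  B.Presence = c("1", "", "", ""),
  B.Altitude = c("2560", "", "", ""),
  B.Note     = c("Something", "", "", ""),
  stringsAsFactors = FALSE
)

result <- dat %>%
  # "A.Presence" -> Location = "A", and the "Presence" part becomes a column name
  pivot_longer(cols = -Species,
               names_to = c("Location", ".value"),
               names_sep = "\\.") %>%
  # Keep only species present at a location
  filter(Presence == "1") %>%
  # Strip the "Species " prefix; store altitude as a number
  mutate(Species  = sub("^Species\\s+", "", Species),
         Altitude = as.numeric(Altitude)) %>%
  select(Location, Species, Altitude, Note) %>%
  arrange(Location, Species)

result
```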