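I have a large data frame of plant species across several locations. Three columns always belong to one location: the first gives the presence of a species (1 = present, empty = absent), the second the altitude at which the species was found, and the third a note (see below).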
Location     A          A          A          B          B          B
Index        Presence   Altitude   Note       Presence   Altitude   Note
Species A                                     1          2560       Something
Species B    1          3100       Some
Species C
Species D    1          2899       Some
Now I want to extract all species that occur at a location (Presence = 1) and put them into a new data frame in the following format:
Location   Species   Altitude   Note
A          B         3100       Some
A          D         2899       Some
B          A         2560       Something
I have tried several things, but nothing worked or even came close. I'd appreciate any input. Thanks.
Added: I uploaded an example of what the data looks like here
Answer
Assuming your data frame has the following format:
dat <- data.frame(
  Species    = c("Species A", "Species B", "Species C", "Species D"),
  A.Presence = c("", "1", "", "1"),
  A.Altitude = c("", "3100", "", "2899"),
  A.Note     = c("", "Some", "", "Some"),
  B.Presence = c("1", "", "", ""),
  B.Altitude = c("2560", "", "", ""),
  B.Note     = c("Something", "", "", ""),
  stringsAsFactors = FALSE)
> dat
#     Species A.Presence A.Altitude A.Note B.Presence B.Altitude    B.Note
# 1 Species A                                       1       2560 Something
# 2 Species B          1       3100   Some
# 3 Species C
# 4 Species D          1       2899   Some
You can achieve this with a combination of tidyr's gather and spread plus a few dplyr operations:
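library(tidyr)
library(dplyr)
dat2 <- dat %>%
  # long format: one row per Species and original column (key/value pairs)
  tidyr::gather(key = key, value = value, -Species) %>%
  # split "A.Presence" into Location "A" and parameter "Presence"
  # (everything before the dot, so longer location names also work);
  # shorten "Species A" to "A"
  dplyr::mutate(Location  = sub("\\..*$", "", key),
                parameter = sub("^.*\\.", "", key),
                Species   = sub("^.*\\s", "", Species)) %>%
  dplyr::select(Location, Species, parameter, value) %>%
  # back to wide: one column each for Altitude, Note and Presence
  tidyr::spread(key = parameter, value = value) %>%
  # keep only species present at the location, then drop the flag
  dplyr::filter(Presence == "1") %>%
  dplyr::select(-Presence)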
> dat2
#   Location Species Altitude      Note
# 1        A       B     3100      Some
# 2        A       D     2899      Some
# 3        B       A     2560 Something
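Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Below is a minimal sketch of the same reshape with pivot_longer(), assuming every location column is named "<Location>.<Parameter>" with exactly one dot, as in dat above. The special ".value" entry in names_to keeps Presence, Altitude and Note as separate columns, so no second spread step is needed. Converting Altitude to integer is an extra step that is not part of the answer above.

```r
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0
library(dplyr)

# Same example data as above: three columns per location, named
# "<Location>.<Parameter>"; empty strings mark absent species
dat <- data.frame(
  Species    = c("Species A", "Species B", "Species C", "Species D"),
  A.Presence = c("", "1", "", "1"),
  A.Altitude = c("", "3100", "", "2899"),
  A.Note     = c("", "Some", "", "Some"),
  B.Presence = c("1", "", "", ""),
  B.Altitude = c("2560", "", "", ""),
  B.Note     = c("Something", "", "", ""),
  stringsAsFactors = FALSE
)

dat2 <- dat %>%
  # "Species A" -> "A"
  mutate(Species = sub("^.*\\s", "", Species)) %>%
  # split each column name at the dot (names_sep is a regex): the part
  # before becomes a value of Location, the part after (".value") stays
  # a column name, giving columns Presence, Altitude and Note
  pivot_longer(-Species,
               names_to  = c("Location", ".value"),
               names_sep = "\\.") %>%
  # keep only species present at the location
  filter(Presence == "1") %>%
  # assumption: altitudes are whole numbers, so store them as integers
  mutate(Altitude = as.integer(Altitude)) %>%
  select(Location, Species, Altitude, Note) %>%
  arrange(Location, Species)

print(dat2)
```

The result holds the same three rows as dat2 above, returned as a tibble. If location names can themselves contain dots, names_sep would split them in the wrong place; names_pattern = "^(.*)\\.(.*)$" splits only at the last dot instead.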