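I have an Excel spreadsheet in which researchers caught fish and recorded each fish as its own entry, so a lot of the information is repeated. I would like to use R to match identical entries and reshape the spreadsheet, but I am not sure how.
For example, my spreadsheet currently looks like this: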
Year Location TimeStarted TimeEnded Species
1974 H11 11:00 AM 12:30 PM Black Rockfish
1974 H11 11:00 AM 12:30 PM Black Rockfish
1974 H11 11:00 AM 12:30 PM Black Rockfish
1974 H11 2:00 AM 3:30 AM Copper Rockfish
1974 N80 11:00 AM 1:20 PM Copper Rockfish
I would like it to look like this:
Year Location TimeStarted TimeEnded Black RF Copper RF
1974 H11 11:00 AM 12:30 PM 3 0
1974 H11 2:00 AM 3:30 AM 0 1
1974 N80 11:00 AM 1:20 PM 0 1
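So essentially I need (1) the entries to match exactly and then, for those that do, (2) R to count the number of each species among the exactly matching entries.
Answer 0: (score: 2)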
If df is the dataset, you can try:
library(reshape2)
dcast(df, ...~Species, value.var="Species", length)
# Year Location TimeStarted TimeEnded Black Rockfish Copper Rockfish
#1 1974 H11 11:00 AM 12:30 PM 3 0
#2 1974 H11 2:00 AM 3:30 AM 0 1
#3 1974 N80 11:00 AM 1:20 PM 0 1
Or using dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
  group_by(Year, Location, TimeStarted, TimeEnded, Species) %>%
  tally() %>%
  spread(Species, n, fill = 0)
# Year Location TimeStarted TimeEnded Black Rockfish Copper Rockfish
#1 1974 H11 11:00 AM 12:30 PM 3 0
#2 1974 H11 2:00 AM 3:30 AM 0 1
#3 1974 N80 11:00 AM 1:20 PM 0 1
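In more recent tidyr releases, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshaping, assuming tidyr 1.1.0 or later (for the scalar values_fill) and a df rebuilt here from the question's table purely for illustration; the name wide is just a placeholder.

```r
library(dplyr)
library(tidyr)

# Illustrative data mirroring the table in the question (an assumption, not the real spreadsheet)
df <- data.frame(
  Year = rep(1974L, 5),
  Location = c("H11", "H11", "H11", "H11", "N80"),
  TimeStarted = c("11:00 AM", "11:00 AM", "11:00 AM", "2:00 AM", "11:00 AM"),
  TimeEnded = c("12:30 PM", "12:30 PM", "12:30 PM", "3:30 AM", "1:20 PM"),
  Species = c("Black Rockfish", "Black Rockfish", "Black Rockfish",
              "Copper Rockfish", "Copper Rockfish")
)

wide <- df %>%
  # count() groups by the listed columns and adds a column n with the row count
  count(Year, Location, TimeStarted, TimeEnded, Species) %>%
  # one column per species; combinations that never occur are filled with 0
  pivot_wider(names_from = Species, values_from = n, values_fill = 0)

print(wide)
```

count() here stands in for the group_by() plus tally() pair, so the result should have one row per sampling event and one count column per species.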
Answer 1: (score: 2)
You can also use aggregate:
aggregate(Species ~ ., dat, summary)
# Year Location TimeStarted TimeEnded Species.BlackRockfish Species.CopperRockfish
# 1 1974 N80 11:00AM 1:20PM 0 1
# 2 1974 H11 11:00AM 12:30PM 3 0
# 3 1974 H11 2:00AM 3:30AM 0 1
where dat is:
dat <-
structure(list(Year = c(1974L, 1974L, 1974L, 1974L, 1974L), Location = structure(c(1L,
1L, 1L, 1L, 2L), .Label = c("H11", "N80"), class = "factor"),
TimeStarted = structure(c(1L, 1L, 1L, 2L, 1L), .Label = c("11:00 AM",
"2:00 AM"), class = "factor"), TimeEnded = structure(c(2L,
2L, 2L, 3L, 1L), .Label = c("1:20 PM", "12:30 PM", "3:30 AM"
), class = "factor"), Species = structure(c(1L, 1L, 1L, 2L,
2L), .Label = c("Black Rockfish", "Copper Rockfish"), class = "factor")), .Names = c("Year",
"Location", "TimeStarted", "TimeEnded", "Species"), class = "data.frame", row.names = c(NA,
-5L))
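One thing to be aware of with the aggregate approach: because summary returns one count per factor level, the Species part of the result is stored as a single matrix column rather than as separate columns. A small sketch of one way to flatten it is given below; the compact dat is intended as an equivalent of the structure above, and the names res and flat are placeholders introduced here.

```r
# Compact re-creation of dat (an assumption meant to match the dput output above)
dat <- data.frame(
  Year = rep(1974L, 5),
  Location = factor(c("H11", "H11", "H11", "H11", "N80")),
  TimeStarted = factor(c("11:00 AM", "11:00 AM", "11:00 AM", "2:00 AM", "11:00 AM")),
  TimeEnded = factor(c("12:30 PM", "12:30 PM", "12:30 PM", "3:30 AM", "1:20 PM")),
  Species = factor(c("Black Rockfish", "Black Rockfish", "Black Rockfish",
                     "Copper Rockfish", "Copper Rockfish"))
)

# summary() on a factor returns the count of every level, including zeros
res <- aggregate(Species ~ ., dat, summary)

# The counts sit in one matrix column with one matrix column per species level
is.matrix(res$Species)

# Rebuilding the data frame splits the matrix into ordinary columns
flat <- do.call(data.frame, res)
str(flat)
```

After flattening, each species count can be referred to as a regular column, which may be more convenient when writing the result back out to a spreadsheet.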