Sum entries in R based on exact matches?

Date: 2014-08-17 18:18:39

Tags: r matching

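I have an Excel spreadsheet in which researchers recorded every fish they caught as a separate entry, so a lot of the information is repeated. I'd like to use some kind of entry matching in R to reshape the spreadsheet, but I'm not sure how.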

For example, my spreadsheet currently looks like this:

Year  Location  TimeStarted  TimeEnded  Species
1974  H11       11:00 AM     12:30 PM   Black Rockfish
1974  H11       11:00 AM     12:30 PM   Black Rockfish
1974  H11       11:00 AM     12:30 PM   Black Rockfish
1974  H11       2:00 AM      3:30 AM    Copper Rockfish
1974  N80       11:00 AM     1:20 PM    Copper Rockfish 

I'd like it to look like this:

Year  Location  TimeStarted  TimeEnded  Black RF  Copper RF
1974  H11       11:00 AM     12:30 PM   3         0
1974  H11       2:00 AM      3:30 AM    0         1
1974  N80       11:00 AM     1:20 PM    0         1

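So essentially, I need R to 1) check whether entries match exactly on every column except Species and, 2) when they do, sum the number of each species across those matching entries.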

2 Answers:

Answer 0 (score: 2)

If df is your dataset, you could try:

 library(reshape2)
 dcast(df, ...~Species, value.var="Species", length)
 #     Year Location TimeStarted TimeEnded Black Rockfish Copper Rockfish
 #1 1974      H11    11:00 AM  12:30 PM              3               0
 #2 1974      H11     2:00 AM   3:30 AM              0               1
 #3 1974      N80    11:00 AM   1:20 PM              0               1
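For readers without reshape2, the two steps from the question (treat rows as matching when every non-Species column is identical, then count each species per match) can be sketched in base R with interaction() and table(). This is a swapped-in technique, not part of the answer above, and df is typed in by hand from the sample table:

```r
# Sample data typed in from the question's table
df <- data.frame(
  Year        = c(1974, 1974, 1974, 1974, 1974),
  Location    = c("H11", "H11", "H11", "H11", "N80"),
  TimeStarted = c("11:00 AM", "11:00 AM", "11:00 AM", "2:00 AM", "11:00 AM"),
  TimeEnded   = c("12:30 PM", "12:30 PM", "12:30 PM", "3:30 AM", "1:20 PM"),
  Species     = c("Black Rockfish", "Black Rockfish", "Black Rockfish",
                  "Copper Rockfish", "Copper Rockfish"),
  stringsAsFactors = FALSE
)

# Step 1: rows "match exactly" when every non-Species column is identical,
# so combine those columns into one key per row
keys <- setdiff(names(df), "Species")
key  <- interaction(df[keys], drop = TRUE, sep = "|")

# Step 2: count each species within each key (absent combinations become 0);
# the row names of counts are the key levels
counts <- as.data.frame.matrix(table(key, df$Species))

# Keep the first row of each key for the descriptive columns and attach
# the matching row of counts
first  <- !duplicated(key)
result <- cbind(df[first, keys], counts[as.character(key[first]), , drop = FALSE])
rownames(result) <- NULL
result
```

The rows keep the order in which each combination first appears in the spreadsheet, which is why the key is matched back by name rather than by position.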

Or, using dplyr together with tidyr:

 library(dplyr)
 library(tidyr)
 df %>%
   group_by(Year, Location, TimeStarted, TimeEnded, Species) %>%
   tally() %>%
   spread(Species, n, fill = 0)
  #  Year Location TimeStarted TimeEnded Black Rockfish Copper Rockfish
  #1 1974      H11    11:00 AM  12:30 PM              3               0
  #2 1974      H11     2:00 AM   3:30 AM              0               1
  #3 1974      N80    11:00 AM   1:20 PM              0               1
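In current tidyr, spread() still works but is documented as superseded by pivot_wider(), and count() is dplyr's shorthand for group_by() followed by tally(). A minimal sketch of the same pipeline with those two functions, assuming a tidyr version recent enough to accept a scalar values_fill (1.1.0 or later, as far as I know):

```r
library(dplyr)
library(tidyr)

# Sample data typed in from the question's table
df <- data.frame(
  Year        = c(1974, 1974, 1974, 1974, 1974),
  Location    = c("H11", "H11", "H11", "H11", "N80"),
  TimeStarted = c("11:00 AM", "11:00 AM", "11:00 AM", "2:00 AM", "11:00 AM"),
  TimeEnded   = c("12:30 PM", "12:30 PM", "12:30 PM", "3:30 AM", "1:20 PM"),
  Species     = c("Black Rockfish", "Black Rockfish", "Black Rockfish",
                  "Copper Rockfish", "Copper Rockfish"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Group by every descriptive column plus Species and count the rows in
  # each group; the count column is named n
  count(Year, Location, TimeStarted, TimeEnded, Species) %>%
  # One column per species; combinations that never occur are filled with 0
  pivot_wider(names_from = Species, values_from = n, values_fill = 0)
wide
```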

Answer 1 (score: 2)

You can also use the formula method of aggregate:
aggregate(Species ~ ., dat, summary)
#   Year Location TimeStarted TimeEnded Species.Black Rockfish Species.Copper Rockfish
# 1 1974      N80    11:00 AM   1:20 PM                      0                       1
# 2 1974      H11    11:00 AM  12:30 PM                      3                       0
# 3 1974      H11     2:00 AM   3:30 AM                      0                       1

where dat is:

dat <- 
structure(list(Year = c(1974L, 1974L, 1974L, 1974L, 1974L), Location = structure(c(1L, 
1L, 1L, 1L, 2L), .Label = c("H11", "N80"), class = "factor"), 
    TimeStarted = structure(c(1L, 1L, 1L, 2L, 1L), .Label = c("11:00 AM", 
    "2:00 AM"), class = "factor"), TimeEnded = structure(c(2L, 
    2L, 2L, 3L, 1L), .Label = c("1:20 PM", "12:30 PM", "3:30 AM"
    ), class = "factor"), Species = structure(c(1L, 1L, 1L, 2L, 
    2L), .Label = c("Black Rockfish", "Copper Rockfish"), class = "factor")), .Names = c("Year", 
"Location", "TimeStarted", "TimeEnded", "Species"), class = "data.frame", row.names = c(NA, 
-5L))
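The aggregate() approach relies on Species being a factor: each subset then keeps all species levels, so summary() returns one count per level, zeros included. Since R 4.0.0, data.frame() and read.csv() default to stringsAsFactors = FALSE, so a freshly imported Species column will usually be character, and summary() on a character vector returns its length, class and mode instead of counts. A minimal sketch, assuming the data arrives with character columns (dat_chr and flat are hypothetical names):

```r
# Same sample data, but with character columns as a modern import would give
dat_chr <- data.frame(
  Year        = rep(1974, 5),
  Location    = c("H11", "H11", "H11", "H11", "N80"),
  TimeStarted = c("11:00 AM", "11:00 AM", "11:00 AM", "2:00 AM", "11:00 AM"),
  TimeEnded   = c("12:30 PM", "12:30 PM", "12:30 PM", "3:30 AM", "1:20 PM"),
  Species     = c("Black Rockfish", "Black Rockfish", "Black Rockfish",
                  "Copper Rockfish", "Copper Rockfish"),
  stringsAsFactors = FALSE
)

# Convert Species to a factor so summary() counts every species level
dat_chr$Species <- factor(dat_chr$Species)

res <- aggregate(Species ~ ., dat_chr, summary)

# res$Species is a matrix with one column per species; flatten it into
# ordinary columns if a plain data frame is needed downstream
flat <- cbind(res[setdiff(names(res), "Species")], as.data.frame(res$Species))
flat
```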