Sample data below...
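I want to count the number of points per month for each "Type" of vessel (Type is the vessel type). So first, I want to summarise how many points of each Type there are in each month. For example, June has 5 Fishing points.
Ideally using dplyr.
I have something like this: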
dfsum <- df %>% group_by(Month, Type) %>% tally()
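However, although this works well, I also want to do the above by unique vessel ID. A vessel can have multiple points per month, but I want to know how many unique vessels there are each month.
I can add id to the grouping:
dfsum2 <- df %>% group_by(Month, id, Type) %>% tally()
But this is not very tidy and would be hard to collate with a larger dataset. Instead, I would like the result to say, for example, that there are 2 unique fishing vessels in February (using this sample data). Is there a better way to extract this information?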
Desired output:
Month Type n
Jan Fishing x
Feb Fishing x
Feb Sailing x
March Fishing x
where x is the number of unique vessels (counted by id) in that category for that month.
# Dummy data
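A minimal sketch, assuming dplyr is installed: `count()` is dplyr's shorthand for `group_by()` followed by `tally()`, except that it returns an ungrouped data frame. To stay self-contained, it rebuilds only the Month, Type and id columns of the sample `df` in this question (UTC_Time is left out), and it reproduces the 5 Fishing points in June mentioned above.

```r
library(dplyr)

# Rebuild of the Month, Type and id columns of the sample `df`
# (UTC_Time omitted), copied row by row from the dput.
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# Points per Month and Type; same counts as group_by(Month, Type) %>% tally()
pts <- df %>% count(Month, Type)
print(pts)
```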
df<- structure(list(UTC_Time = structure(c(1L, 1L, 1L, 1L, 339L, 339L,
339L, 68L, 68L, 68L, 154L, 154L, 154L, 154L, 154L, 154L, 14L,
14L, 14L, 14L, 14L, 15L, 50L, 50L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 77L, 146L, 147L, 147L, 147L, 147L, 147L, 148L,
148L), .Label = c("2018-01-01 0:00:00", "2018-01-02 0:00:00",
"2018-01-03 0:00:00", "2018-01-04 0:00:00", "2018-01-05 0:00:00",
"2018-01-06 0:00:00", "2018-01-07 0:00:00", "2018-01-08 0:00:00",
"2018-01-09 0:00:00", "2018-01-10 0:00:00", "2018-01-11 0:00:00",
"2018-01-12 0:00:00", "2018-01-13 0:00:00", "2018-01-14 0:00:00",
"2018-01-15 0:00:00", "2018-01-16 0:00:00", "2018-01-17 0:00:00",
"2018-01-18 0:00:00", "2018-01-19 0:00:00", "2018-01-20 0:00:00",
"2018-01-21 0:00:00", "2018-01-22 0:00:00", "2018-01-23 0:00:00",
"2018-01-24 0:00:00", "2018-01-25 0:00:00", "2018-01-26 0:00:00",
"2018-01-27 0:00:00", "2018-01-28 0:00:00", "2018-01-29 0:00:00",
"2018-01-30 0:00:00", "2018-01-31 0:00:00", "2018-02-01 0:00:00",
"2018-02-02 0:00:00", "2018-02-03 0:00:00", "2018-02-04 0:00:00",
"2018-02-05 0:00:00", "2018-02-06 0:00:00", "2018-02-07 0:00:00",
"2018-02-08 0:00:00", "2018-02-09 0:00:00", "2018-02-10 0:00:00",
"2018-02-11 0:00:00", "2018-02-12 0:00:00", "2018-02-13 0:00:00",
"2018-02-14 0:00:00", "2018-02-15 0:00:00", "2018-02-16 0:00:00",
"2018-02-17 0:00:00", "2018-02-18 0:00:00", "2018-02-19 0:00:00",
"2018-02-20 0:00:00", "2018-02-21 0:00:00", "2018-02-22 0:00:00",
"2018-02-23 0:00:00", "2018-02-24 0:00:00", "2018-02-25 0:00:00",
"2018-02-26 0:00:00", "2018-02-27 0:00:00", "2018-02-28 0:00:00",
"2018-03-01 0:00:00", "2018-03-02 0:00:00", "2018-03-03 0:00:00",
"2018-03-04 0:00:00", "2018-03-05 0:00:00", "2018-03-06 0:00:00",
"2018-03-07 0:00:00", "2018-03-08 0:00:00", "2018-03-09 0:00:00",
"2018-03-10 0:00:00", "2018-03-11 0:00:00", "2018-03-12 0:00:00",
"2018-03-13 0:00:00", "2018-03-14 0:00:00", "2018-03-15 0:00:00",
"2018-03-16 0:00:00", "2018-03-17 0:00:00", "2018-03-18 0:00:00",
"2018-03-19 0:00:00", "2018-03-20 0:00:00", "2018-03-21 0:00:00",
"2018-03-22 0:00:00", "2018-03-23 0:00:00", "2018-03-24 0:00:00",
"2018-03-25 0:00:00", "2018-03-26 0:00:00", "2018-03-27 0:00:00",
"2018-03-28 0:00:00", "2018-03-29 0:00:00", "2018-03-30 0:00:00",
"2018-03-31 0:00:00", "2018-04-01 0:00:00", "2018-04-02 0:00:00",
"2018-04-03 0:00:00", "2018-04-04 0:00:00", "2018-04-05 0:00:00",
"2018-04-06 0:00:00", "2018-04-07 0:00:00", "2018-04-08 0:00:00",
"2018-04-09 0:00:00", "2018-04-10 0:00:00", "2018-04-11 0:00:00",
"2018-04-12 0:00:00", "2018-04-13 0:00:00", "2018-04-14 0:00:00",
"2018-04-15 0:00:00", "2018-04-16 0:00:00", "2018-04-17 0:00:00",
"2018-04-18 0:00:00", "2018-04-19 0:00:00", "2018-04-20 0:00:00",
"2018-04-21 0:00:00", "2018-04-22 0:00:00", "2018-04-23 0:00:00",
"2018-04-24 0:00:00", "2018-04-25 0:00:00", "2018-04-26 0:00:00",
"2018-04-27 0:00:00", "2018-04-28 0:00:00", "2018-04-29 0:00:00",
"2018-04-30 0:00:00", "2018-05-01 0:00:00", "2018-05-02 0:00:00",
"2018-05-03 0:00:00", "2018-05-04 0:00:00", "2018-05-05 0:00:00",
"2018-05-06 0:00:00", "2018-05-07 0:00:00", "2018-05-08 0:00:00",
"2018-05-09 0:00:00", "2018-05-10 0:00:00", "2018-05-11 0:00:00",
"2018-05-12 0:00:00", "2018-05-13 0:00:00", "2018-05-14 0:00:00",
"2018-05-15 0:00:00", "2018-05-16 0:00:00", "2018-05-17 0:00:00",
"2018-05-18 0:00:00", "2018-05-19 0:00:00", "2018-05-20 0:00:00",
"2018-05-21 0:00:00", "2018-05-22 0:00:00", "2018-05-23 0:00:00",
"2018-05-24 0:00:00", "2018-05-25 0:00:00", "2018-05-26 0:00:00",
"2018-05-27 0:00:00", "2018-05-28 0:00:00", "2018-05-29 0:00:00",
"2018-05-30 0:00:00", "2018-05-31 0:00:00", "2018-06-01 0:00:00",
"2018-06-02 0:00:00", "2018-06-03 0:00:00", "2018-06-04 0:00:00",
"2018-06-05 0:00:00", "2018-06-06 0:00:00", "2018-06-07 0:00:00",
"2018-06-08 0:00:00", "2018-06-09 0:00:00", "2018-06-10 0:00:00",
"2018-06-11 0:00:00", "2018-06-12 0:00:00", "2018-06-13 0:00:00",
"2018-06-14 0:00:00", "2018-06-15 0:00:00", "2018-06-16 0:00:00",
"2018-06-17 0:00:00", "2018-06-18 0:00:00", "2018-06-19 0:00:00",
"2018-06-20 0:00:00", "2018-06-21 0:00:00", "2018-06-22 0:00:00",
"2018-06-23 0:00:00", "2018-06-24 0:00:00", "2018-06-25 0:00:00",
"2018-06-26 0:00:00", "2018-06-27 0:00:00", "2018-06-28 0:00:00",
"2018-06-29 0:00:00", "2018-06-30 0:00:00", "2018-07-01 0:00:00",
"2018-07-02 0:00:00", "2018-07-03 0:00:00", "2018-07-04 0:00:00",
"2018-07-05 0:00:00", "2018-07-06 0:00:00", "2018-07-07 0:00:00",
"2018-07-08 0:00:00", "2018-07-09 0:00:00", "2018-07-10 0:00:00",
"2018-07-11 0:00:00", "2018-07-12 0:00:00", "2018-07-13 0:00:00",
"2018-07-14 0:00:00", "2018-07-15 0:00:00", "2018-07-16 0:00:00",
"2018-07-17 0:00:00", "2018-07-18 0:00:00", "2018-07-19 0:00:00",
"2018-07-20 0:00:00", "2018-07-21 0:00:00", "2018-07-22 0:00:00",
"2018-07-23 0:00:00", "2018-07-24 0:00:00", "2018-07-25 0:00:00",
"2018-07-26 0:00:00", "2018-07-27 0:00:00", "2018-07-28 0:00:00",
"2018-07-29 0:00:00", "2018-07-30 0:00:00", "2018-07-31 0:00:00",
"2018-08-01 0:00:00", "2018-08-02 0:00:00", "2018-08-03 0:00:00",
"2018-08-04 0:00:00", "2018-08-05 0:00:00", "2018-08-06 0:00:00",
"2018-08-07 0:00:00", "2018-08-08 0:00:00", "2018-08-09 0:00:00",
"2018-08-10 0:00:00", "2018-08-11 0:00:00", "2018-08-12 0:00:00",
"2018-08-13 0:00:00", "2018-08-14 0:00:00", "2018-08-15 0:00:00",
"2018-08-16 0:00:00", "2018-08-17 0:00:00", "2018-08-18 0:00:00",
"2018-08-19 0:00:00", "2018-08-20 0:00:00", "2018-08-21 0:00:00",
"2018-08-22 0:00:00", "2018-08-23 0:00:00", "2018-08-24 0:00:00",
"2018-08-25 0:00:00", "2018-08-26 0:00:00", "2018-08-27 0:00:00",
"2018-08-28 0:00:00", "2018-08-29 0:00:00", "2018-08-30 0:00:00",
"2018-08-31 0:00:00", "2018-09-01 0:00:00", "2018-09-02 0:00:00",
"2018-09-03 0:00:00", "2018-09-04 0:00:00", "2018-09-05 0:00:00",
"2018-09-06 0:00:00", "2018-09-07 0:00:00", "2018-09-08 0:00:00",
"2018-09-09 0:00:00", "2018-09-10 0:00:00", "2018-09-11 0:00:00",
"2018-09-12 0:00:00", "2018-09-13 0:00:00", "2018-09-14 0:00:00",
"2018-09-15 0:00:00", "2018-09-16 0:00:00", "2018-09-17 0:00:00",
"2018-09-18 0:00:00", "2018-09-19 0:00:00", "2018-09-20 0:00:00",
"2018-09-21 0:00:00", "2018-09-22 0:00:00", "2018-09-23 0:00:00",
"2018-09-24 0:00:00", "2018-09-25 0:00:00", "2018-09-26 0:00:00",
"2018-09-27 0:00:00", "2018-09-28 0:00:00", "2018-09-29 0:00:00",
"2018-09-30 0:00:00", "2018-10-01 0:00:00", "2018-10-02 0:00:00",
"2018-10-03 0:00:00", "2018-10-04 0:00:00", "2018-10-05 0:00:00",
"2018-10-06 0:00:00", "2018-10-07 0:00:00", "2018-10-08 0:00:00",
"2018-10-09 0:00:00", "2018-10-10 0:00:00", "2018-10-11 0:00:00",
"2018-10-12 0:00:00", "2018-10-13 0:00:00", "2018-10-14 0:00:00",
"2018-10-15 0:00:00", "2018-10-16 0:00:00", "2018-10-17 0:00:00",
"2018-10-18 0:00:00", "2018-10-19 0:00:00", "2018-10-20 0:00:00",
"2018-10-21 0:00:00", "2018-10-22 0:00:00", "2018-10-23 0:00:00",
"2018-10-24 0:00:00", "2018-10-25 0:00:00", "2018-10-26 0:00:00",
"2018-10-27 0:00:00", "2018-10-28 0:00:00", "2018-10-29 0:00:00",
"2018-10-30 0:00:00", "2018-10-31 0:00:00", "2018-11-01 0:00:00",
"2018-11-02 0:00:00", "2018-11-03 0:00:00", "2018-11-04 0:00:00",
"2018-11-05 0:00:00", "2018-11-06 0:00:00", "2018-11-07 0:00:00",
"2018-11-08 0:00:00", "2018-11-09 0:00:00", "2018-11-10 0:00:00",
"2018-11-11 0:00:00", "2018-11-12 0:00:00", "2018-11-13 0:00:00",
"2018-11-14 0:00:00", "2018-11-15 0:00:00", "2018-11-16 0:00:00",
"2018-11-17 0:00:00", "2018-11-18 0:00:00", "2018-11-19 0:00:00",
"2018-11-20 0:00:00", "2018-11-21 0:00:00", "2018-11-22 0:00:00",
"2018-11-23 0:00:00", "2018-11-24 0:00:00", "2018-11-25 0:00:00",
"2018-11-26 0:00:00", "2018-11-27 0:00:00", "2018-11-28 0:00:00",
"2018-11-29 0:00:00", "2018-11-30 0:00:00", "2018-12-01 0:00:00",
"2018-12-02 0:00:00", "2018-12-03 0:00:00", "2018-12-04 0:00:00",
"2018-12-05 0:00:00", "2018-12-06 0:00:00", "2018-12-07 0:00:00",
"2018-12-08 0:00:00", "2018-12-09 0:00:00", "2018-12-10 0:00:00",
"2018-12-11 0:00:00", "2018-12-12 0:00:00", "2018-12-13 0:00:00",
"2018-12-14 0:00:00", "2018-12-15 0:00:00", "2018-12-16 0:00:00",
"2018-12-17 0:00:00", "2018-12-18 0:00:00", "2018-12-19 0:00:00",
"2018-12-20 0:00:00", "2018-12-21 0:00:00", "2018-12-22 0:00:00",
"2018-12-23 0:00:00", "2018-12-24 0:00:00", "2018-12-25 0:00:00",
"2018-12-26 0:00:00", "2018-12-27 0:00:00", "2018-12-28 0:00:00",
"2018-12-29 0:00:00", "2018-12-30 0:00:00", "2018-12-31 0:00:00",
"2019-01-01 0:00:00"), class = "factor"), Type = structure(c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 17L, 17L, 17L, 4L, 12L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 17L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Cargo ship",
"Cargo ship:DG,HS,MP(OS)", "Cargo ship:DG,HS,MP(X)", "Fishing",
"Law enforcement", "Local ship", "Passenger ship", "Passenger ship:DG,HS,MP(OS)",
"Passenger ship:DG,HS,MP(Y)", "Pilot", "Pleasure Craft", "Sailing",
"Search/rescue", "Ship", "Towing", "Towing(200/25)", "Tug"), class = "factor"),
Month = structure(c(5L, 5L, 5L, 5L, 3L, 3L, 3L, 8L, 8L, 8L,
7L, 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L), .Label = c("Apr", "Aug", "Dec", "Feb", "Jan", "Jul",
"Jun", "Mar", "May", "Nov", "Oct", "Sep"), class = "factor"),
id = c(27L, 27L, 27L, 27L, 21L, 21L, 21L, 24L, 24L, 24L,
20L, 6L, 20L, 20L, 20L, 20L, 48L, 48L, 48L, 48L, 48L, 42L,
34L, 34L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 23L,
17L, 17L, 17L, 14L, 14L, 3L, 14L, 3L)), row.names = c(1L,
2L, 3L, 4L, 650L, 651L, 652L, 262L, 263L, 264L, 400L, 401L, 402L,
403L, 404L, 405L, 100L, 101L, 102L, 103L, 104L, 105L, 250L, 251L,
252L, 253L, 254L, 255L, 256L, 257L, 258L, 259L, 260L, 300L, 301L,
302L, 303L, 304L, 305L, 306L, 307L, 308L), class = "data.frame")
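The per-vessel table `dfsum2` is only one step away from the desired output: it has one row per Month, id and Type, so counting its rows per Month and Type gives the number of unique vessels. A sketch, assuming dplyr; the `ungroup()` matters because `tally()` leaves `dfsum2` grouped by Month and id, and `count()` would otherwise keep those groups. The column name `n_vessels` is an illustrative choice, and the data frame rebuilds only the Month, Type and id columns of the `df` above.

```r
library(dplyr)

# Rebuild of the Month, Type and id columns of the sample `df` (UTC_Time omitted)
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# One row per Month, id and Type (the question's dfsum2)
dfsum2 <- df %>% group_by(Month, id, Type) %>% tally()

# tally() leaves dfsum2 grouped by Month and id, so drop the groups first;
# each remaining row is one vessel, so counting rows gives unique vessels
vessels <- dfsum2 %>%
  ungroup() %>%
  count(Month, Type, name = "n_vessels")
print(vessels)
```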
Answer 0 (score: 2)
You can use the following base R approach (it can sometimes be fast):
#Code
result <- aggregate(Type~Month,df,function(x) length(unique(x)))
Output:
Month Type
1 Dec 1
2 Feb 1
3 Jan 1
4 Jun 2
5 Mar 1
6 May 1
Or maybe:
#Code 2
result2 <- aggregate(id~Month,df,function(x) length(unique(x)))
Output:
Month id
1 Dec 1
2 Feb 2
3 Jan 3
4 Jun 2
5 Mar 2
6 May 3
Based on your expected output, you can try this:
#Code
new <- aggregate(id~Month+Type,data=df,function(x) length(unique(x)))
Output:
Month Type id
1 Dec Fishing 1
2 Feb Fishing 2
3 Jan Fishing 3
4 Jun Fishing 1
5 May Passenger ship 3
6 Jun Sailing 1
7 Mar Tug 2
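The rows above come out ordered by Type and then by the alphabetical Month levels. If you want the calendar order of the desired output in the question, one option, assuming the Month labels are the three-letter abbreviations in R's built-in `month.abb` (they are in this sample), is to relevel Month before aggregating and then sort by Month. The data frame rebuilds only the Month, Type and id columns of the sample `df`.

```r
# Rebuild of the Month, Type and id columns of the sample `df` (UTC_Time omitted)
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# Relevel Month to calendar order (month.abb is built into R)
df$Month <- factor(df$Month, levels = month.abb)

# Unique vessels per Month and Type, as in the answer above
new <- aggregate(id ~ Month + Type, data = df, function(x) length(unique(x)))

# Sort by Month first, then Type, and reset the row names
new <- new[order(new$Month, new$Type), ]
rownames(new) <- NULL
print(new)
```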
Or using dplyr:
library(dplyr)
#Code
new <- df %>% group_by(Month,Type) %>% summarise(N=length(unique(id)))
Output:
# A tibble: 7 x 3
# Groups: Month [6]
Month Type N
<fct> <fct> <int>
1 Dec Fishing 1
2 Feb Fishing 2
3 Jan Fishing 3
4 Jun Fishing 1
5 Jun Sailing 1
6 Mar Tug 2
7 May Passenger ship 3
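The `# Groups: Month [6]` line shows that the result is still grouped by Month, because `summarise()` drops only the last grouping variable. A sketch assuming dplyr >= 1.0.0, where `summarise()` accepts `.groups = "drop"` to return an ungrouped tibble; it also swaps `length(unique(id))` for the equivalent `n_distinct(id)`. The data frame rebuilds only the Month, Type and id columns of the sample `df`.

```r
library(dplyr)

# Rebuild of the Month, Type and id columns of the sample `df` (UTC_Time omitted)
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# Unique vessels per Month and Type, returned as an ungrouped tibble
new <- df %>%
  group_by(Month, Type) %>%
  summarise(N = n_distinct(id), .groups = "drop")
print(new)
```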
Answer 1 (score: 1)
We can use n_distinct to find the number of unique 'Type' values by 'Month':
library(dplyr)
df %>%
group_by(Month) %>%
summarise(n = n_distinct(Type))
-output
# A tibble: 6 x 2
# Month n
# <fct> <int>
#1 Dec 1
#2 Feb 1
#3 Jan 1
#4 Jun 2
#5 Mar 1
#6 May 1
If it is based on 'id':
df %>%
group_by(Month) %>%
summarise(n = n_distinct(id))
-output
# A tibble: 6 x 2
# Month n
# <fct> <int>
#1 Dec 1
#2 Feb 2
#3 Jan 3
#4 Jun 2
#5 Mar 2
#6 May 3
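Both per-month counts can also be produced in one `summarise()` call, with one `n_distinct()` column per variable. A sketch assuming dplyr; `n_types` and `n_vessels` are illustrative column names, and the data frame rebuilds only the Month, Type and id columns of the sample `df`. The two columns should match the two outputs above.

```r
library(dplyr)

# Rebuild of the Month, Type and id columns of the sample `df` (UTC_Time omitted)
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# Unique Types and unique vessel ids per Month, side by side
per_month <- df %>%
  group_by(Month) %>%
  summarise(n_types = n_distinct(Type), n_vessels = n_distinct(id))
print(per_month)
```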
Or another option is to get the distinct rows and use count:
df %>%
distinct(Month, Type) %>%
count(Month)
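The same distinct-then-count pattern gives the output the question asks for if `id` is kept in `distinct()` and `Type` is added to `count()`: each remaining row is one vessel in one Month/Type cell. A sketch assuming dplyr, on a rebuild of the Month, Type and id columns of the sample `df`:

```r
library(dplyr)

# Rebuild of the Month, Type and id columns of the sample `df` (UTC_Time omitted)
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# Keep one row per vessel per Month/Type, then count rows per Month/Type
vessels <- df %>%
  distinct(Month, Type, id) %>%
  count(Month, Type)
print(vessels)
```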
Or with data.table:
library(data.table)
setDT(df)[, .(n = uniqueN(Type)), Month]
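For the per-Type vessel count from the question, a data.table sketch (assuming the data.table package is installed) groups by both columns and applies `uniqueN()` to `id`. It uses `as.data.table()` instead of `setDT()` so the original data frame is not converted by reference. With `by =` the groups appear in order of first appearance; `keyby =` would sort them instead. The data frame rebuilds only the Month, Type and id columns of the sample `df`.

```r
library(data.table)

# Rebuild of the Month, Type and id columns of the sample `df` (UTC_Time omitted)
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# Copy into a data.table, leaving `df` itself unchanged
dt <- as.data.table(df)

# Unique vessel ids per Month and Type
res <- dt[, .(n = uniqueN(id)), by = .(Month, Type)]
print(res)
```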
Or with base R:
aggregate(Type ~ Month, unique(df[c('Type', 'Month')]), length)
aggregate(id ~ Month, unique(df[c('id', 'Month')]), length)
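The base R version extends the same way: keep `Type` in the deduplicated columns and add it to the formula. A sketch on a rebuild of the Month, Type and id columns of the sample `df`; the rows should match the `aggregate(id ~ Month + Type, ...)` output in the first answer.

```r
# Rebuild of the Month, Type and id columns of the sample `df` (UTC_Time omitted)
df <- data.frame(
  Month = factor(rep(c("Jan", "Dec", "Mar", "Jun", "Jan", "Feb", "Mar", "May"),
                     times = c(4, 3, 3, 6, 6, 11, 1, 8))),
  Type  = factor(rep(c("Fishing", "Tug", "Fishing", "Sailing", "Fishing",
                       "Tug", "Passenger ship"),
                     times = c(7, 3, 1, 1, 21, 1, 8))),
  id    = c(27, 27, 27, 27, 21, 21, 21, 24, 24, 24, 20, 6, 20, 20, 20, 20,
            48, 48, 48, 48, 48, 42, 34, 34, 31, 31, 31, 31, 31, 31, 31, 31,
            31, 23, 17, 17, 17, 14, 14, 3, 14, 3)
)

# Deduplicate (id, Month, Type) first, so length() counts unique vessels
res <- aggregate(id ~ Month + Type,
                 data = unique(df[c("id", "Month", "Type")]),
                 FUN = length)
print(res)
```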
Regarding the efficiency of base R, and aggregate in particular, it can be slower, as shown here.