Counting observations by group and ID in R

Time: 2019-07-25 14:43:57

Tags: r dplyr tidyverse

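I am trying to write code that counts observations based on a condition, and I do not know whether it is possible. What I want is to count each ID and Year combination only once, rather than adding up all the rows in the group.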

Here is the data frame:

df <- structure(list(ID = c("P40", "P40", "P40", "P40", "P42", "P42",
                     "P43", "P43", "P43"), Year = c("2013", "2013", "2014", "2015", "2013", "2014", "2014", "2014", "2014"),
              Meeting = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes")),
         class = "data.frame", row.names = c(NA, -9L))



ID  Year Meeting
P40 2013     Yes
P40 2013     Yes
P40 2014     Yes
P40 2015     Yes
P42 2013     Yes
P42 2014     Yes
P43 2014     Yes
P43 2014     Yes
P43 2014     Yes

The result I want to achieve:

ID  Year Count
P40 2013     1
P40 2014     1
P40 2015     1
P42 2013     1
P42 2014     1
P43 2014     1

Here is my code so far, which simply counts every observation in each group.

df %>% group_by(ID, Year) %>% summarise(Count = n())

3 answers:

Answer 0 (score: 4)

Is this what you are after?


count(df %>% distinct(ID, Year), ID, Year, name = 'Count')
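To illustrate why removing duplicates first changes the result, here is a minimal sketch that compares counting the raw rows with counting after `distinct`. It assumes dplyr is installed and rebuilds the question's `df` with `data.frame`; the names `raw_counts` and `dedup_counts` are only illustrative.

```r
library(dplyr)

# The question's data, rebuilt so the sketch is self-contained
df <- data.frame(
  ID = c("P40", "P40", "P40", "P40", "P42", "P42", "P43", "P43", "P43"),
  Year = c("2013", "2013", "2014", "2015", "2013", "2014", "2014", "2014", "2014"),
  Meeting = "Yes",
  stringsAsFactors = FALSE
)

# Counting the raw rows: repeated ID/Year pairs inflate the count
raw_counts <- df %>% count(ID, Year, name = "Count")

# Dropping repeated ID/Year pairs first: each remaining pair is counted once
dedup_counts <- df %>% distinct(ID, Year) %>% count(ID, Year, name = "Count")

raw_counts$Count    # 2 1 1 1 1 3
dedup_counts$Count  # 1 1 1 1 1 1
```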

Answer 1 (score: 3)

We can apply distinct to the dataset and then use count

library(dplyr)
df %>% 
   distinct %>% 
   count(ID, Year)
# A tibble: 6 x 3
#  ID    Year      n
#  <chr> <chr> <int>
#1 P40   2013      1
#2 P40   2014      1
#3 P40   2015      1
#4 P42   2013      1
#5 P42   2014      1
#6 P43   2014      1

Or using data.table

library(data.table)
unique(setDT(df[1:2]))[, .N, .(ID, Year)]

Or using base R

subset(as.data.frame(table(unique(df[1:2]))), Freq != 0)

Or an option with cbind

cbind(unique(df[1:2]), n = 1)
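A short base R sketch of why the `table` option above needs the `Freq != 0` filter: `table` cross-tabulates every ID with every Year, so pairs that never occur appear with a frequency of zero. The data frame is rebuilt here so the block runs on its own, and the names `pairs`, `tab` and `present` are only illustrative.

```r
# The question's data, rebuilt so the sketch is self-contained
df <- data.frame(
  ID = c("P40", "P40", "P40", "P40", "P42", "P42", "P43", "P43", "P43"),
  Year = c("2013", "2013", "2014", "2015", "2013", "2014", "2014", "2014", "2014"),
  Meeting = "Yes",
  stringsAsFactors = FALSE
)

# Keep one row per ID/Year pair
pairs <- unique(df[1:2])

# table() builds the full ID x Year grid, including pairs that never occur
tab <- as.data.frame(table(pairs))
nrow(tab)      # 9: 3 IDs x 3 Years

# Filtering on Freq keeps only the pairs that are actually present
present <- subset(tab, Freq != 0)
nrow(present)  # 6
```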

Answer 2 (score: 0)

Since you only want one observation per group, wouldn't this be enough?

transform(unique(df), count = 1)

#   ID Year Meeting count
#1 P40 2013     Yes     1
#3 P40 2014     Yes     1
#4 P40 2015     Yes     1
#5 P42 2013     Yes     1
#6 P42 2014     Yes     1
#7 P43 2014     Yes     1

Or, if you only want to check selected columns

transform(unique(df[1:2]), count = 1)
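The two variants give the same rows for the question's data because Meeting is always "Yes". A minimal sketch of where they differ, using a hypothetical data frame `df2` (not from the question) in which one ID/Year pair has two different Meeting values:

```r
# Hypothetical data: P40/2013 appears with both "Yes" and "No"
df2 <- data.frame(
  ID = c("P40", "P40", "P40"),
  Year = c("2013", "2013", "2014"),
  Meeting = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# unique() over all columns treats the two P40/2013 rows as different
all_cols <- transform(unique(df2), count = 1)

# unique() over ID and Year only collapses them into one row
key_cols <- transform(unique(df2[1:2]), count = 1)

nrow(all_cols)  # 3
nrow(key_cols)  # 2
```

If the goal is one row per ID and Year regardless of the other columns, the selected-columns form is the safer choice.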