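I have a data frame in R that contains many duplicate records. I would like to find out how many times each distinct record occurs in that data frame.
For example, I have the following data frame: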
Fake Name Fake ID Fake Status Fake Program
June 0003 Green PR1
June 0003 Green PR1
Television 202 Blue PR3
Television 202 Green PR3
Television 202 Green PR3
CRT 12 Red PR0
From the above, I would like to get something like the following:
Fake Name Fake ID Fake Status Fake Program COUNT
June 0003 Green PR1 2
Television 202 Blue PR3 1
Television 202 Green PR3 2
CRT 12 Red PR0 1
Any help would be greatly appreciated. Thanks.
Answer 0: (score: 10)
Use group_by_all from dplyr, then count the rows in each group with n:
library(dplyr)
df %>% group_by_all() %>% summarise(COUNT = n())
# A tibble: 4 x 5
# Groups: Fake.Name, Fake.ID, Fake.Status [?]
# Fake.Name Fake.ID Fake.Status Fake.Program COUNT
# <fct> <int> <fct> <fct> <int>
#1 CRT 12 Red PR0 1
#2 June 3 Green PR1 2
#3 Television 202 Blue PR3 1
#4 Television 202 Green PR3 2
Or, even better, as suggested in @Ryan's comment:
df %>% group_by_all %>% count
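In recent dplyr releases the scoped verbs such as group_by_all() are marked as superseded, so the same idea can be sketched with count() and across(). This is a minimal sketch, not part of the original answer; it assumes dplyr 1.0.0 or later, and the data frame below is a reconstruction of the question's data with syntactic column names, as in the output above.

```r
library(dplyr)

# Reconstruction of the example data (assumed column types)
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c(3L, 3L, 202L, 202L, 202L, 12L),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0")
)

# Group by every column and count the rows in each group;
# `name` sets the name of the count column
out <- df %>% count(across(everything()), name = "COUNT")
out
```

The result has one row per distinct record plus a COUNT column, and the counts sum to the number of rows in df.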
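Answer 1: (score: 3)
The following uses duplicated to get the resulting data.frame, then rle to get the counts. Note that rle counts runs of consecutive equal values in d, so the counts are only right when identical rows are adjacent and two neighbouring groups never fall into the same run, which happens to hold for the example data.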
res <- dat[!duplicated(dat), ]
d <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
res$COUNT <- rle(d)$lengths
res
# Fake Name Fake ID Fake Status Fake Program COUNT
#1 June 0003 Green PR1 2
#3 Television 202 Blue PR3 1
#4 Television 202 Green PR3 2
#6 CRT 12 Red PR0 1
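If the rows are not guaranteed to be sorted, an order-independent variant in base R might look like the sketch below. It is not part of the original answer: the names key, n_per_row, and first are illustrative, and it assumes that pasting the columns together with "\r" as separator yields one unique key per distinct row.

```r
# Example data from the question (check.names = FALSE keeps the spaces in names)
dat <- data.frame(
  `Fake Name`    = c("June", "June", "Television", "Television", "Television", "CRT"),
  `Fake ID`      = c("0003", "0003", "202", "202", "202", "12"),
  `Fake Status`  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  `Fake Program` = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  check.names = FALSE
)

# Assumption: "\r" does not occur in the data, so each distinct row gets its own key
key <- do.call(paste, c(dat, sep = "\r"))

# For every row, the number of rows that share its key
n_per_row <- ave(seq_along(key), key, FUN = length)

# Keep the first occurrence of each distinct row together with its count
first <- !duplicated(key)
res <- dat[first, ]
res$COUNT <- n_per_row[first]
res
```

Because the counts are computed per key rather than per run, shuffling the rows of dat changes only the order of the result, not the counts.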
Answer 2: (score: 2)
If the question is
How do I count the unique rows in a data frame?
then use sum and duplicated. For example,
df <- data.frame(
`Fake Name` = c(
"June", "June", "Television", "Television", "Television", "CRT"),
`Fake ID` = c("0003", "0003", "202", "202", "202", "12"),
`Fake Status` = c("Green", "Green", "Blue", "Green", "Green", "Red"),
`Fake Program` = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
check.names = FALSE)
df
#R Fake Name Fake ID Fake Status Fake Program
#R 1 June 0003 Green PR1
#R 2 June 0003 Green PR1
#R 3 Television 202 Blue PR3
#R 4 Television 202 Green PR3
#R 5 Television 202 Green PR3
#R 6 CRT 12 Red PR0
sum(!duplicated(df))
#R [1] 4
For the table you asked for, you can use data.table as follows:
library(data.table)
df <- data.table(df)
df[, .(COUNT = .N), by = names(df)]
#R Fake Name Fake ID Fake Status Fake Program COUNT
#R 1: June 0003 Green PR1 2
#R 2: Television 202 Blue PR3 1
#R 3: Television 202 Green PR3 2
#R 4: CRT 12 Red PR0 1
Answer 3: (score: -1)
You can use n_distinct from dplyr, which counts the distinct values in a single column:
n_distinct(data$col)