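I'm trying to put together a simple example. It may well be trivial, but I don't know how to write the code for it.
I have a panel data set with two variables, date and company, plus some other variables: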
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <-c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a","b","c","a","b","c","d","e")
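Not every company trades every day, so I only want to keep the data for companies that have traded more than 4 times. In this example there are 6 days and 5 companies. Companies "e" and "d" should be the ones that get dropped.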
Answer 0: (score: 2)
One option is to use dplyr::filter together with group_by. n() gives the number of rows in the current group, so after group_by(company) it returns the number of times each company traded.
# data
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <- c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
             "b","c","a","b","c","d","e")
df <- data.frame(date, company)
library(dplyr)
df %>%
  group_by(company) %>%
  filter(n() > 4)  # keep only companies that traded more than 4 times
# Result: e and d are dropped because their counts (n()) are 2 and 3, i.e. not more than 4
# # A tibble: 17 x 2
# # Groups: company [3]
# date company
# <dbl> <fctr>
# 1 1.00 a
# 2 1.00 b
# 3 1.00 c
# 4 2.00 a
# 5 2.00 b
# 6 2.00 c
# 7 3.00 a
# 8 3.00 b
# 9 4.00 a
# 10 4.00 b
# 11 4.00 c
# 12 5.00 a
# 13 5.00 b
# 14 5.00 c
# 15 6.00 a
# 16 6.00 b
# 17 6.00 c
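
If you would rather not load dplyr, the same filtering can probably be done in base R. The following is only a sketch under the assumption that one row equals one trade, as in the example data; the names trade_counts, keep and df_kept are made up for illustration and are not part of the original question.

```r
# Same example data as in the question
date <- c(1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6,6,6)
company <- c("a","b","c","d","e","a","b","c","d","a","b","a","b","c","a",
             "b","c","a","b","c","d","e")
df <- data.frame(date, company)

# Count the rows (trades) per company
trade_counts <- table(df$company)

# Names of the companies that traded more than 4 times
# (illustrative variable names, not from the original post)
keep <- names(trade_counts)[as.vector(trade_counts > 4)]

# Keep only the rows that belong to those companies
df_kept <- df[df$company %in% keep, ]
```

Printing trade_counts first is a quick way to check which companies fall below the threshold before anything is dropped.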