I have a data frame in the following format:
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
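I want to select all duplicated rows, where a row counts as a duplicate only if its values in both mpg and carb are repeated in another row.
This should give the following result: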
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Answer 0 (score: 1)
A dplyr solution:
library(dplyr)
mtcars %>%
  add_count(mpg, carb) %>%  # count how often each (mpg, carb) combination occurs; stored in a new column n
  filter(n > 1) %>%         # keep only rows whose combination appears more than once
  select(-n)                # drop the helper count column
# # A tibble: 6 x 11
# mpg cyl disp hp drat wt qsec vs am gear carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
# 3 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
# 4 10.4 8 460 215 3 5.42 17.8 0 0 3 4
# 5 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
# 6 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
Answer 1 (score: 1)
Here is another dplyr option:
library(dplyr)
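mtcars %>%
  group_by(mpg, carb) %>%  # one group per (mpg, carb) combination
  filter(n() > 1) %>%      # keep only groups with more than one row
  ungroup()                # drop the grouping so later steps see a plain table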
Answer 2 (score: 0)
With data.table, we can do
library(data.table)
as.data.table(mtcars)[, .SD[.N > 1], .(mpg, carb)]
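For comparison, the same selection can probably be done without any packages. This is a base R sketch, not something proposed in the answers above; it relies on `duplicated()` being applied in both directions so that the first occurrence of each repeated pair is also flagged. The names `keys`, `is_dup`, and `result` are illustrative only:

```r
# Key columns that define a duplicate.
keys <- mtcars[, c("mpg", "carb")]

# duplicated() alone marks only the second and later occurrences;
# combining it with fromLast = TRUE also marks the first occurrence.
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

result <- mtcars[is_dup, ]
nrow(result)
```

Unlike the dplyr and data.table versions, this keeps the original row names and column order of mtcars.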