Imagine I have a LARGE table (table1) that looks similar to the one below.
TABLE1:
YEAR MODEL MAKE ORDERCODE COLOR
2001 BMW 328i FAE GREEN
2001 BMW 328i SDC BLACK
2001 LEXUS LS430 ASD PURPLE
2001 LEXUS IS300 ASD BLACK
2001 LEXUS GS300h YUK BLACK
2001 LEXUS GS300h HNY BLUE
2002 LEXUS GS300h ASF PURPLE
2002 LEXUS GS300h FAS BROWN
2002 LEXUS GS300h YUI RED
2002 LEXUS IS250d ZXC ORANGE
2002 LEXUS IS250d ASE BLUE
I have another data frame with a different make (let's say it is an Accord, which comes in BLACK, BLUE, PURPLE and RED), so it looks like this:
TABLE2:
MAKE COLOR
Accord BLACK
Accord RED
Accord BLUE
Accord PURPLE
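I am trying to find all cars in TABLE1 that come in every color listed in TABLE2. I tried dplyr with table1 %>% filter(COLOR %in% table2$COLOR), but that returns every row that has at least one of the colors I want. I want to return the rows for the car MAKE that has all of the colors I specified. So my result would look like this:
Desired result: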
2001 LEXUS GS300h YUK BLACK
2001 LEXUS GS300h HNY BLUE
2002 LEXUS GS300h ASF PURPLE
2002 LEXUS GS300h FAS BROWN
2002 LEXUS GS300h YUI RED
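For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the two sample tables and shows why the row-wise %in% filter is too permissive. The object names table1 and table2 follow the question; reading the table from inline text with read.table, and the name partial, are just assumptions made for convenience.

```r
# Rebuild the sample data from the question (base R only).
table1 <- read.table(text = "
YEAR MODEL MAKE ORDERCODE COLOR
2001 BMW 328i FAE GREEN
2001 BMW 328i SDC BLACK
2001 LEXUS LS430 ASD PURPLE
2001 LEXUS IS300 ASD BLACK
2001 LEXUS GS300h YUK BLACK
2001 LEXUS GS300h HNY BLUE
2002 LEXUS GS300h ASF PURPLE
2002 LEXUS GS300h FAS BROWN
2002 LEXUS GS300h YUI RED
2002 LEXUS IS250d ZXC ORANGE
2002 LEXUS IS250d ASE BLUE
", header = TRUE, stringsAsFactors = FALSE)

table2 <- data.frame(
  MAKE  = "Accord",
  COLOR = c("BLACK", "RED", "BLUE", "PURPLE"),
  stringsAsFactors = FALSE
)

# Row-wise test: keeps any row whose own color is wanted,
# regardless of which other colors that car comes in.
partial <- table1[table1$COLOR %in% table2$COLOR, ]
nrow(partial)  # 8 of the 11 rows survive
```

The row-wise filter also drops the BROWN GS300h row, which the desired result keeps, so the test has to be made per car rather than per row.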
Answer 0 (score: 2)
To find the most similar car, we count the color matches for each model and then keep the model with the highest number of matches.
dplyr:
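df %>%
  group_by(MODEL, MAKE) %>%
  # Per group, count how many of the wanted colors (df1$COLOR) occur
  mutate(slr = sum(df1$COLOR %in% COLOR)) %>%
  # Drop the grouping so max() is taken over the whole table, not per group
  ungroup() %>%
  filter(slr == max(slr))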
data.table:
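library(data.table)

setDT(df)  # convert df to a data.table by reference
# Per (MODEL, MAKE) group, count how many of the wanted colors occur
df[, slr := sum(df1$COLOR %in% COLOR), by = .(MODEL, MAKE)]
# Keep the group(s) with the highest match count (global max, not per group)
df <- df[slr == max(slr)]
print(df)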
The output is:
YEAR MODEL MAKE ORDERCODE COLOR slr
1: 2001 LEXUS GS300h YUK BLACK 4
2: 2001 LEXUS GS300h HNY BLUE 4
3: 2002 LEXUS GS300h ASF PURPLE 4
4: 2002 LEXUS GS300h FAS BROWN 4
5: 2002 LEXUS GS300h YUI RED 4
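Note that keeping the highest match count returns the closest group even when no group has every color. If a strict "has all colors" rule is wanted, one possible variation (a sketch, not part of the original answer) compares the count with the number of distinct wanted colors. The reduced sample data and the names n_wanted and exact are assumptions for illustration; df and df1 play the same roles as above.

```r
library(dplyr)

# Reduced sample: df stands in for TABLE1, df1 for TABLE2.
df <- data.frame(
  YEAR  = c(2001L, 2001L, 2001L, 2002L, 2002L, 2002L, 2002L),
  MODEL = "LEXUS",
  MAKE  = c("IS300", "GS300h", "GS300h", "GS300h", "GS300h", "GS300h", "IS250d"),
  COLOR = c("BLACK", "BLACK", "BLUE", "PURPLE", "BROWN", "RED", "BLUE"),
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  COLOR = c("BLACK", "RED", "BLUE", "PURPLE"),
  stringsAsFactors = FALSE
)

# Number of distinct colors a group must contain to qualify.
n_wanted <- n_distinct(df1$COLOR)

exact <- df %>%
  group_by(MODEL, MAKE) %>%
  # Count how many distinct wanted colors occur in this group.
  mutate(slr = sum(unique(df1$COLOR) %in% COLOR)) %>%
  ungroup() %>%
  # Keep only groups that contain every wanted color.
  filter(slr == n_wanted)

print(exact)
```

With this check, a table in which no car has all four colors would give an empty result instead of the best partial match.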
Answer 1 (score: 2)
Here is an approach using dplyr:
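library(dplyr)

df1 %>%
  group_by(MODEL, MAKE) %>%
  # Blank out colors that are not in df2, then count the distinct matches per group
  mutate(COLOR2 = ifelse(COLOR %in% df2$COLOR, COLOR, NA),
         count = n_distinct(COLOR2[!is.na(COLOR2)])) %>%
  # Keep groups that match every color (assumes df2$COLOR has no duplicates)
  filter(count == nrow(df2)) %>%
  select(-COLOR2, -count)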
Result:
# A tibble: 5 x 5
# Groups: MODEL, MAKE [1]
YEAR MODEL MAKE ORDERCODE COLOR
<int> <chr> <chr> <chr> <chr>
1 2001 LEXUS GS300h YUK BLACK
2 2001 LEXUS GS300h HNY BLUE
3 2002 LEXUS GS300h ASF PURPLE
4 2002 LEXUS GS300h FAS BROWN
5 2002 LEXUS GS300h YUI RED
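Answer 2 (score: 2)
(The pipe function is %>%, not %<%.) You need to group TABLE1 by make and model, reverse the "direction" of the %in% test, and wrap it in the logical all() function. The question being asked is whether all of the colors in the second table are present among the colors of a single group.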
TABLE1 %>% group_by(MODEL, MAKE) %>% filter(all(TABLE2$COLOR %in% COLOR))
# A tibble: 5 x 5
# Groups: MODEL, MAKE [1]
YEAR MODEL MAKE ORDERCODE COLOR
<int> <chr> <chr> <chr> <chr>
1 2001 LEXUS GS300h YUK BLACK
2 2001 LEXUS GS300h HNY BLUE
3 2002 LEXUS GS300h ASF PURPLE
4 2002 LEXUS GS300h FAS BROWN
5 2002 LEXUS GS300h YUI RED
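The "direction" of %in% is easy to check on plain vectors. This is a small base R sketch; the vector names wanted, gs300h and is250d are made up for illustration, with values taken from the sample tables.

```r
# Colors from TABLE2 and from two of the TABLE1 groups.
wanted <- c("BLACK", "RED", "BLUE", "PURPLE")
gs300h <- c("BLACK", "BLUE", "PURPLE", "BROWN", "RED")
is250d <- c("ORANGE", "BLUE")

# Original direction: one TRUE/FALSE per row of the group,
# so filter() keeps individual rows with a wanted color.
gs300h %in% wanted

# Reversed direction: one TRUE/FALSE per wanted color.
# all() collapses that into a single answer for the whole group.
has_all_gs300h <- all(wanted %in% gs300h)
has_all_is250d <- all(wanted %in% is250d)

print(c(GS300h = has_all_gs300h, IS250d = has_all_is250d))
```

Because all() returns a single value, filter() recycles it across the group, so each group is kept or dropped as a whole. That is why the BROWN row of the GS300h stays in the result.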