For two example data frames:
df1 <- structure(list(name = c("Katie", "Eve", "James", "Alexander",
"Mary", "Barrie", "Harry", "Sam"), postcode = c("CB12FR", "CB12FR",
"NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR")), .Names = c("name",
"postcode"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character",
"collector")), postcode = structure(list(), class = c("collector_character",
"collector"))), .Names = c("name", "postcode")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
df2 <-structure(list(name = c("Katie", "James", "Alexander", "Lucie",
"Mary", "Barrie", "Claire", "Harry", "Clare", "Hannah", "Rob",
"Eve", "Sarah"), postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH",
"PE46YH", "IL57DS", "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH",
"CB12FR", "IL45TR"), rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L,
2L, 3L, 1L, 4L, 2L)), .Names = c("name", "postcode", "rating"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-13L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character",
"collector")), postcode = structure(list(), class = c("collector_character",
"collector")), rating = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("name", "postcode", "rating")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
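I want to merge the two data frames so that the ratings in df2 are added to df1. Normally I would use:
ratings.df <- merge(x = df1, y = df2, by = "postcode", all.x = TRUE)
However, I only want to merge when:
1. The postcode is unique in df2 (i.e. if a postcode appears more than once in df2, whether under the same name or different names, those rows are not merged).
2. The first three letters of the name are the same in both data frames.
(I am happy for postcodes without a rating to be left blank; I can fill these in manually.)
Is this possible?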
Answer 0 (score: 1)
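Why not use the sqldf package? It lets you merge data.frames in R by writing an SQL JOIN statement.
As for the conditional merge, this can be done with a CASE statement in SQL.
For your first condition, you could use CASE with COUNT(postcode) = 1 and GROUP BY name, so that you JOIN only for names that have exactly one postcode assigned.
Another option is to do the JOIN using gather.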
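The answer above gives no query, so here is a minimal sketch of how the idea might look with sqldf. It is an assumption about what was meant, not the answerer's code. It uses a HAVING filter in a subquery rather than CASE, and it groups by postcode (the question's condition 1) rather than by name. The aliases a, b and the compact data frames are illustrative only.

```r
library(sqldf)

# Compact copies of the example data (plain data.frames, same values as above)
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS", "RE35TP",
               "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR", "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# LEFT JOIN keeps every row of df1; a rating is attached only when
#   (1) the postcode occurs exactly once in df2, and
#   (2) the first three letters of the two names agree.
ratings.df <- sqldf("
  SELECT a.name, a.postcode, b.rating
  FROM df1 AS a
  LEFT JOIN (
    SELECT name, postcode, rating
    FROM df2
    WHERE postcode IN (
      SELECT postcode FROM df2 GROUP BY postcode HAVING COUNT(*) = 1
    )
  ) AS b
    ON a.postcode = b.postcode
   AND substr(a.name, 1, 3) = substr(b.name, 1, 3)
")
```

Because both conditions sit in the ON clause, df1 rows that fail either one are kept with a missing rating, which matches the wish to fill those in manually.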
Answer 1 (score: 1)
With a dplyr solution, we can first remove the duplicates in df2$postcode (keeping the first row for each postcode) and then join the result to df1:
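library(dplyr)
# Keep the first row of df2 for each postcode
df3 <- df2 %>%
  distinct(postcode, .keep_all = TRUE)
# Join to df1, then keep only rows whose names share their first three letters
# (rows of df1 with no match in df3 are dropped by the filter)
df1 %>%
  left_join(df3, by = c("postcode")) %>%
  filter(substr(name.x, 1, 3) == substr(name.y, 1, 3)) %>%
  rename(name = name.x) %>%
  mutate(name.y = NULL)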
This produces:
# A tibble: 5 x 3
  name      postcode rating
  <chr>     <chr>     <int>
1 Katie     CB12FR        1
2 James     NE34TR        1
3 Alexander DH34RL        1
4 Mary      PE46YH        3
5 Barrie    IL57DS        1
Is this what you are trying to achieve?
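The pipeline above keeps the first df2 row for each repeated postcode and drops df1 rows that have no match. If the question's two conditions are read strictly (postcodes repeated in df2 are never merged, and unmatched df1 rows stay with a blank rating), a variant might look like the sketch below. This is one possible reading, not the answerer's method. The names df2_unique and name_df2 are hypothetical helpers introduced for illustration.

```r
library(dplyr)

# Compact copies of the example data (plain data.frames, same values as above)
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS", "RE35TP",
               "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR", "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Condition 1: keep only df2 rows whose postcode occurs exactly once in df2
df2_unique <- df2 %>%
  group_by(postcode) %>%
  filter(n() == 1) %>%
  ungroup() %>%
  rename(name_df2 = name)   # avoid a name.x / name.y clash in the join

# Condition 2: blank the rating unless the first three letters of the names agree.
# All eight df1 rows are kept; rows failing either condition get NA.
ratings.df <- df1 %>%
  left_join(df2_unique, by = "postcode") %>%
  mutate(rating = if_else(
    !is.na(name_df2) & substr(name, 1, 3) == substr(name_df2, 1, 3),
    rating,
    NA_integer_
  )) %>%
  select(name, postcode, rating)

ratings.df$rating
# NA NA 1 1 3 1 NA NA
```

Under this reading Katie loses her rating, because CB12FR appears three times in df2, while Eve, Harry and Sam remain in the result with a missing rating instead of being dropped.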