Merge based on conditions

Time: 2018-05-01 13:31:41

Tags: r

For two example data frames:

df1 <- structure(list(name = c("Katie", "Eve", "James", "Alexander", 
"Mary", "Barrie", "Harry", "Sam"), postcode = c("CB12FR", "CB12FR", 
"NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR")), .Names = c("name", 
"postcode"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("name", "postcode")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df2 <-structure(list(name = c("Katie", "James", "Alexander", "Lucie", 
"Mary", "Barrie", "Claire", "Harry", "Clare", "Hannah", "Rob", 
"Eve", "Sarah"), postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", 
"PE46YH", "IL57DS", "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", 
"CB12FR", "IL45TR"), rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 
2L, 3L, 1L, 4L, 2L)), .Names = c("name", "postcode", "rating"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-13L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector")), rating = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("name", "postcode", "rating")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

I want to merge the two data frames so that the rating from df2 is added to df1. I would normally use:

ratings.df <- merge(x = df1, y = df2, by = "postcode", all.x = TRUE)

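However, I only want to merge when:

1. The postcode in df2 is unique (i.e. if a postcode appears more than once, whether under the same name or different names, those rows are not merged).
2. The first three letters of the name are the same in both data frames.

(I am happy for postcodes without a rating to be left blank; I can fill those in manually.)

Is this possible?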

2 answers:

Answer 0: (score: 1)

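Why not use the sqldf package? It lets you merge data.frames in R by writing a JOIN statement.

As for the conditional merge, this can be done with a CASE statement in SQL.

So, for your first condition, you can use CASE where COUNT(postcode) = '1' with GROUP BY name; that way, for each name that has exactly one postcode assigned, you can { {1}}.

Another option is to JOIN using gather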
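As a rough illustration of the sqldf idea, the sketch below restricts df2 to postcodes that occur exactly once and then left-joins on both the postcode and the first three letters of the name. It is only one possible reading of the answer: it filters with GROUP BY ... HAVING rather than CASE (in SQL, a condition on an aggregate such as COUNT is normally written in HAVING), it groups by postcode rather than name, and the alias `u` and the result name `ratings_sql` are made up for the example. The data frames are rebuilt as plain data.frames with the same values as in the question.

```r
library(sqldf)

# Same values as the question's df1 and df2, as plain data.frames
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS", "RE35TP",
               "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR", "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Subquery u: postcodes that appear exactly once in df2.
# MIN() simply picks the single row's value, since each group has one row.
# The LEFT JOIN keeps every df1 row; rating is NULL (NA in R) when nothing matches.
ratings_sql <- sqldf("
  SELECT df1.name, df1.postcode, u.rating
  FROM df1
  LEFT JOIN (
    SELECT postcode, MIN(name) AS name, MIN(rating) AS rating
    FROM df2
    GROUP BY postcode
    HAVING COUNT(*) = 1
  ) AS u
    ON df1.postcode = u.postcode
   AND substr(df1.name, 1, 3) = substr(u.name, 1, 3)
")
```

Because the name condition sits in the ON clause rather than in a WHERE clause, df1 rows that fail it are kept with an empty rating instead of being dropped.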

Answer 1: (score: 1)

With a dplyr solution, we can first remove the duplicates in df2$postcode and then join the result to df1:

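library(dplyr)

# Keep one row per postcode in df2 (the first occurrence of each)
df3 <- df2 %>%
  distinct(postcode, .keep_all = TRUE)

df1 %>%
  left_join(df3, by = c("postcode")) %>%
  # Keep only rows whose names share the same first three letters
  filter(substr(name.x, 1, 3) == substr(name.y, 1, 3)) %>%
  rename(name = name.x) %>%
  mutate(name.y = NULL)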

This produces

# A tibble: 5 x 3
  name      postcode rating
  <chr>     <chr>     <int>
1 Katie     CB12FR        1
2 James     NE34TR        1
3 Alexander DH34RL        1
4 Mary      PE46YH        3
5 Barrie    IL57DS        1

Is this what you wanted to achieve?
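One thing to keep in mind: distinct(postcode, .keep_all = TRUE) keeps the first row of each duplicated postcode rather than dropping those postcodes, and the filter() step removes df1 rows that have no match. If the intent is instead to ignore every postcode that occurs more than once in df2, and to keep all df1 rows with a blank rating where nothing qualifies, a minimal sketch could look like the following. The names `unique_pc`, `name_df2` and `strict` are illustrative only, and the data frames are rebuilt here as plain data.frames with the question's values.

```r
library(dplyr)

# Same values as the question's df1 and df2, as plain data.frames
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS", "RE35TP",
               "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR", "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Keep only postcodes that occur exactly once in df2
unique_pc <- df2 %>%
  group_by(postcode) %>%
  filter(n() == 1) %>%
  ungroup() %>%
  rename(name_df2 = name)  # avoid a name clash in the join

strict <- df1 %>%
  left_join(unique_pc, by = "postcode") %>%
  # Blank the rating unless the first three letters of the names agree;
  # unmatched rows have name_df2 = NA, so their rating stays NA as well
  mutate(rating = if_else(substr(name, 1, 3) == substr(name_df2, 1, 3),
                          rating, NA_integer_)) %>%
  select(name, postcode, rating)
```

With the example data, this keeps all eight rows of df1; only James, Alexander, Mary and Barrie receive a rating, because CB12FR appears three times in df2, IP43WR has no counterpart, and "Sam" does not match "Sarah" on the first three letters.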