R: Create a new column using another data frame

Asked: 2019-03-21 18:19:12

Tags: r dataframe mapping match

I have two data frames:

1) data1: data1 <- data.frame(Group = c(1, 2, 3), Region = c("Southeast Med, Southeast Low, Southwest Low, Northeast Med", "Northeast High, East Med, Midwest Med High", "Midwest Low, California and HI, West High"), stringsAsFactors = FALSE)

2) data2: data2 <- data.frame(Region = c('California and HI', 'California and HI', 'Northeast High', 'California and HI', 'West High', 'Midwest Med High', 'California and HI', 'California and HI', 'California and HI', 'Southwest Low', 'Midwest Med High', 'California and HI', 'East Med', 'Southeast Low', 'Southeast Med', 'Midwest Med High', 'Southeast Med', 'West High', 'Northeast High', 'California and HI', 'West High', 'California and HI', 'California and HI', 'West High', 'California and HI', 'West High', 'California and HI', 'California and HI'))

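I want to create a new column in data2, e.g. data2$Group, using data1: for each region in data2, look up in data1 which group that region belongs to and fill in that group. How can I do this? Also, suppose data1 were a list rather than a data frame; what would be a possible approach?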

1 answer:

Answer 0 (score: 3)

With the dataset you posted, you can do this:

library(tidyverse)

# update data1
data1_upd = data1 %>% separate_rows(Region, sep = ", ")

# join datasets
data2_upd = data2 %>% left_join(data1_upd, by="Region")

The new dataset data2_upd looks like this:

#               Region Group
# 1  California and HI     3
# 2  California and HI     3
# 3     Northeast High     2
# 4  California and HI     3
# 5          West High     3
# 6   Midwest Med High     2
# 7  California and HI     3
# 8  California and HI     3
# 9  California and HI     3
# 10     Southwest Low     1
# 11  Midwest Med High     2
# 12 California and HI     3
# 13          East Med     2
# 14     Southeast Low     1
# 15     Southeast Med     1
# 16  Midwest Med High     2
# 17     Southeast Med     1
# 18         West High     3
# 19    Northeast High     2
# 20 California and HI     3
# 21         West High     3
# 22 California and HI     3
# 23 California and HI     3
# 24         West High     3
# 25 California and HI     3
# 26         West High     3
# 27 California and HI     3
# 28 California and HI     3
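
The question also asks what to do if data1 were a list. The following is only a sketch under an assumed layout (a named list whose names are the groups and whose elements are character vectors of regions); the names `groups`, `lookup` and `obs` are illustrative and not from the original post. It uses base R `stack()` to flatten the list into a lookup table and `match()` to fill in the group:

```r
# Assumed layout: list names are the groups, elements are the regions
groups <- list(
  "1" = c("Southeast Med", "Southeast Low", "Southwest Low", "Northeast Med"),
  "2" = c("Northeast High", "East Med", "Midwest Med High"),
  "3" = c("Midwest Low", "California and HI", "West High")
)

# stack() turns the list into a data frame with columns `values` and `ind`
lookup <- stack(groups)
names(lookup) <- c("Region", "Group")
# `ind` is a factor of the list names; convert it back to numbers
lookup$Group <- as.numeric(as.character(lookup$Group))

# A small illustrative stand-in for data2, including one unknown region
obs <- data.frame(
  Region = c("California and HI", "Northeast High", "Southeast Low", "Unknown"),
  stringsAsFactors = FALSE
)

# match() gives the row of `lookup` for each region, or NA if there is none
obs$Group <- lookup$Group[match(obs$Region, lookup$Region)]
print(obs)
```

Once the list has been flattened this way, `lookup` could equally be passed to `left_join()` as in the answer; `match()` is shown here only because it needs no extra packages.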

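Note that this approach joins the two datasets by exact string matching. It is therefore case-sensitive, and any whitespace before or after a region name will "break" the join. This means that if your data is not as clean as in the example, you may need some preprocessing (e.g. converting the regions to lowercase and removing any leading/trailing whitespace).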
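
As a rough sketch of that preprocessing, assuming made-up messy inputs (`lookup_raw`, `obs_raw`) and a hypothetical helper `clean_region()`, one option is to join on a normalised key built with base R `tolower()` and `trimws()` while keeping the original labels:

```r
library(dplyr)
library(tidyr)

# Made-up inputs with inconsistent case and stray whitespace
lookup_raw <- data.frame(
  Group = c(1, 2),
  Region = c("Southeast Med,Southeast Low ", "Northeast High,  East Med"),
  stringsAsFactors = FALSE
)
obs_raw <- data.frame(
  Region = c(" southeast low", "EAST MED", "Northeast High "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim whitespace and lowercase to build a join key
clean_region <- function(x) tolower(trimws(x))

# One row per region, keyed by the normalised label
lookup <- lookup_raw %>%
  separate_rows(Region, sep = ",") %>%
  mutate(key = clean_region(Region)) %>%
  select(key, Group)

# Join on the key, then drop it so the original Region text is preserved
obs <- obs_raw %>%
  mutate(key = clean_region(Region)) %>%
  left_join(lookup, by = "key") %>%
  select(-key)

print(obs)
```

Splitting on "," alone and trimming afterwards tolerates a missing or doubled space after the comma, which a fixed ", " separator would not.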