Match character columns across two different data frames and label them

Time: 2016-05-25 13:45:38

Tags: r dataframe match multiple-columns

This is my main data:

  Country Consumption Rank
  Belarus        17.5    1
  Moldova        16.8    2
Lithuania        15.4    3
   Russia        15.1    4
  Romania        14.4    5
  Ukraine        13.9    6

I have also collected separate data frames for each continent, such as:

    europe
   Albania
   Andorra
   Armenia
   Austria
Azerbaijan
   Belarus

or another data frame, such as:

           asia
    Afghanistan
        Bahrain
     Bangladesh
         Bhutan
         Brunei
Burma (Myanmar)

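I want to match the countries in my data against the continent data frames I have, and then label each one with its continent, such as Europe or Asia.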

Here is the code I have managed to write, but nothing matches, so only the else branch is executed:

if (data$Country %in% europe$europe) {
  data$con <- c("Europe")
} else if (data$Country %in% asia$asia) {
  data$con <- c("asia")
} else if (data$Country %in% africa$africa) {
  data$con <- c("africa")
} else
  data$con <- c("ridi")

Thanks in advance.

2 Answers:

Answer 0: (score: 1)

First, build a map from countries to continents:

continent_map = stack(c(europe, asia))
names(continent_map) <- c("Country", "Continent")

Then, use match to look up each country's continent:

dat["Continent"] = continent_map$Continent[ match(dat$Country, continent_map$Country) ]

    Country Consumption Rank Continent
1   Belarus        17.5    1    europe
2   Moldova        16.8    2      <NA>
3 Lithuania        15.4    3      <NA>
4    Russia        15.1    4      <NA>
5   Romania        14.4    5      <NA>
6   Ukraine        13.9    6      <NA>
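The <NA> values above follow from how match works: it returns the position of each country in the lookup column, or NA when the country is not found, and indexing with NA yields NA. A minimal sketch with made-up vectors (the names countries, known and continent are illustrative, not from the answer):

```r
# Illustrative vectors; only Belarus appears in the lookup
countries <- c("Belarus", "Moldova")
known     <- c("Albania", "Belarus")
continent <- c("europe", "europe")   # continent[i] belongs to known[i]

# match() gives the position of each country in `known`, NA if absent
pos <- match(countries, known)

# Indexing with those positions pulls the continent, keeping NA for misses
labels <- continent[pos]
print(pos)
print(labels)
```

Countries that are missing from every continent list therefore stay unlabeled rather than raising an error.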

In general, you should keep related data in a single structure like continent_map (rather than in many separate places, like the OP's asia and europe).
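One benefit of a single lookup table is that a standard join also works. The sketch below is an alternative to the match call, not part of the original answer; it assumes a continent_map with Country and Continent columns and uses a shortened, illustrative copy of the data. Note that merge does not guarantee the original row order.

```r
# Shortened copy of the main data (first three rows of the question's table)
dat <- data.frame(Country = c("Belarus", "Moldova", "Lithuania"),
                  Consumption = c(17.5, 16.8, 15.4),
                  Rank = 1:3,
                  stringsAsFactors = FALSE)

# Illustrative lookup table with one row per country
continent_map <- data.frame(Country = c("Albania", "Belarus", "Brunei"),
                            Continent = c("europe", "europe", "asia"),
                            stringsAsFactors = FALSE)

# Left join: keep every row of dat, fill Continent with NA when unmatched
merged <- merge(dat, continent_map, by = "Country", all.x = TRUE)
print(merged)
```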

Data used:

dat = structure(list(Country = c("Belarus", "Moldova", "Lithuania", 
"Russia", "Romania", "Ukraine"), Consumption = c(17.5, 16.8, 
15.4, 15.1, 14.4, 13.9), Rank = 1:6), .Names = c("Country", "Consumption", 
"Rank"), row.names = c(NA, -6L), class = "data.frame")
europe = structure(list(europe = c("Albania", "Andorra", "Armenia", "Austria", 
"Azerbaijan", "Belarus")), .Names = "europe", row.names = c(NA, 
-6L), class = "data.frame")
asia = structure(list(asia = c("Afghanistan", "Bahrain", "Bangladesh", 
"Bhutan", "Brunei")), .Names = "asia", row.names = c(NA, -5L), class = "data.frame")

Answer 1: (score: 0)

Here is one approach using ifelse. I modified your data slightly so that you can see it works for both Asia and Europe.

# get your data
df <- read.table(text = "Country Consumption Rank
Belarus     17.5  1
Brunei      16.8  2
Lithuania   15.4  3
Austria     15.1  4
Romania     14.4  5
Ukraine     13.9  6
Bangladesh  24.2  5", header = TRUE)

df.europe <- read.table(text = "europe
Albania
Andorra
Armenia
Austria
Azerbaijan
Belarus", header = TRUE, as.is = TRUE)

df.asia <- read.table(text = "asia
Afghanistan
Bahrain
Bangladesh
Bhutan
Brunei", header = TRUE, as.is = TRUE)

# use ifelse to get categories
df$con <- ifelse(df$Country %in% df.europe$europe, "europe", 
                 ifelse(df$Country %in% df.asia$asia, "asia", NA))
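A likely reason the if/else chain in the question misbehaves is that if expects a single TRUE or FALSE, while %in% returns one value per row. As far as I know, older R versions used only the first element (with a warning) and R 4.2.0 and later raise an error. ifelse, by contrast, works element by element. A small sketch with illustrative vectors:

```r
# Illustrative vectors, not the full data from the question
countries <- c("Belarus", "Brunei", "Moldova")
europe    <- c("Albania", "Belarus")

# %in% yields a logical vector with one entry per country
in_europe <- countries %in% europe

# ifelse() picks a label per element instead of one label for the whole column
label <- ifelse(in_europe, "europe", "other")
print(in_europe)
print(label)
```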

It is usually a good idea to keep nested ifelse calls to a minimum, but for a data set like this, with a few thousand observations, it will be fine.
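If more continents are added, one way to avoid deeper nesting is to loop over a named list of country vectors and assign labels by logical indexing. This is only a sketch: the continents list and the shortened df are assumptions for illustration, and a country that appears in several lists would get the label of the last one processed.

```r
# Shortened, illustrative copy of the data
df <- data.frame(Country = c("Belarus", "Brunei", "Lithuania",
                             "Austria", "Bangladesh"),
                 stringsAsFactors = FALSE)

# Hypothetical named list: one character vector of countries per continent
continents <- list(
  europe = c("Albania", "Andorra", "Armenia", "Austria",
             "Azerbaijan", "Belarus"),
  asia   = c("Afghanistan", "Bahrain", "Bangladesh", "Bhutan", "Brunei")
)

# Start with NA everywhere, then fill in each continent's rows in turn
df$con <- NA_character_
for (name in names(continents)) {
  df$con[df$Country %in% continents[[name]]] <- name
}
print(df)
```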