Pattern matching, replacement, and loop optimization in R

Time: 2018-03-22 10:57:02

Tags: r for-loop optimization replace grep

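I have two data frames, loc_df and city_df (cities and countries). loc_df has 5 columns, but only 2 matter here (Organization.Location.1 and Organization.Location.2), and 35,000 rows. city_df has 2 columns (City and Country) and 1,000 rows. I take each value from the City column and match it against the organization columns, using grepl for the text matching and for loops for the iteration. I also need to keep track of the row index, which is why I used for loops. But this takes a very long time.

I am trying to replace every city, state, or province name in the organization columns with the corresponding country name.

Please help me optimize this code. I am new to R.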

for(k in 1:2){
  if(k==1){

    for (i in 1:nrow(city_df)) {
      x1 <- paste(" ", city_df$City[i], sep = "")
      x2 <- paste(" ", city_df$City[i], " ", sep = "")
      x3 <- paste(city_df$City[i], " ", sep = "")
      # print(x1)

      for (j in 1:nrow(loc_df)) {
        #print(loc_df$Organization.Location.1[j])


        if (grepl(x1, loc_df$Organization.Location.1[j]) |
            grepl(x2, loc_df$Organization.Location.1[j]) |
            grepl(x3, loc_df$Organization.Location.1[j])) {
            loc_df$org_new1[j] <- city_df$Country[i]
          break

        }

      }
    }
  }
  if(k==2){

    for (i in 1:nrow(city_df)) {
      x1 <- paste(" ", city_df$City[i], sep = "")
      x2 <- paste(" ", city_df$City[i], " ", sep = "")
      x3 <- paste(city_df$City[i], " ", sep = "")


      for (j in 1:nrow(loc_df)) {

        if (grepl(x1, loc_df$Organization.Location.2[j]) |
            grepl(x2, loc_df$Organization.Location.2[j]) |
            grepl(x3, loc_df$Organization.Location.2[j])) {
            loc_df$org_new1[j] <- city_df$Country[i]
          break

        }

      }
    }
  }

}
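
One way to cut the run time is to drop the inner loop over rows, since grepl() already accepts a whole vector of strings. The following is a minimal sketch, not a drop-in replacement: match_country is a hypothetical helper name, the location string "zaranj  afghanistan" is made up for illustration, and the matching rule is an assumption (a city must appear as a whole space-delimited word, and the first matching row of city_df wins), which is slightly different from the x1/x2/x3 patterns above.

```r
# Hypothetical helper (not from the original code): returns the country for
# each location string, or NA when no city matches.
match_country <- function(locations, cities, countries) {
  cities <- as.character(cities)        # guard against factor columns
  countries <- as.character(countries)
  # Pad with spaces so a city at the start or end of a string still has a
  # space on both sides.
  padded <- paste0(" ", locations, " ")
  result <- rep(NA_character_, length(locations))
  for (i in seq_along(cities)) {
    # One grepl() call tests every row at once; fixed = TRUE treats the
    # city name as a literal string rather than a regular expression.
    hit <- is.na(result) &
      grepl(paste0(" ", cities[i], " "), padded, fixed = TRUE)
    result[hit] <- countries[i]
  }
  result
}

# Small illustrative data (assumed, not the real 35,000-row data frame).
city_df <- data.frame(
  City = c("zaranj", "lashkar gah"),
  Country = c("afghanistan", "afghanistan"),
  stringsAsFactors = FALSE
)
loc_df <- data.frame(
  Organization.Location.1 = c("zaranj  afghanistan", "zug  switzerland"),
  stringsAsFactors = FALSE
)

loc_df$org_new1 <- match_country(loc_df$Organization.Location.1,
                                 city_df$City, city_df$Country)
```

With this shape the loop runs once per city (1,000 iterations) instead of once per city and row, and the same helper can be called again for Organization.Location.2.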

Here is sample data for city_df, generated with dput:

structure(list(City = c("qal eh-ye now", "chaghcharan", "lashkar gah", 
"zaranj", "tarin kowt", "zareh sharan"), Country = c("afghanistan", 
"afghanistan", "afghanistan", "afghanistan", "afghanistan", "afghanistan"
)), .Names = c("City", "Country"), row.names = c(NA, 6L), class = "data.frame")

Sample of loc_df:

structure(list(Organization.Location.1 = c("zug  switzerland", 
"zug  canton of zug  switzerland", "zimbabwe", "zigong  chengdu  pr china", 
"zhuhai  guangdong  china", "zaragoza  spain"), Organization.Location.2 = c("", 
"san francisco bay area", "london  canada area", "beijing city  china", 
"greater atlanta area", "paris area  france")), .Names = c("Organization.Location.1", 
"Organization.Location.2"), row.names = c(NA, 6L), class = "data.frame")

Input data

Organization.Location.1                          Organization.Location.2

zhuhai guangdong china                            mumbai area india

vietnam                                           london united kingdom

Desired output

 Organization.Location.1                          Organization.Location.2

     china                                              india

     vietnam                                            united kingdom
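
The input and desired output above can be reproduced with a single regular expression per column instead of one grepl call per city. This is only a sketch under stated assumptions: the three city_df rows below are illustrative (the real sample only contains Afghan cities), to_country is a hypothetical helper name, city names are assumed to contain no regex metacharacters, and strings with no matching city are left unchanged.

```r
# Illustrative lookup rows (assumed for this example).
city_df <- data.frame(
  City = c("zhuhai", "mumbai", "london"),
  Country = c("china", "india", "united kingdom"),
  stringsAsFactors = FALSE
)
loc_df <- data.frame(
  Organization.Location.1 = c("zhuhai guangdong china", "vietnam"),
  Organization.Location.2 = c("mumbai area india", "london united kingdom"),
  stringsAsFactors = FALSE
)

# Build one alternation pattern covering all cities. Longer names go first
# so that a longer name is preferred over a shorter one at the same position.
cities <- city_df$City[order(-nchar(city_df$City))]
pattern <- paste0("\\b(", paste(cities, collapse = "|"), ")\\b")

# Hypothetical helper: replace each string with the country of the first
# city found in it; keep the string as-is when no city is found.
to_country <- function(x) {
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m > 0
  found <- rep(NA_character_, length(x))
  found[hit] <- regmatches(x, m)        # matched city names, in row order
  country <- city_df$Country[match(found, city_df$City)]
  ifelse(is.na(country), x, country)
}

loc_df$Organization.Location.1 <- to_country(loc_df$Organization.Location.1)
loc_df$Organization.Location.2 <- to_country(loc_df$Organization.Location.2)
```

A city name alone can be ambiguous: a string such as "london  canada area" from the loc_df sample would also map to "united kingdom" with this lookup, so the result may need a check against country names already present in the string.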

0 answers:

No answers yet