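I have two data frames, loc_df and city_df. loc_df has 5 columns, but only two matter here (Organization.Location.1 and Organization.Location.2), and it has 35,000 rows. city_df has 2 columns (City and Country) and 1,000 rows. I take each value from the City column and match it against the organization columns, using grepl for the text matching and for loops for the iteration. I also need to keep track of the row index, which is why I use for loops. But this takes a very long time.
I'm trying to replace every city, state, or province name in the organization columns with its country name.
Please help me optimize this code. I'm new to R.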
# For each row, use the country of the first city (in city_df order) found in
# the text; rows with no matching city keep their original text.
for (col in c("Organization.Location.1", "Organization.Location.2")) {
  # org_new1 for Organization.Location.1, org_new2 for Organization.Location.2
  new_col <- sub("Organization.Location.", "org_new", col, fixed = TRUE)
  loc_df[[new_col]] <- loc_df[[col]]
  for (j in seq_len(nrow(loc_df))) {
    for (i in seq_len(nrow(city_df))) {
      x1 <- paste(" ", city_df$City[i], sep = "")
      x2 <- paste(" ", city_df$City[i], " ", sep = "")
      x3 <- paste(city_df$City[i], " ", sep = "")
      # fixed = TRUE: city names are literal strings, not regular expressions
      if (grepl(x1, loc_df[[col]][j], fixed = TRUE) ||
          grepl(x2, loc_df[[col]][j], fixed = TRUE) ||
          grepl(x3, loc_df[[col]][j], fixed = TRUE)) {
        loc_df[[new_col]][j] <- city_df$Country[i]
        break  # stop at the first matching city for this row
      }
    }
  }
}
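A minimal sketch of a vectorized alternative, assuming the matching rule is "the city appears in the text as whole words". Instead of testing one row against one city at a time, each grepl call tests one city against every row at once, so a column needs about 1,000 vectorized calls rather than a nested row-by-city loop. Padding both the text and the city with spaces lets " city " match at the start or end of the string too; this is slightly stricter than the x1/x3 patterns above, which also match inside longer words. The helper name replace_with_country and the three city_df rows are hypothetical (they are not in the dput sample) and were chosen so the example reproduces the desired output shown below.

```r
# Hypothetical lookup rows, made up so the example reproduces the desired output
city_df <- data.frame(City = c("zhuhai", "mumbai", "london"),
                      Country = c("china", "india", "united kingdom"),
                      stringsAsFactors = FALSE)
loc_df <- data.frame(
  Organization.Location.1 = c("zhuhai guangdong china", "vietnam"),
  Organization.Location.2 = c("mumbai area india", "london united kingdom"),
  stringsAsFactors = FALSE)

# Hypothetical helper: returns `text` with each entry replaced by the country
# of the first city (in `cities` order) that occurs in it as whole words.
# Entries with no match are returned unchanged.
replace_with_country <- function(text, cities, countries) {
  padded  <- paste0(" ", text, " ")  # lets " city " match at the start/end too
  result  <- text
  matched <- rep(FALSE, length(text))
  for (i in seq_along(cities)) {
    # One literal (fixed = TRUE) search over all rows at once, skipping rows
    # that an earlier city already matched
    hit <- (!matched) & grepl(paste0(" ", cities[i], " "), padded, fixed = TRUE)
    result[hit] <- countries[i]
    matched <- matched | hit
  }
  result
}

loc_df$org_new1 <- replace_with_country(loc_df$Organization.Location.1,
                                        city_df$City, city_df$Country)
loc_df$org_new2 <- replace_with_country(loc_df$Organization.Location.2,
                                        city_df$City, city_df$Country)
print(loc_df[, c("org_new1", "org_new2")])
```

The row index is still available: the logical vector hit marks exactly which rows each city replaced, so no per-row loop is needed to keep track of it.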
Here is sample data for city_df, generated with dput:
structure(list(City = c("qal eh-ye now", "chaghcharan", "lashkar gah",
"zaranj", "tarin kowt", "zareh sharan"), Country = c("afghanistan",
"afghanistan", "afghanistan", "afghanistan", "afghanistan", "afghanistan"
)), .Names = c("City", "Country"), row.names = c(NA, 6L), class = "data.frame")
A sample of loc_df:
structure(list(Organization.Location.1 = c("zug switzerland",
"zug canton of zug switzerland", "zimbabwe", "zigong chengdu pr china",
"zhuhai guangdong china", "zaragoza spain"), Organization.Location.2 = c("",
"san francisco bay area", "london canada area", "beijing city china",
"greater atlanta area", "paris area france")), .Names = c("Organization.Location.1",
"Organization.Location.2"), row.names = c(NA, 6L), class = "data.frame")
Input data:
Organization.Location.1   Organization.Location.2
zhuhai guangdong china    mumbai area india
vietnam                   london united kingdom
Desired output:
Organization.Location.1   Organization.Location.2
china                     india
vietnam                   united kingdom
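Another option, sketched under the same assumptions: build one regular expression that alternates all city names, let regexpr find a city in every row in a single call, and map the matched city to its country with match(). The helper names escape_regex and replace_with_country_re, and the three city_df rows, are hypothetical. Two differences from the loop: the city occurring leftmost in the text wins (not the first one in city_df), and \b word boundaries also treat hyphens and punctuation as separators. Longer names are tried first so that, for example, a two-word city is preferred over a one-word city starting at the same position. With about 1,000 cities the pattern gets long; if the regex engine rejects it as too large, splitting the city list into chunks is one workaround.

```r
# Hypothetical lookup rows, made up so the example reproduces the desired output
city_df <- data.frame(City = c("zhuhai", "mumbai", "london"),
                      Country = c("china", "india", "united kingdom"),
                      stringsAsFactors = FALSE)
loc_df <- data.frame(
  Organization.Location.1 = c("zhuhai guangdong china", "vietnam"),
  Organization.Location.2 = c("mumbai area india", "london united kingdom"),
  stringsAsFactors = FALSE)

# Hypothetical helper: escape PCRE metacharacters so names match literally
escape_regex <- function(x) gsub("([.|()\\^{}+$*?\\[\\]\\\\])", "\\\\\\1", x, perl = TRUE)

# Hypothetical helper: one regexpr call over all rows; assumes no empty City values
replace_with_country_re <- function(text, cities, countries) {
  ord <- order(nchar(cities), decreasing = TRUE)   # longest names first
  alt <- paste(escape_regex(cities[ord]), collapse = "|")
  pattern <- paste0("\\b(?:", alt, ")\\b")
  m <- regexpr(pattern, text, perl = TRUE)         # -1 where nothing matched
  found <- rep(NA_character_, length(text))
  found[m != -1L] <- regmatches(text, m)           # matched city per row
  country <- countries[match(found, cities)]
  ifelse(is.na(country), text, country)            # keep unmatched text as is
}

loc_df$org_new1 <- replace_with_country_re(loc_df$Organization.Location.1,
                                           city_df$City, city_df$Country)
loc_df$org_new2 <- replace_with_country_re(loc_df$Organization.Location.2,
                                           city_df$City, city_df$Country)
print(loc_df[, c("org_new1", "org_new2")])
```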