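I have a vector of country/region names, e.g. locations = c("UK", "USA", "US", "United States", "United Kingdom", ...). I want to replace every variant of the United States with US and every variant of the United Kingdom with GB. Rather than writing an ifelse to check each case, can I supply two vectors, e.g. originalNames = c("USA", "United States", "US", "United States of America", ...) and newNames = c("US", "US", "US", "US", ...), and tell R to replace every value of locations found in originalNames with the value at the corresponding position in newNames?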
Answer 0 (score: 2)
Create two vectors of original names, one for the US variants and one for the UK variants. Then use gsub():
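us_variants <- c("USA", "US", "United States", "United States of America")
uk_variants <- c("UK", "United Kingdom")
# Anchor with ^( ... )$ so only whole values match, not substrings such as the "us" in "Russia"
us_pattern <- paste0("^(", paste(us_variants, collapse = "|"), ")$")
uk_pattern <- paste0("^(", paste(uk_variants, collapse = "|"), ")$")
locations <- gsub(us_pattern, "US", locations, perl = TRUE, ignore.case = TRUE)
locations <- gsub(uk_pattern, "GB", locations, perl = TRUE, ignore.case = TRUE)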
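One caveat with an alternation pattern is that gsub() replaces matching substrings, not whole values. The sketch below compares an unanchored pattern with an anchored one; the sample data and the names us_variants, loose and strict are made up for illustration.

```r
# Hypothetical sample data: a long variant, an unrelated country, a lowercase value
locations <- c("USA", "United States of America", "Russia", "uk")
us_variants <- c("USA", "US", "United States", "United States of America")

# Unanchored: any alternative may match anywhere inside a string
loose <- paste(us_variants, collapse = "|")
loose_result <- gsub(loose, "US", locations, perl = TRUE, ignore.case = TRUE)

# Anchored: ^( ... )$ requires the whole string to be one of the variants
strict <- paste0("^(", loose, ")$")
strict_result <- gsub(strict, "US", locations, perl = TRUE, ignore.case = TRUE)

print(loose_result)
print(strict_result)
```

With the unanchored pattern, "United States of America" is only partly replaced and "Russia" is altered as well; the anchored pattern leaves non-matching values untouched.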
Answer 1 (score: 2)
How about this?
locations <- c("UK", "USA", "US", "United States", "United Kingdom")
originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
newNames <- c("US", "US", "US", "UK")
Reduce(function(x,i) gsub(originalNames[i],newNames[i],x),seq_along(originalNames),locations)
> locations <- c("UK", "USA", "US", "United States", "United Kingdom")
> originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
> newNames <- c("US", "US", "US", "UK")
> Reduce(function(x,i) gsub(originalNames[i],newNames[i],x),seq_along(originalNames),locations)
[1] "UK" "US" "US" "US" "UK"
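The constraint for this to work is that originalNames and newNames are vectors of equal length, where originalNames[i] should be replaced by newNames[i].
This function makes multiple passes over your vector locations, each time performing a replacement on that vector: it looks for originalNames[i] and replaces it with newNames[i].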
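To make the multiple passes concrete, the Reduce() call can be unrolled into an explicit loop. This is only an illustrative rewrite of the same logic; the variable result is a name introduced here.

```r
locations <- c("UK", "USA", "US", "United States", "United Kingdom")
originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
newNames <- c("US", "US", "US", "UK")

# The two vectors must line up element by element
stopifnot(length(originalNames) == length(newNames))

result <- locations
for (i in seq_along(originalNames)) {
  # One full pass over the whole vector per replacement pair
  result <- gsub(originalNames[i], newNames[i], result)
}
print(result)
```

Because each pass works on the output of the previous one, the order of the pairs can matter when one original name is contained in another.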
If you want a faster/more elegant solution that does not make multiple passes over a large data set, you could try something like this:
library(data.table)
original.locations <- data.table(locations=c("UK", "USA", "US", "United States", "United Kingdom"))
replacements <- data.table(originalNames=c("USA", "United States", "United States of America", "United Kingdom"),
newNames=c("US", "US", "US", "UK"))
setkey(original.locations,locations)
setkey(replacements,originalNames)
original.locations[replacements,replacement.name:=i.newNames]
original.locations
> original.locations
        locations replacement.name
1:             UK               NA
2:             US               NA
3:            USA               US
4: United Kingdom               UK
5:  United States               US
(Note that I did not specify replacements for "UK" and "US"; you can avoid the NAs by explicitly matching them to themselves.)
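As an alternative to adding identity pairs, a single-pass lookup that falls back to the original value can be sketched in base R with match(). This swaps data.table for base R and is an assumption about what is wanted, not part of the answer above; idx and cleaned are names introduced here.

```r
locations <- c("UK", "USA", "US", "United States", "United Kingdom")
originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
newNames <- c("US", "US", "US", "UK")

# Position of each location in originalNames, or NA when it is not listed
idx <- match(locations, originalNames)

# Use the replacement where a match exists; otherwise keep the original value
cleaned <- ifelse(is.na(idx), locations, newNames[idx])
print(cleaned)
```

Since match() compares whole values exactly, this avoids partial replacements, but it is also case-sensitive, so variants such as "usa" would need to be listed or normalised first.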