Replacing multiple values in R using vectors of replacement values

Time: 2016-02-10 15:56:27

Tags: r

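I have a vector of country names, for example locations = c("UK", "USA", "US", "United States", "United Kingdom", ...). I want every variant of the United States replaced with US, and every variant of the United Kingdom replaced with GB. Rather than writing an ifelse to check each case, can I supply two vectors, for example originalNames = c("USA", "United States", "US", "United States of America",...) and newNames = c("US", "US", "US", "US",...), and tell R to replace every value found in originalNames with the value at the corresponding position in newNames?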

2 answers:

Answer 0 (score: 2)

Create two vectors, one holding the US variants from originalNames and one holding the UK variants. Then use gsub():

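# us_newNames and uk_newNames are the character vectors of US and UK variants.
# Anchor with ^ and $ so that only whole values are replaced, not substrings.
us_pattern <- paste0("^(?:", paste(us_newNames, collapse = "|"), ")$")
uk_pattern <- paste0("^(?:", paste(uk_newNames, collapse = "|"), ")$")

locations <- gsub(us_pattern, "US", locations, perl = TRUE, ignore.case = TRUE)
locations <- gsub(uk_pattern, "GB", locations, perl = TRUE, ignore.case = TRUE)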
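
A pattern built with paste(..., collapse = "|") matches anywhere inside a string unless it is anchored. With ignore.case = TRUE, an unanchored "US" alternative would also hit the "us" in "Russia". The following is a minimal self-contained sketch of the anchored form. The vectors us_variants and uk_variants and their contents are assumptions made for illustration and are not taken from the answer.

```r
# Hypothetical variant lists; names and contents are illustrative only.
us_variants <- c("United States of America", "United States", "USA", "US")
uk_variants <- c("United Kingdom", "UK")

# Wrap the alternation in ^(?: ... )$ so only whole values match.
us_pattern <- paste0("^(?:", paste(us_variants, collapse = "|"), ")$")
uk_pattern <- paste0("^(?:", paste(uk_variants, collapse = "|"), ")$")

# "Russia" is included to show that substrings are left alone.
locations <- c("UK", "USA", "US", "United States", "United Kingdom", "Russia")
locations <- gsub(us_pattern, "US", locations, perl = TRUE, ignore.case = TRUE)
locations <- gsub(uk_pattern, "GB", locations, perl = TRUE, ignore.case = TRUE)
print(locations)
# [1] "GB"     "US"     "US"     "US"     "GB"     "Russia"
```

If a variant contains regex metacharacters (for example the dots in "U.S."), it would need escaping before being pasted into the pattern.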

Answer 1 (score: 2)

How about this?

locations <- c("UK", "USA", "US", "United States", "United Kingdom") 
originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
newNames <- c("US", "US", "US", "UK")
Reduce(function(x,i) gsub(originalNames[i],newNames[i],x),seq_along(originalNames),locations)

> locations <- c("UK", "USA", "US", "United States", "United Kingdom") 
> originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
> newNames <- c("US", "US", "US", "UK")
> Reduce(function(x,i) gsub(originalNames[i],newNames[i],x),seq_along(originalNames),locations)
[1] "UK" "US" "US" "US" "UK"

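The one constraint is that originalNames and newNames must be vectors of equal length, where originalNames[i] is to be replaced by newNames[i].

This approach makes multiple passes over your locations vector, one per element of originalNames. Each pass looks for originalNames[i] in the vector and replaces it with newNames[i].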
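
To make the passes visible, the Reduce() call can be unrolled into an explicit loop. This is only a sketch of the same logic: the helper name replace_sequentially is hypothetical, and the length check is an addition that enforces the equal-length constraint described above.

```r
# Hypothetical helper; the name is illustrative and not from the answer.
replace_sequentially <- function(x, originalNames, newNames) {
  # The two vectors must line up position by position.
  stopifnot(length(originalNames) == length(newNames))
  for (i in seq_along(originalNames)) {
    # One full pass over x for each pattern.
    x <- gsub(originalNames[i], newNames[i], x)
  }
  x
}

locations <- c("UK", "USA", "US", "United States", "United Kingdom")
originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
newNames <- c("US", "US", "US", "UK")

result <- replace_sequentially(locations, originalNames, newNames)
print(result)
# [1] "UK" "US" "US" "US" "UK"
```

Because gsub() replaces substrings, the order of the patterns matters: with "United States" listed before "United States of America", a value of "United States of America" would come out as "US of America". Listing longer patterns first, or anchoring them with ^ and $, avoids that.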

If you want a faster and more elegant solution that does not make multiple passes over a large dataset, you could try something like this:

library(data.table)
original.locations <- data.table(locations=c("UK", "USA", "US", "United States", "United Kingdom")) 
replacements <- data.table(originalNames=c("USA", "United States", "United States of America", "United Kingdom"),
newNames=c("US", "US", "US", "UK"))
setkey(original.locations,locations)
setkey(replacements,originalNames)
original.locations[replacements,replacement.name:=i.newNames]
original.locations

> original.locations
        locations replacement.name
1:             UK               NA
2:             US               NA
3:            USA               US
4: United Kingdom               UK
5:  United States               US

(Note that I did not specify replacements for "UK" and "US"; you can avoid the NAs by explicitly mapping them to themselves.)
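
For comparison, the same single-pass lookup idea can be sketched in base R with match(). This is an analogue rather than the data.table code from the answer. Instead of mapping "UK" and "US" to themselves, it falls back to the original value wherever no replacement is listed.

```r
locations <- c("UK", "USA", "US", "United States", "United Kingdom")
originalNames <- c("USA", "United States", "United States of America", "United Kingdom")
newNames <- c("US", "US", "US", "UK")

# Position of each location in originalNames; NA where there is no entry.
idx <- match(locations, originalNames)

# Use the mapped name where one exists, otherwise keep the original value.
replaced <- ifelse(is.na(idx), locations, newNames[idx])
print(replaced)
# [1] "UK" "US" "US" "US" "UK"
```

Unlike the gsub() approaches, match() compares whole values exactly, so it is case-sensitive and never touches substrings.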