R - Nested IF statements using grep

Time: 2014-05-01 14:31:07

Tags: r loops if-statement grep

I am trying to flag certain rows of a data frame based on several conditions.

My data looks like this:

X <- structure(list(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"), 
                    Country = c("German", "Netherlands", "German", "Denmark", "Austria")), 
                    .Names = c("Website", "Country"), row.names = c(NA, 10L), class = "data.frame")

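What I need to do is add a new column that flags each row according to specific conditions. If Country equals "German", I need to look at the website URL and use an IF statement to flag the row with a different country name, i.e. Austria or Switzerland.

I have got as far as the code below, and I hope I am missing something really simple. The code works when flagging Switzerland, but in all other cases everything gets flagged as Austria.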

    for(i in 1:nrow(X)){
    if(length(grep("German", X$Country[i]))>0)

    if(length(grep("\\.at$", X$Website[i]))>0)
    X$Website_2[i] <- "Austria"

    else
    if(length(grep("\\.ch$", X$Website[i]))>0)
    X$Website_2[i] <- "Switzerland"

    }

Any help is much appreciated!

2 Answers:

Answer 0: (score: 1)

You can use ifelse to avoid the for loop. Here is one way:

# Your data was a little messed up.
X<-data.frame(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"))

# A simple way: start from an NA column, then fill in matches by URL suffix.
X$Website_2 <- NA
X$Website_2 <- ifelse(grepl("\\.dk$", X$Website), "Denmark", X$Website_2)
X$Website_2 <- ifelse(grepl("\\.at$", X$Website), "Austria", X$Website_2)
X$Website_2 <- ifelse(grepl("\\.ch$", X$Website), "Switzerland", X$Website_2)
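
As an aside on the loop in the question: the "everything becomes Austria" symptom is probably a recycling effect rather than a grep problem. `Website_2` does not exist when the loop first assigns to it, so the single value created at row 1 is recycled down the whole new column. The sketch below reproduces this on a 5-row copy of the data. The corrected row count, the helper `flag_rows`, and the names `broken`, `X_fixed` and `fixed` are assumptions for illustration, and the loop is rewritten with `grepl` but follows the same logic.

```r
# 5-row version of the question's data (row count corrected; an assumption)
X <- data.frame(
  Website = c("www.something.at", "www.something.nl", "www.something.ch",
              "www.something.dk", "www.something.at"),
  Country = c("German", "Netherlands", "German", "Denmark", "Austria"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the question's loop, same logic, written with grepl
flag_rows <- function(df) {
  for (i in seq_len(nrow(df))) {
    if (grepl("German", df$Country[i])) {
      if (grepl("\\.at$", df$Website[i])) {
        df$Website_2[i] <- "Austria"
      } else if (grepl("\\.ch$", df$Website[i])) {
        df$Website_2[i] <- "Switzerland"
      }
    }
  }
  df
}

# Without initialisation: the first assignment (row 1) creates the column
# from a length-1 value, which is recycled to every row.
broken <- flag_rows(X)

# With initialisation: only the matching rows are filled in.
X_fixed <- X
X_fixed$Website_2 <- NA_character_
fixed <- flag_rows(X_fixed)
```

Here `broken$Website_2` comes out as Austria, Austria, Switzerland, Austria, Austria, which matches the behaviour described in the question, while `fixed$Website_2` is Austria, NA, Switzerland, NA, NA. Initialising the column first, as the answer's NA assignment does, is what makes the difference.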

A slightly more elegant solution is to use a mapping table from country codes to countries.

# A more elegant solution
X<-data.frame(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"))

# Lookup table from top-level domain to country.
map<-data.frame(country.code=c('dk','at','ch'),
                Country=c('Denmark','Austria','Switzerland'))
#   country.code     Country
# 1           dk     Denmark
# 2           at     Austria
# 3           ch Switzerland

# Keep only the part after the last dot, then join on country.code.
X$country.code<-gsub('.*\\.([^\\.]*)$','\\1',X$Website)
merge(X,map,all.x=TRUE)

#   country.code          Website     Country
# 1           at www.something.at     Austria
# 2           at www.something.at     Austria
# 3           ch www.something.ch Switzerland
# 4           dk www.something.dk     Denmark
# 5           nl www.something.nl        <NA>

Note that the Netherlands is not mapped, because it is not in the map data.frame.
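
One thing to be aware of is that `merge()` sorts the result by the key column by default, so the rows no longer line up with the original order. If that matters, a named vector can serve as the lookup instead. This is only a minimal sketch: the names `lookup` and `tld` are illustrative, and the table is cut down to two entries for brevity.

```r
X <- data.frame(
  Website = c("www.something.at", "www.something.nl", "www.something.ch",
              "www.something.dk", "www.something.at"),
  stringsAsFactors = FALSE
)

# Named character vector: names are top-level domains, values are countries
# (deliberately incomplete; extend as needed)
lookup <- c(at = "Austria", ch = "Switzerland")

# Strip everything up to and including the last dot of each website
tld <- sub(".*\\.", "", X$Website)

# Index the lookup by name; domains missing from it come back as NA
X$Country <- unname(lookup[tld])
```

With this data, `X$Country` is Austria, NA, Switzerland, NA, Austria, in the same row order as `X$Website`.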

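Answer 1: (score: 0)

Is this what you are after? (By the way, there seems to be a problem with your dput: it says there are 10 rows, but there are only 5 values, so I changed that here as well.)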

> X <- structure(list(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"), 
+                     Country = c("German", "Netherlands", "German", "Denmark", "Austria")), 
+                     .Names = c("Website", "Country"), row.names = c(NA, 5L), class = "data.frame")
> 
> 
 # we use toupper to make it robust against different capitalization schemes
 # instead of nesting another ifelse, we use the fact that we can add to logical values
 # and use the returned number to index into our country vector.
> X<-within(X,
+           cleanCountry <- ifelse(toupper(Country)=="GERMAN",
+                           c("Switzerland", "Austria")[1+grepl("\\.at", Website)],
+                            Country))
> X
           Website     Country cleanCountry
1 www.something.at      German      Austria
2 www.something.nl Netherlands  Netherlands
3 www.something.ch      German  Switzerland
4 www.something.dk     Denmark      Denmark
5 www.something.at     Austria      Austria
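
The indexing trick works because `grepl()` returns TRUE/FALSE, which count as 1/0 when added to a number. One consequence is that any "German" row whose website does not match `.at` falls through to "Switzerland". The sketch below shows the trick in isolation, followed by a variant that checks both suffixes and leaves anything else as NA. The extra `.de` row and the names `is_at`, `picked`, `german` and `flagged` are made up for illustration.

```r
# The trick in isolation: FALSE -> index 1, TRUE -> index 2
is_at  <- grepl("\\.at$", c("www.something.at", "www.something.ch"))
picked <- c("Switzerland", "Austria")[1 + is_at]

# Hypothetical data with a "German" row that is neither .at nor .ch
X <- data.frame(
  Website = c("www.something.at", "www.something.ch",
              "www.something.de", "www.something.nl"),
  Country = c("German", "German", "German", "Netherlands"),
  stringsAsFactors = FALSE
)

german <- toupper(X$Country) == "GERMAN"

# Non-German rows keep their country; German rows are relabelled only
# when the suffix is recognised, otherwise they become NA
flagged <- ifelse(!german, X$Country,
           ifelse(grepl("\\.at$", X$Website), "Austria",
           ifelse(grepl("\\.ch$", X$Website), "Switzerland", NA_character_)))

X$cleanCountry <- flagged
```

Here `picked` is Austria, Switzerland, and `X$cleanCountry` is Austria, Switzerland, NA, Netherlands. Whether NA or some other default suits unrecognised suffixes depends on the data, so treat that choice as a placeholder.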