I am trying to flag certain rows in my data based on several conditions in a data frame.
My data looks like this:
X <- structure(list(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"),
Country = c("German", "Netherlands", "German", "Denmark", "Austria")),
.Names = c("Website", "Country"), row.names = c(NA, 10L), class = "data.frame")
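What I need to do is add a new column that flags rows based on specific conditions. So, if Country equals German, I need to look at the website URL and use an IF function to label the row with a different country name, i.e. Austria or Switzerland.
This is how far I have gotten. I hope I am missing something very simple: the code works fine for flagging Switzerland, but in all other cases everything gets flagged as Austria.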
for(i in 1:nrow(X)){
  if(length(grep("German", X$Country[i]))>0)
    if(length(grep("\\.at$", X$Website[i]))>0)
      X$Website_2[i] <- "Austria"
    else
      if(length(grep("\\.ch$", X$Website[i]))>0)
        X$Website_2[i] <- "Switzerland"
}
Any help is much appreciated!
Answer 0 (score: 1)
You can use ifelse to avoid the for loop. Here is one way:
# Your data was a little messed up.
X <- data.frame(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"))
# A simple way: start from NA, then fill in one top-level domain at a time.
X$Website_2 <- NA
X$Website_2 <- ifelse(grepl("\\.dk", X$Website), 'Denmark', X$Website_2)
X$Website_2 <- ifelse(grepl("\\.at", X$Website), 'Austria', X$Website_2)
X$Website_2 <- ifelse(grepl("\\.ch", X$Website), 'Switzerland', X$Website_2)
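The `X$Website_2 <- NA` line above also hints at what probably goes wrong in the loop from the question. As far as I can tell, when `Website_2` does not exist yet, the first `X$Website_2[i] <- "Austria"` creates the column from a length-one value, and R recycles that value down every row; later iterations only overwrite the Switzerland rows. A minimal sketch of the loop with the column created up front, assuming the 5-row version of the data:

```r
# Assumption: the 5-row version of the question's data.
X <- data.frame(
  Website = c("www.something.at", "www.something.nl", "www.something.ch",
              "www.something.dk", "www.something.at"),
  Country = c("German", "Netherlands", "German", "Denmark", "Austria"),
  stringsAsFactors = FALSE
)

# Create the column first so element-wise assignment cannot be recycled.
X$Website_2 <- NA_character_

for (i in seq_len(nrow(X))) {
  # Only rows labeled "German" are reclassified by their domain.
  if (grepl("German", X$Country[i])) {
    if (grepl("\\.at$", X$Website[i])) {
      X$Website_2[i] <- "Austria"
    } else if (grepl("\\.ch$", X$Website[i])) {
      X$Website_2[i] <- "Switzerland"
    }
  }
}

print(X)
```

With this change, rows that do not meet the conditions stay NA instead of being filled with Austria.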
A slightly more elegant solution is to use a mapping table from country codes to countries.
# A more elegant solution
X <- data.frame(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"))
map <- data.frame(country.code = c('dk', 'at', 'ch'),
                  Country = c('Denmark', 'Austria', 'Switzerland'))
#   country.code     Country
# 1           dk     Denmark
# 2           at     Austria
# 3           ch Switzerland
# Extract the text after the last dot as the country code.
X$country.code <- gsub('.*\\.([^\\.]*)$', '\\1', X$Website)
# Left join on the shared country.code column; unmatched rows get NA.
merge(X, map, all.x = TRUE)
#   country.code          Website     Country
# 1           at www.something.at     Austria
# 2           at www.something.at     Austria
# 3           ch www.something.ch Switzerland
# 4           dk www.something.dk     Denmark
# 5           nl www.something.nl        <NA>
Note that the Netherlands is not mapped, because it is not in the map data.frame.
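If the row reordering done by merge is unwanted, a named vector can serve as the lookup table instead. This is only a sketch: the name `tld_to_country` and its values are assumptions for illustration (here `.dk` is mapped to Denmark).

```r
X <- data.frame(
  Website = c("www.something.at", "www.something.nl", "www.something.ch",
              "www.something.dk", "www.something.at"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are top-level domains, values are country names.
tld_to_country <- c(at = "Austria", ch = "Switzerland", dk = "Denmark")

# Drop everything up to and including the last dot.
tld <- sub(".*\\.", "", X$Website)

# Index by name; domains missing from the lookup come back as NA.
X$Country <- unname(tld_to_country[tld])

print(X)
```

Unlike merge, this keeps the rows in their original order, and adding a country only means adding one element to the vector.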
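Answer 1 (score: 0)
Is this what you are after? (By the way, there seems to be a problem with your dput: it says there are 10 rows, but there are only 5 values, so I changed that here as well.)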
> X <- structure(list(Website = c("www.something.at", "www.something.nl", "www.something.ch", "www.something.dk", "www.something.at"),
+ Country = c("German", "Netherlands", "German", "Denmark", "Austria")),
+ .Names = c("Website", "Country"), row.names = c(NA, 5L), class = "data.frame")
>
>
# we use toupper to make the comparison robust against different capitalization schemes
# instead of nesting another ifelse, we use the fact that we can add to logical values
# and use the resulting number to index into our country vector.
> X<-within(X,
+ cleanCountry <- ifelse(toupper(Country)=="GERMAN",
+ c("Switzerland", "Austria")[1+grepl("\\.at", Website)],
+ Country))
> X
           Website     Country cleanCountry
1 www.something.at      German      Austria
2 www.something.nl Netherlands  Netherlands
3 www.something.ch      German  Switzerland
4 www.something.dk     Denmark      Denmark
5 www.something.at     Austria      Austria
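One caveat with the indexing trick: every row whose Country is German and whose site does not match `.at` ends up as Switzerland, whatever its domain. If that matters, a more explicit variant with a nested ifelse might look like the sketch below. The sixth row (`www.something.de`) is a hypothetical case added only to show the difference; it is not part of the original data.

```r
X <- data.frame(
  Website = c("www.something.at", "www.something.nl", "www.something.ch",
              "www.something.dk", "www.something.at",
              "www.something.de"),  # hypothetical extra row, not in the question
  Country = c("German", "Netherlands", "German", "Denmark", "Austria",
              "German"),
  stringsAsFactors = FALSE
)

# Case-insensitive check for the "German" label.
is_german <- toupper(X$Country) == "GERMAN"

# Relabel only German rows with a recognized domain; keep Country otherwise.
X$cleanCountry <- ifelse(is_german & grepl("\\.at$", X$Website), "Austria",
                  ifelse(is_german & grepl("\\.ch$", X$Website), "Switzerland",
                         X$Country))

print(X)
```

Here the hypothetical `.de` row keeps its original label rather than being turned into Switzerland.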