I have a data.frame, and I want to identify which cells of sample1$domain are "www", replace them with "", and strsplit the corresponding sample1$suffix. The data looks like this:
domain suffix
1 wbx2 com
2 redhat com
3 something com
4 gstatic com
5 www googleapis.com
6 smartfilter com
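I managed to solve it as shown below, but it changes the position of the row (I want it to stay in 5th position), and given that this will run on millions of cases, I don't think it is the most efficient approach: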
library("stringr")
sample1$domain <- ifelse(sample1$domain == "www", "", sample1$domain)
sample1[sample1$domain == "", c("domain", "suffix")] <- sample1[sample1$domain == "", c("suffix", "domain")]
y <- sample1$domain[sample1$suffix == ""]
z <- as.data.frame(unlist(str_split_fixed(y, "[.]", 2)))
colnames(z) <- c("domain", "suffix")
sample1 <- rbind(sample1, z)
sample1 <- subset(sample1, sample1$suffix != "")
rownames(sample1) <- NULL
sample1
# domain suffix
#1 wbx2 com
#2 redhat com
#3 something com
#4 gstatic com
#5 smartfilter com
#6 googleapis com
Data
sample1 <- structure(list(domain = c("wbx2", "redhat", "something",
"gstatic", "www", "smartfilter"), suffix = c("com", "com", "com",
"com", "googleapis.com", "com")), .Names = c("domain", "suffix"
), row.names = c(NA, 6L), class = "data.frame")
Answer 0 (score: 1)
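We can create an index of the rows whose domain is "www". We then use that index to replace the site name first and the site suffix last. The order matters, because both values are extracted from the original suffix: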
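# Logical index of the rows whose domain is "www"
ind <- sample1$domain == "www"
# Everything before the last dot of the suffix becomes the domain
sample1$domain[ind] <- sub("^(.*)\\..*", "\\1", sample1$suffix[ind])
# Everything after the last dot becomes the suffix
sample1$suffix[ind] <- sub(".*\\.(.*)", "\\1", sample1$suffix[ind])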
sample1
# domain suffix
# 1 wbx2 com
# 2 redhat com
# 3 something com
# 4 gstatic com
# 5 googleapis com
# 6 smartfilter com
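
One difference from the question's code is worth noting: str_split_fixed(y, "[.]", 2) splits at the first dot, while the greedy patterns above split at the last one. The two agree for "googleapis.com" but not for a value such as "bbc.co.uk". The following is only a sketch of a first-dot variant, not part of the original answer; the helper name fix_www and the "bbc.co.uk" row are made up for illustration, and which() is used on the assumption that NA domains should be left untouched.

```r
# Hypothetical helper (not from the original answer): repairs rows whose
# domain is "www" by splitting the suffix at its FIRST dot, in place,
# so the row order never changes.
fix_www <- function(df) {
  # which() drops NA comparisons, so rows with a missing domain are skipped
  ind <- which(df$domain == "www")
  full <- df$suffix[ind]
  # Text before the first dot becomes the domain
  df$domain[ind] <- sub("\\..*$", "", full)
  # Text after the first dot becomes the suffix
  df$suffix[ind] <- sub("^[^.]*\\.", "", full)
  df
}

# Illustrative data; the "bbc.co.uk" row is an assumption added to show
# how a multi-part suffix is handled.
sample2 <- data.frame(
  domain = c("wbx2", "www", "www"),
  suffix = c("com", "googleapis.com", "bbc.co.uk"),
  stringsAsFactors = FALSE
)
result <- fix_www(sample2)
result
#       domain suffix
# 1       wbx2    com
# 2 googleapis    com
# 3        bbc  co.uk
```

Like the answer above, this relies only on vectorised sub() calls and index assignment, so no rows are bound or dropped. If a "www" row has a suffix with no dot at all, neither pattern matches and that suffix is copied into domain unchanged, so such rows may need separate handling.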