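I want to replace every "-", ",", ")", "(" and " " (space) with "." in the variable DMA.NAME of the example data frame. I looked at three posts and tried their approaches, but all of them failed: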
Replacing column values in data frame, not included in list
R replace all particular values in a data frame
Replace characters from a column of a data frame R
Approach 1
> shouldbecomeperiod <- c$DMA.NAME %in% c("-", ",", " ", "(", ")")
c$DMA.NAME[shouldbecomeperiod] <- "."
Approach 2
> removetext <- c("-", ",", " ", "(", ")")
c$DMA.NAME <- gsub(removetext, ".", c$DMA.NAME)
c$DMA.NAME <- gsub(removetext, ".", c$DMA.NAME, fixed = TRUE)
Warning message:
In gsub(removetext, ".", c$DMA.NAME) :
argument 'pattern' has length > 1 and only the first element will be used
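The warning means that gsub() accepts only a single pattern. One way to keep the question's character vector is to call gsub() once per character, for example by folding over the vector with Reduce(); fixed = TRUE makes each character literal, so "(" and ")" need no escaping. This is a minimal sketch, and the helper name replace_chars is hypothetical. Because each character is replaced separately, adjacent matches are not collapsed, so ", " becomes two periods.

```r
# Hypothetical helper: replace each character in `chars` with `replacement`,
# using one literal (fixed = TRUE) gsub() call per character.
replace_chars <- function(x, chars, replacement = ".") {
  Reduce(function(s, ch) gsub(ch, replacement, s, fixed = TRUE),
         chars, init = x)
}

removetext <- c("-", ",", " ", "(", ")")
names_in <- c("Columbus, OH", "Orlando-Daytona Bch-Melbrn",
              "Boston (Manchester)", "Minneapolis-St. Paul")

res <- replace_chars(names_in, removetext)
# res is c("Columbus..OH", "Orlando.Daytona.Bch.Melbrn",
#          "Boston..Manchester.", "Minneapolis.St..Paul")
```

Applied to the data frame, this would be c$DMA.NAME <- replace_chars(c$DMA.NAME, removetext).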
Approach 3
> c[c == c(" ", ",", "(", ")", "-")] <- "."
Example data frame
> df
DMA.CODE DATE DMA.NAME count
111 22 8/14/2014 12:00:00 AM Columbus, OH 1
112 23 7/15/2014 12:00:00 AM Orlando-Daytona Bch-Melbrn 1
79 18 7/30/2014 12:00:00 AM Boston (Manchester) 1
99 22 8/20/2014 12:00:00 AM Columbus, OH 1
112.1 23 7/15/2014 12:00:00 AM Orlando-Daytona Bch-Melbrn 1
208 27 7/31/2014 12:00:00 AM Minneapolis-St. Paul 1
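I know what the problem is: gsub only takes a single pattern and uses just the first element of the vector. The other two approaches search the whole variable for exact values, rather than for specific characters inside each value.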
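The difference between whole-value and substring matching can be seen on a single value. %in% (and ==, which in Approach 3 also compares element by element with the right-hand vector recycled) asks whether the entire string equals one of the characters, so "Columbus, OH" never matches ","; testing for a character inside a string needs a substring match such as grepl(..., fixed = TRUE). A small illustration:

```r
x <- "Columbus, OH"
chars <- c("-", ",", " ", "(", ")")

# Whole-value comparison: is the entire string one of these characters?
whole <- x %in% chars          # FALSE

# Substring test: does the string contain each character?
contains <- sapply(chars, grepl, x = x, fixed = TRUE)
# TRUE only for "," and " "
```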
Answer 0 (score: 4)
You can use the special classes [:punct:] and [:space:] inside a bracket expression ([...]), like this:
df <- data.frame(
DMA.NAME = c(
"Columbus, OH",
"Orlando-Daytona Bch-Melbrn",
"Boston (Manchester)",
"Columbus, OH",
"Orlando-Daytona Bch-Melbrn",
"Minneapolis-St. Paul"),
stringsAsFactors = FALSE)
##
> gsub("[[:punct:][:space:]]+","\\.",df$DMA.NAME)
[1] "Columbus.OH" "Orlando.Daytona.Bch.Melbrn" "Boston.Manchester." "Columbus.OH"
[5] "Orlando.Daytona.Bch.Melbrn" "Minneapolis.St.Paul"
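Note that [:punct:] covers every punctuation character, not only the ones listed in the question. If exactly the five characters from the question should be replaced, a sketch is to list them in one bracket expression instead: putting "-" first keeps it literal, and "(" and ")" need no escaping inside brackets. Unlike the pattern above, this leaves the existing "." in "St." untouched, so "St. Paul" becomes "St..Paul".

```r
dma <- c("Columbus, OH", "Orlando-Daytona Bch-Melbrn",
         "Boston (Manchester)", "Minneapolis-St. Paul")

# Bracket expression containing exactly "-", ",", " ", "(" and ")";
# the trailing `+` collapses runs such as ", " into a single period.
out <- gsub("[-, ()]+", ".", dma)
# out is c("Columbus.OH", "Orlando.Daytona.Bch.Melbrn",
#          "Boston.Manchester.", "Minneapolis.St..Paul")
```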
Answer 1 (score: 3)
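If your data frame is large, you may want to look at this fast function from the stringi package. It replaces every character belonging to a given class with another character. Here the character class is L, letters (inside {}), but the capital P (before {}) means we want the complement of that set, i.e. every non-letter character. merge means that consecutive matches are merged into a single match.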
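For comparison, the same non-letter class can be written in base R with PCRE (perl = TRUE), which supports Unicode property escapes such as \P{L}; a trailing + on the pattern plays the role of merge = TRUE. This is a sketch of an equivalent base-R pattern, not part of the stringi API. Keep in mind that \P{L} also matches digits, which the DMA names here happen not to contain.

```r
dma <- c("Columbus, OH", "Boston (Manchester)", "Minneapolis-St. Paul")

# Without `+`: each non-letter becomes its own period (like merge = FALSE)
unmerged <- gsub("\\P{L}", ".", dma, perl = TRUE)

# With `+`: runs of non-letters collapse into one period (like merge = TRUE)
merged <- gsub("\\P{L}+", ".", dma, perl = TRUE)
```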
library(stringi)
stri_replace_all_charclass(df$DMA.NAME, "\\P{L}", ".", merge = TRUE)
## [1] "Columbus.OH" "Orlando.Daytona.Bch.Melbrn" "Boston.Manchester." "Columbus.OH"
## [5] "Orlando.Daytona.Bch.Melbrn" "Minneapolis.St.Paul"
And some benchmarks:
# 1000 names sampled with replacement from the example data
x <- sample(df$DMA.NAME, 1000, replace = TRUE)
gsubFun <- function(x){
gsub("[[:punct:][:space:]]+", "\\.", x)
}
striFun <- function(x){
stri_replace_all_charclass(x, "\\P{L}", ".", merge = TRUE)
}
library(microbenchmark)
microbenchmark(gsubFun(x), striFun(x))
Unit: microseconds
expr min lq median uq max neval
gsubFun(x) 3472.276 3511.0015 3538.097 3573.5835 11039.984 100
striFun(x) 877.259 893.3945 907.769 929.8065 3189.017 100