Remove commas and/or periods, unless certain conditions hold at the last occurrence, in R

Time: 2016-08-21 08:46:39

Tags: r gsub

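I want to remove all commas and periods from a string, unless the string ends with a comma (or period) followed by one or two digits; in that case the last separator should be kept.

Some examples: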

12.345.67 #would become 12345.67
12.345,67 #would become 12345,67
12.345,6  #would become 12345,6
12.345.6  #would become 12345.6
12.345    #would become 12345
1,2.345   #would become 12345

and so on.
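The examples above can be written down as input/output pairs, which makes it easy to check any candidate solution. This is only a sketch: `cases`, `check` and `naive` are hypothetical names, not part of the question. It also shows why simply stripping every separator is not enough.

```r
# Expected results from the question, as input = output pairs
cases <- c(
  "12.345.67" = "12345.67",
  "12.345,67" = "12345,67",
  "12.345,6"  = "12345,6",
  "12.345.6"  = "12345.6",
  "12.345"    = "12345",
  "1,2.345"   = "12345"
)

# check() is a hypothetical helper: one TRUE/FALSE per case
check <- function(f) f(names(cases)) == unname(cases)

# Naive attempt: strip every separator, which also drops the decimal mark
naive <- function(s) gsub("[,.]", "", s)
check(naive)
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE
```

Only the last two cases pass, because they are the only ones with no one- or two-digit tail to preserve.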

3 answers:

Answer 0 (score: 2)

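A stringi solution, using the same data as @Sotos (the vector x defined in that answer):

  • The first statement removes every , or . that is followed by another , or ., so only the last one remains

  • The next two statements remove that last , or . if more than 2 characters follow it

library(stringi)
# keep only the last separator
x <- stri_replace_all_regex(x, "[,.](?=.*[,.])", "")
# position of the remaining separator (NA if there is none)
last <- stri_locate_last_regex(x, "[,.]")[, 2]
x <- ifelse(!is.na(last) & last < stri_length(x) - 2, stri_replace_last_regex(x, "[,.]", ""), x)
> x
[1] "12345.67" "12345,67" "12345,6"  "12234"    "1234"     "12.45"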

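Answer 1 (score: 1)

One solution is to count the characters after the last comma/period (nchar(word(x, -1, sep = ',|\\.'))). If that length is greater than 2, remove all separators (gsub(',|\\.', '', x)); otherwise remove only the first separator (sub(',|\\.', '', x)).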

library(stringr)
ifelse(nchar(word(x, -1, sep = ',|\\.')) > 2, gsub(',|\\.', '', x), sub(',|\\.', '', x))

#[1] "12345.67" "12345,67" "12345,6"  "12234"    "1234"     "12.45"  

Data

x <- c("12.345.67", "12.345,67", "12.345,6", "1,2.234", "1.234", "1,2.45")
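As written, the one-liner above removes only the first separator in its "otherwise" branch, so it assumes exactly two separators whenever the decimal part is kept. A base R variant of the same counting idea that does not rely on that assumption might look like the sketch below. `strip_separators` is a hypothetical helper, not the answerer's code, and it checks for one or two digits after the last separator rather than counting characters.

```r
x <- c("12.345.67", "12.345,67", "12.345,6", "1,2.234", "1.234", "1,2.45")

# strip_separators() is a hypothetical helper, not part of the answer above
strip_separators <- function(s) {
  # Split each string at its last separator: before, separator, after
  m <- regmatches(s, regexec("^(.*)([,.])([^,.]*)$", s))
  vapply(seq_along(s), function(i) {
    parts <- m[[i]]
    if (length(parts) == 0) return(s[i])        # no separator: nothing to do
    before <- gsub("[,.]", "", parts[2])        # earlier separators are always dropped
    after  <- parts[4]
    # Keep the last separator only when one or two digits follow it
    if (grepl("^[0-9]{1,2}$", after)) paste0(before, parts[3], after) else paste0(before, after)
  }, character(1))
}

strip_separators(x)
# [1] "12345.67" "12345,67" "12345,6"  "12234"    "1234"     "12.45"
strip_separators(c("12,5", "1.234.567,89"))
# [1] "12,5"       "1234567,89"
```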

Answer 2 (score: 1)

Another option is to use the negative lookahead syntax (?!...) of Perl-compatible regular expressions:

df
#          V1
# 1 12.345.67
# 2 12.345,67
# 3  12.345,6
# 4  12.345.6
# 5    12.345
# 6   1,2.345

# remove every , or . unless it is followed by 1 or 2 digits at the end of the string
df$V1 <- gsub("[,.](?!\\d{1,2}$)", "", df$V1, perl = TRUE)
df
#         V1
# 1 12345.67
# 2 12345,67
# 3  12345,6
# 4  12345.6
# 5    12345
# 6    12345
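To see why the lookahead works, it can help to list which separators the pattern actually matches, since those are exactly the ones gsub() deletes. This is a small sketch using base R's gregexpr(); the vector `v` and the other variable names are illustrative assumptions, with "12,5" added as a single-separator case that is not in the question.

```r
v <- c("12.345.67", "12.345,6", "12.345", "12,5")
pattern <- "[,.](?!\\d{1,2}$)"

# Start positions of the matched separators (-1 means no match)
hits <- gregexpr(pattern, v, perl = TRUE)
matched <- lapply(hits, as.integer)
matched   # position 3 for the first three strings, -1 for "12,5"

# Only the matched separators are removed
cleaned <- gsub(pattern, "", v, perl = TRUE)
cleaned
# [1] "12345.67" "12345,6"  "12345"    "12,5"
```

In "12.345.67" the second period is followed by exactly two digits and then the end of the string, so the lookahead rejects it and it survives. In "12.345" three digits follow the period, so it is matched and removed.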