Remove numbers with more than 4 digits from a string

Date: 2018-10-13 13:32:09

Tags: r regex

Here is the data:

mydf=structure(list(X.U.FEFF.ID = c(3951L, 3955L, 3956L, 3957L, 3958L
), ITEM_SUM = c(29.9, 55.99, 59, 40.95, 47.25), QUANTITY = c(1L, 
1L, 1L, 1L, 1L), PRICE = c(29.9, 55.99, 59, 40.95, 47.25), NDS10 = c(0, 
0, 5.36, 0, 4.3), NDS18 = c(0, 8.54, 0, 6.25, 0), id = structure(c(5L, 
1L, 4L, 3L, 2L), .Label = c("*2108609 fsfhsfghsgfhjdfsdh", "2013077 a[osdig[aodifg[ad", 
"2030918 Пhsapsgiju[aeri 180г", "3420159 rgyaeghpiudarsfghpuashg 900г", 
"any text"), class = "factor"), ID_C_REGCODES_CASH_VOUCHER = c(3945L, 
3953L, 3953L, 3953L, 3953L), DISCOUNTNAME = c(NA, NA, NA, NA, 
NA), DISCOUNTSUM = c(0L, 0L, 0L, 0L, 0L)), .Names = c("X.U.FEFF.ID", 
"ITEM_SUM", "QUANTITY", "PRICE", "NDS10", "NDS18", "id", "ID_C_REGCODES_CASH_VOUCHER", 
"DISCOUNTNAME", "DISCOUNTSUM"), class = "data.frame", row.names = c(NA, 
-5L))

The id column contains:

any text
*2108609 fsfhsfghsgfhjdfsdh
3420159 rgyaeghpiudarsfghpuashg 900г
2030918 Пhsapsgiju[aeri 180г
2013077 a[osdig[aodifg[ad

Wherever a row contains a number with more than 4 digits, I need to remove that number from the string.

Desired output for the id column:

any text
fsfhsfghsgfhjdfsdh
rgyaeghpiudarsfghpuashg 900г
Пhsapsgiju[aeri 180г
a[osdig[aodifg[ad

How can I do this?

2 answers:

Answer 0 (score: 3)

sub is one option:

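# [^.] matches one character other than a literal dot (here the * or the first digit),
# followed by 4 or more digits and a single space
sub("[^.]\\d{4,} ", "", mydf$id)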
#[1] "any text"                    
#[2] "fsfhsfghsgfhjdfsdh"          
#[3] "rgyaeghpiudarsfghpuashg 900г"
#[4] "Пhsapsgiju[aeri 180г"        
#[5] "a[osdig[aodifg[ad" 

To change the column in the data frame itself, run:

mydf$id <- sub("[^.]\\d{4,} ", "", mydf$id)
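Because the character class [^.] matches any single character except a literal dot, the pattern always consumes one character in front of the digit run: the * prefix, the first digit of the number itself, or a space in the middle of a string. The strings below are made-up examples (not from the question) that show what this means in practice, including a 4-digit number at the start that is not removed:

```r
# Hypothetical test strings, not taken from the question's data
x <- c("*2108609 abc",   # [^.] consumes the leading "*"
       "2108609 abc",    # [^.] consumes the first digit "2"
       "ab 1234567 cd",  # [^.] consumes the space before the number
       "1234 abc")       # only 3 digits remain after [^.], so no match

res <- sub("[^.]\\d{4,} ", "", x)
print(res)
# res is c("abc", "abc", "abcd", "1234 abc")
```

The third result glues the two words together. If your real data can contain long numbers in the middle of a string, the gsub pattern in the other answer keeps the space ("ab cd").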

Answer 1 (score: 3)

Here is another regex.

# [^[:alnum:]]* also removes any non-alphanumeric characters (such as *) right before the digits
gsub("[^[:alnum:]]*\\d{4,}", "", mydf$id)
#[1] "any text"                      " fsfhsfghsgfhjdfsdh"          
#[3] " rgyaeghpiudarsfghpuashg 900г" " Пhsapsgiju[aeri 180г"        
#[5] " a[osdig[aodifg[ad"

If you also want to remove the leading whitespace from the result, wrap the call in trimws:

trimws(gsub("[^[:alnum:]]*\\d{4,}", "", mydf$id))
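As a quick check, the sketch below rebuilds the question's id values as a plain character vector and compares the cleaned result with the desired output. The last, made-up string (not from the question) shows that gsub removes every run of 4 or more digits together with the non-alphanumeric characters right before it, while shorter numbers such as 900 are kept:

```r
ids <- c("any text",
         "*2108609 fsfhsfghsgfhjdfsdh",
         "3420159 rgyaeghpiudarsfghpuashg 900г",
         "2030918 Пhsapsgiju[aeri 180г",
         "2013077 a[osdig[aodifg[ad")

desired <- c("any text",
             "fsfhsfghsgfhjdfsdh",
             "rgyaeghpiudarsfghpuashg 900г",
             "Пhsapsgiju[aeri 180г",
             "a[osdig[aodifg[ad")

cleaned <- trimws(gsub("[^[:alnum:]]*\\d{4,}", "", ids))
identical(cleaned, desired)  # TRUE

# Hypothetical string with two long numbers: both are removed
extra <- trimws(gsub("[^[:alnum:]]*\\d{4,}", "", "abc 12345 def 678901"))
# extra is "abc def"
```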

Edit.

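If you want to remove numbers with more than 4 digits from several columns, apply the same gsub call to each column with lapply, along the following lines.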

In this example, df is a data frame with two columns (columns 1 and 2) containing such numbers.

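# Example data: a data frame with two copies of the id column
df <- mydf["id"]
df$new <- mydf[["id"]]

# Strip 4+ digit numbers (and any non-alphanumeric prefix) from columns 1 and 2
df[1:2] <- lapply(df[1:2], function(s)
  trimws(gsub("[^[:alnum:]]*\\d{4,}", "", s)))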
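Below is a self-contained sketch of the same idea on a small made-up data frame (the column names a and b and the helper strip_long_numbers are hypothetical). It uses df[] <- lapply(df, ...), a common R idiom for applying a function to every column while keeping the data frame structure. Factor columns come back as character vectors, because gsub always returns a character vector.

```r
# Hypothetical helper wrapping the gsub/trimws call from this answer
strip_long_numbers <- function(s) trimws(gsub("[^[:alnum:]]*\\d{4,}", "", s))

# Made-up example data; stringsAsFactors = TRUE mimics the factor id column
df <- data.frame(a = c("*2108609 foo", "bar"),
                 b = c("2013077 baz", "1234567 qux 99"),
                 stringsAsFactors = TRUE)

# Assigning to df[] keeps the data.frame class while replacing every column
df[] <- lapply(df, strip_long_numbers)

str(df)
# df$a is c("foo", "bar"); df$b is c("baz", "qux 99")
```

To clean only some columns, index them by name or position instead, as in df[c("a", "b")] <- lapply(df[c("a", "b")], strip_long_numbers).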