Here is the data:
mydf=structure(list(X.U.FEFF.ID = c(3951L, 3955L, 3956L, 3957L, 3958L
), ITEM_SUM = c(29.9, 55.99, 59, 40.95, 47.25), QUANTITY = c(1L,
1L, 1L, 1L, 1L), PRICE = c(29.9, 55.99, 59, 40.95, 47.25), NDS10 = c(0,
0, 5.36, 0, 4.3), NDS18 = c(0, 8.54, 0, 6.25, 0), id = structure(c(5L,
1L, 4L, 3L, 2L), .Label = c("*2108609 fsfhsfghsgfhjdfsdh", "2013077 a[osdig[aodifg[ad",
"2030918 Пhsapsgiju[aeri 180г", "3420159 rgyaeghpiudarsfghpuashg 900г",
"any text"), class = "factor"), ID_C_REGCODES_CASH_VOUCHER = c(3945L,
3953L, 3953L, 3953L, 3953L), DISCOUNTNAME = c(NA, NA, NA, NA,
NA), DISCOUNTSUM = c(0L, 0L, 0L, 0L, 0L)), .Names = c("X.U.FEFF.ID",
"ITEM_SUM", "QUANTITY", "PRICE", "NDS10", "NDS18", "id", "ID_C_REGCODES_CASH_VOUCHER",
"DISCOUNTNAME", "DISCOUNTSUM"), class = "data.frame", row.names = c(NA,
-5L))
In the id column:
any text
*2108609 fsfhsfghsgfhjdfsdh
3420159 rgyaeghpiudarsfghpuashg 900г
2030918 Пhsapsgiju[aeri 180г
2013077 a[osdig[aodifg[ad
In the strings that contain a number with more than 4 digits, I need to remove that number.
Desired output for the id column:
any text
fsfhsfghsgfhjdfsdh
rgyaeghpiudarsfghpuashg 900г
Пhsapsgiju[aeri 180г
a[osdig[aodifg[ad
How can I do this?
Answer 0 (score: 3)
sub is one option:
sub("[^.]\\d{4,} ", "", mydf$id)
#[1] "any text"
#[2] "fsfhsfghsgfhjdfsdh"
#[3] "rgyaeghpiudarsfghpuashg 900г"
#[4] "Пhsapsgiju[aeri 180г"
#[5] "a[osdig[aodifg[ad"
To change the column in the data itself, run:
mydf$id <- sub("[^.]\\d{4,} ", "", mydf$id)
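To see what this pattern actually removes, here is a small sketch in plain R that uses the five id values from mydf as a character vector and lists the matched text with regexpr() and regmatches(). In "[^.]\\d{4,} ", [^.] matches any single character except a literal dot, so it consumes the * in the second string, or the first digit when the string begins with a number; \\d{4,} then takes the remaining digits, and the trailing space removes the separator. One side effect: a four-digit number at the very start of a string is left alone, because [^.] uses up one of its digits. The inputs "1234 keep" and "x1234 keep" are made up for illustration.

```r
# The five id values from mydf, as a plain character vector
ids <- c("any text",
         "*2108609 fsfhsfghsgfhjdfsdh",
         "3420159 rgyaeghpiudarsfghpuashg 900г",
         "2030918 Пhsapsgiju[aeri 180г",
         "2013077 a[osdig[aodifg[ad")

pattern <- "[^.]\\d{4,} "

# regexpr() finds the first match in each string (-1 if none);
# regmatches() returns the matched text only for strings that matched
removed <- regmatches(ids, regexpr(pattern, ids))
print(removed)

# sub() deletes that first match, as in the answer
cleaned <- sub(pattern, "", ids)
print(cleaned)

# Hypothetical inputs: a leading four-digit number survives,
# but the same number preceded by another character is removed
print(sub(pattern, "", c("1234 keep", "x1234 keep")))  # "1234 keep" "keep"
```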
Answer 1 (score: 3)
Here is another regex:
gsub("[^[:alnum:]]*\\d{4,}", "", mydf$id)
#[1] "any text" " fsfhsfghsgfhjdfsdh"
#[3] " rgyaeghpiudarsfghpuashg 900г" " Пhsapsgiju[aeri 180г"
#[5] " a[osdig[aodifg[ad"
If you also want to remove the leading whitespace from the results, wrap the call in trimws:
trimws(gsub("[^[:alnum:]]*\\d{4,}", "", mydf$id))
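A small self-contained check of the gsub() plus trimws() version, using three of the id values. Here [^[:alnum:]]* swallows an optional non-alphanumeric prefix such as *, and \\d{4,} matches four or more digits; because gsub() is global, every such run in a string is removed, not just the first. If "more than 4 digits" is meant strictly, \\d{5,} is the quantifier to use. The inputs "1234 keep" and "12345 drop" are made up for illustration.

```r
ids <- c("any text",
         "*2108609 fsfhsfghsgfhjdfsdh",
         "2013077 a[osdig[aodifg[ad")

# Answer 1's pattern: optional non-alphanumeric prefix, then 4+ digits;
# trimws() then strips the space left behind
four_plus <- trimws(gsub("[^[:alnum:]]*\\d{4,}", "", ids))
print(four_plus)

# Stricter variant (hypothetical inputs): only runs of 5+ digits are removed
five_plus <- trimws(gsub("[^[:alnum:]]*\\d{5,}", "", c("1234 keep", "12345 drop")))
print(five_plus)
```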
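Edit.
If you want to remove numbers with more than 4 digits from several columns, apply the same gsub call to each column with lapply, as in the code below.
Here df is a data.frame with two columns, numbered 1 and 2.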
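# Build a two-column example: both columns are copies of mydf$id
df <- mydf["id"]
df$new <- mydf[["id"]]
# Clean columns 1 and 2 in place with the trimws(gsub(...)) call
df[1:2] <- lapply(df[1:2], function(s)
  trimws(gsub("[^[:alnum:]]*\\d{4,}", "", s)))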
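If the columns to clean are easier to pick by type than by position, here is a minimal sketch under stated assumptions: the helper strip_long_numbers() and the small data frame below are hypothetical, and every character or factor column is treated as text to clean while other columns are left untouched. The value "any text 12345" is made up to show a trailing number being removed.

```r
# Hypothetical helper wrapping Answer 1's trimws(gsub(...)) call
strip_long_numbers <- function(s) {
  trimws(gsub("[^[:alnum:]]*\\d{4,}", "", s))
}

# Hypothetical example data; "any text 12345" is made up
df <- data.frame(
  id  = c("*2108609 fsfhsfghsgfhjdfsdh", "any text"),
  new = c("2013077 a[osdig[aodifg[ad", "any text 12345"),
  n   = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Select text columns by type instead of hard-coding positions like 1:2
text_cols <- vapply(df, function(col) is.character(col) || is.factor(col), logical(1))
df[text_cols] <- lapply(df[text_cols], strip_long_numbers)
print(df)
```

Selecting by type avoids hard-coding column positions; note that gsub() returns character vectors, so any factor column comes back as character.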