Question

Here is a sample data frame in R -

date                  item_id           price
2010-09-15            0034              4546
2010-09-15            ABXC              4325
2010-09-15            12AB              3545
2010-09-15            ZF9C              4354
2010-09-15            Z923              7854
2010-09-15            923F              780

Desired output -

date                  item_id           price
2010-09-15            ABXC              4325
2010-09-15            12AB              3545
2010-09-15            ZF9C              4354
2010-09-15            Z923              7854
2010-09-15            923F              780

What I have tried so far -

outlier_seq<-c('0','1','2','3','4','5','6','7','8','9')
df1<-sample_df[!grepl(paste(outlier_seq, collapse = "|"), sample_df$item_id),]

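But this removes every row whose item_id contains any digit. What I want instead is to filter out only the records whose item_id consists entirely of digits. Any help with this?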

Thanks

Answer 1

Assuming you are starting with:

mydf <- structure(list(date = c("2010-09-15", "2010-09-15", "2010-09-15", 
    "2010-09-15", "2010-09-15"), item_id = c("0034", "ABXC", "12AB", 
    "ZF9C", "ZF9C23"), price = c(4546L, 4325L, 3545L, 4354L, 7854L
    )), .Names = c("date", "item_id", "price"), row.names = c(NA, 
    5L), class = "data.frame")

You should be able to do the following, which drops the rows whose item_id starts with a digit:

mydf[!grepl("^[0-9]", mydf$item_id), ]
##         date item_id price
## 2 2010-09-15    ABXC  4325
## 4 2010-09-15    ZF9C  4354
## 5 2010-09-15  ZF9C23  7854
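
The pattern above drops every id that starts with a digit, so 12AB is removed as well. If the goal is to drop only the ids made up entirely of digits, as in the desired output of the question, a base R sketch could anchor the pattern at both ends. The data frame below is rebuilt from the question's table, and the name kept is just an illustrative choice:

```r
# Sample data rebuilt from the question's table (item_id kept as character)
sample_df <- data.frame(
  date    = rep("2010-09-15", 6),
  item_id = c("0034", "ABXC", "12AB", "ZF9C", "Z923", "923F"),
  price   = c(4546L, 4325L, 3545L, 4354L, 7854L, 780L),
  stringsAsFactors = FALSE
)

# "^[0-9]+$" matches only strings that are digits from start to end;
# negating the match keeps every row with at least one non-digit character
kept <- sample_df[!grepl("^[0-9]+$", sample_df$item_id), ]
kept$item_id
```

With this data, only the 0034 row should be dropped, while ids such as 12AB and 923F that merely start with digits are kept.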

Answer 2

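Alternatively, we can use the tidyverse to match a pattern that starts (^) with one or more non-digit characters ([^0-9]+), using str_detect to return a logical vector for filtering the rows with filter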

library(dplyr)
library(stringr)
mydf %>% 
    filter(str_detect(item_id, "^[^0-9]+"))
#        date item_id price
#1 2010-09-15    ABXC  4325
#2 2010-09-15    ZF9C  4354
#3 2010-09-15  ZF9C23  7854

UPDATE

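For the updated question in the OP's post, we can look for a pattern with one or more digits ([0-9]+) running from the start (^) to the end ($) of the string, negate (!) the logical vector to flip TRUE/FALSE to FALSE/TRUE, and filter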

mydf %>%
       filter(!str_detect(item_id, "^[0-9]+$"))
#        date item_id price
#1 2010-09-15    ABXC  4325
#2 2010-09-15    12AB  3545
#3 2010-09-15    ZF9C  4354
#4 2010-09-15  ZF9C23  7854
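
To see why the anchors matter, the three patterns that appear in this thread can be compared side by side. This is a small base R sketch rather than part of the original answer; the vector ids and the pattern names are made up for illustration:

```r
# Ids taken from the question's sample data
ids <- c("0034", "ABXC", "12AB", "ZF9C", "Z923", "923F")

# Three patterns: the unanchored one is equivalent to the question's
# "0|1|...|9" attempt, the second is anchored at the start only,
# the third is anchored at both ends
patterns <- c(
  any_digit         = "[0-9]",
  starts_with_digit = "^[0-9]",
  all_digits        = "^[0-9]+$"
)

# One logical column per pattern, one row per id
hits <- sapply(patterns, grepl, x = ids)
rownames(hits) <- ids
hits
```

In the resulting matrix, only 0034 is TRUE in the all_digits column, whereas any_digit is TRUE for every id except ABXC. That is why the attempt in the question removed almost every row.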

UPDATE2

Based on the OP's concern that it is filtering out "07R2", we test it by adding another row with that value

mydf %>% 
     filter(!str_detect(item_id, "^[0-9]+$"))
 #        date item_id price
 #1 2010-09-15    ABXC  4325
 #2 2010-09-15    12AB  3545
 #3 2010-09-15    ZF9C  4354
 #4 2010-09-15  ZF9C23  7854
 #5 2010-09-15    07R2  7934
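
The answer does not show how the extra row was added. One possible way, sketched here in base R with rbind, is given below. The starting data and the price 7934 are taken from the outputs above, while the names extra_row and mydf2 are illustrative, and grepl stands in for the filter/str_detect call so that the block needs no packages:

```r
# Starting data, as in Answer 1
mydf <- data.frame(
  date    = rep("2010-09-15", 5),
  item_id = c("0034", "ABXC", "12AB", "ZF9C", "ZF9C23"),
  price   = c(4546L, 4325L, 3545L, 4354L, 7854L),
  stringsAsFactors = FALSE
)

# Hypothetical extra row carrying the id the OP was worried about
extra_row <- data.frame(
  date = "2010-09-15", item_id = "07R2", price = 7934L,
  stringsAsFactors = FALSE
)
mydf2 <- rbind(mydf, extra_row)

# Same pattern as above: drop only ids that are digits from start to end
result <- mydf2[!grepl("^[0-9]+$", mydf2$item_id), ]
result$item_id
```

Because 07R2 contains letters, it does not match the fully anchored pattern and should stay in the result.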

UPDATE3

Based on the OP's new dataset

mydf %>% 
     filter(!str_detect(item_id, "^[0-9]+$"))
#        date item_id price
#1 2010-09-15    ABXC  4325
#2 2010-09-15    12AB  3545
#3 2010-09-15    ZF9C  4354
#4 2010-09-15    Z923  7854
#5 2010-09-15    923F   780

This also works even if the column is a factor

mydf %>%
      filter(!str_detect(factor(item_id), "^[0-9]+$"))
#        date item_id price
#1 2010-09-15    ABXC  4325
#2 2010-09-15    12AB  3545
#3 2010-09-15    ZF9C  4354
#4 2010-09-15    Z923  7854
#5 2010-09-15    923F   780
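
One thing to keep in mind with factor columns is that filtering rows does not remove the unused level. A minimal base R sketch of this, assuming a small made-up data frame df with item_id stored as a factor, might look like:

```r
# Small illustrative data frame with item_id stored as a factor
df <- data.frame(
  item_id = factor(c("0034", "ABXC", "12AB")),
  price   = c(4546L, 4325L, 3545L)
)

# Convert to character explicitly before matching, then drop all-digit ids
kept <- df[!grepl("^[0-9]+$", as.character(df$item_id)), ]

# The level "0034" is still recorded even though no row uses it
levels_before <- levels(kept$item_id)

# droplevels() removes factor levels that no longer occur in the data
kept <- droplevels(kept)
levels_after <- levels(kept$item_id)
```

Calling droplevels after the filter is optional, but it avoids empty groups showing up later in tables or plots.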

Data

#data from last update
mydf <- structure(list(date = c("2010-09-15", "2010-09-15", "2010-09-15", 
"2010-09-15", "2010-09-15", "2010-09-15"), item_id = c("0034", 
"ABXC", "12AB", "ZF9C", "Z923", "923F"), price = c(4546L, 4325L, 
3545L, 4354L, 7854L, 780L)), .Names = c("date", "item_id", "price"
 ), class = "data.frame", row.names = c(NA, -6L))

Filter a data frame based on column (item_id) values that consist entirely of digits?
