Removing rows using a text-matching rule on one column

Time: 2017-12-24 19:42:17

Tags: r

My experimental data has the following format:

    df <- data.frame(product_path = c("https://mycommerece.com/product/book/miracle", "https://mycommerece.com/product/book/miracle2", "https://mycommerece.com/product/gadget/airplane", "https://mycommerece.com/product/book/miracle3"), var1 = c(1,1,1,0), commereceurl = c("https://mycommerece.com/product/","https://mycommerece.com/product/","https://mycommerece.com/product2/","https://www.test.com"), var2 = c(1,0,0,1))
    > df
                                         product_path var1                      commereceurl var2
    1    https://mycommerece.com/product/book/miracle    1  https://mycommerece.com/product/    1
    2   https://mycommerece.com/product/book/miracle2    1  https://mycommerece.com/product/    0
    3 https://mycommerece.com/product/gadget/airplane    1 https://mycommerece.com/product2/    0
    4   https://mycommerece.com/product/book/miracle3    0              https://www.test.com    1

Using the commereceurl column, I want to remove every row whose value in that column does not start with "https://mycommerece.com".

Example output:

    df <- data.frame(product_path = c("https://mycommerece.com/product/book/miracle", "https://mycommerece.com/product/book/miracle2", "https://mycommerece.com/product/gadget/airplane"), var1 = c(1,1,1), commereceurl = c("https://mycommerece.com/product/","https://mycommerece.com/product/","https://mycommerece.com/product2/"), var2 = c(1,0,0))

How can I implement this rule?

1 answer:

Answer 0: (score: 3)

You can use grep to identify the rows you want to keep:
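KEEP = grep("^https://mycommerece\\.com", df$commereceurl)  # indices of rows whose URL starts with the prefix; dot escaped so it matches literally
df = df[KEEP,]  # keep only those rows, all columns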
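
As an alternative to a regular expression, the same filter can be sketched with base R's `startsWith()`, which compares the prefix literally, so nothing needs escaping. This is only a minimal sketch under some assumptions: the names `prefix`, `keep`, and `df_filtered` are illustrative and not from the question, the column is stored as character (hence `stringsAsFactors = FALSE`, since `startsWith()` does not accept factors), and the column contains no `NA` values.

```r
# Rebuild the sample data from the question, keeping strings as character
# (startsWith() requires character input, not factors).
df <- data.frame(
  product_path = c("https://mycommerece.com/product/book/miracle",
                   "https://mycommerece.com/product/book/miracle2",
                   "https://mycommerece.com/product/gadget/airplane",
                   "https://mycommerece.com/product/book/miracle3"),
  var1 = c(1, 1, 1, 0),
  commereceurl = c("https://mycommerece.com/product/",
                   "https://mycommerece.com/product/",
                   "https://mycommerece.com/product2/",
                   "https://www.test.com"),
  var2 = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Illustrative name: the literal prefix every kept URL must start with.
prefix <- "https://mycommerece.com"

# Logical vector, TRUE where commereceurl begins with the prefix.
keep <- startsWith(df$commereceurl, prefix)

# Subset the rows; drop = FALSE keeps the result a data frame.
df_filtered <- df[keep, , drop = FALSE]
print(df_filtered)
```

If the column might contain `NA`, `keep` will contain `NA` as well and the subset would produce rows of `NA`s; wrapping the index as `which(keep)` is one way to avoid that.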