Subset by substring

Time: 2016-07-27 08:43:03

Tags: r substring data.table subset

Suppose we have this data.table X:

library(data.table)

# Build n random alphanumeric strings, each `length` characters long
Random <- function(n=1, length=6){
  randomString <- character(n)
  for (i in seq_len(n)){randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                   length, replace=TRUE),collapse="")}
  return(randomString)}

X <- data.table(A = rnorm(11000, sd = 0.8),
                B = rnorm(11000, mean = 10, sd = 3),
                C = sample( LETTERS[1:24], 11000, replace=TRUE),
                D = sample( letters[1:24], 11000, replace=TRUE),
                E = round(rnorm(11000,mean=25, sd=3)),
                F = round(runif(n = 11000,min = 1000,max = 25000)),
                G = round(runif(11000,0,200000)),
                H = Random(11000))

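I would like to subset it by several substrings. Here, we will look for g, F and d in column H.

There is already a solution that does this for a single substring: How to select R data.table rows based on substring match (a la SQL like)

If we only want g, using the data.table package: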

X[like(H,pattern = "g")]

But my problem is replicating this for g, F and d in a single operation.

Vec <- c("g","F","d")
Newtable <- X[like(H,pattern = Vec)]
Warning message:
In grep(pattern, levels(vector)) :
  argument 'pattern' has length > 1 and only the first element will be used

Is there a way to do this without creating 3 tables, merging them and removing the duplicates?

2 Answers:

Answer 0: (score: 4)

We can use grep after pasting the vector into a single string, collapsed with |:

X[grep(paste(Vec, collapse="|"), H)]

Or we can use like with the same approach (suggested by @Tensibal), pasting the pattern vector together with collapse = "|":

X[like(H, pattern = paste(Vec, collapse="|"))]
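To make the collapsed pattern easier to check by eye, here is a minimal sketch on a tiny table. The table DT and its contents are made up for illustration and are not part of the question's data; grepl is used so the result is a logical vector, which works the same way inside the brackets.

```r
library(data.table)

# Small hypothetical table, so the result can be verified by hand
DT <- data.table(id = 1:5,
                 H  = c("abcg12", "XYZF00", "nomtch", "d00000", "GGGDDD"))
Vec <- c("g", "F", "d")

# Collapse the vector into one regular expression with alternation
pat <- paste(Vec, collapse = "|")   # "g|F|d"

# Keep rows whose H contains at least one of the substrings (case-sensitive)
res <- DT[grepl(pat, H)]
```

Because the collapsed string is treated as a regular expression, this should be fine for plain letters like the ones in the question, but substrings containing characters such as . or + would need escaping.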

Answer 1: (score: 1)

I think you can also use this:

NewTable <- X[grepl("g",H) | grepl("F",H)  | grepl("d",H)]
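If the list of substrings is not known in advance, the chained grepl calls can be generated from a vector instead of being written out by hand. The following is only a sketch under assumptions: DT and its values are invented for illustration, and fixed = TRUE is my own choice so that each substring is matched literally rather than as a regular expression.

```r
library(data.table)

# Hypothetical data with substrings that are special characters in a regex
DT <- data.table(id = 1:4,
                 H  = c("a.b", "axb", "c+d", "zzz"))
Vec <- c(".", "+")

# One logical vector per substring, matched literally
hits <- lapply(Vec, function(s) grepl(s, DT$H, fixed = TRUE))

# Combine the logical vectors with OR, as in the chained version
keep <- Reduce(`|`, hits)

res <- DT[keep]
```

This keeps the same "any of the substrings" logic as the answer above, at the cost of one pass over the column per substring.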