Suppose we have this data table X:
library(data.table)

# Generate n random alphanumeric strings, each `length` characters long.
Random <- function(n = 1, length = 6) {
  randomString <- character(n)
  for (i in seq_len(n)) {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                    length, replace = TRUE), collapse = "")
  }
  randomString
}

X <- data.table(A = rnorm(11000, sd = 0.8),
                B = rnorm(11000, mean = 10, sd = 3),
                C = sample(LETTERS[1:24], 11000, replace = TRUE),
                D = sample(letters[1:24], 11000, replace = TRUE),
                E = round(rnorm(11000, mean = 25, sd = 3)),
                F = round(runif(n = 11000, min = 1000, max = 25000)),
                G = round(runif(11000, 0, 200000)),
                H = Random(11000))
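I would like to subset it by several substrings. Here, we will look for g, F and d in column H.
There is already a solution that does this for a single substring: How to select R data.table rows based on substring match (a la SQL like)
If we only want g, using the data.table package: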
X[like(H,pattern = "g")]
But my problem is replicating this for g, F and d in a single operation.
Vec <- c("g","F","d")
Newtable <- X[like(H,pattern = Vec)]
Warning message:
In grep(pattern, levels(vector)) :
argument 'pattern' has length > 1 and only the first element will be used
Is there a way to do this without having to create three tables, merge them, and remove the duplicates?
Answer 0: (score: 4)
We can use grep by pasting the vector into a single string, collapsing it with |:
X[grep(paste(Vec, collapse="|"), H)]
The resulting pattern, g|F|d, is a regular expression that matches any row whose H contains at least one of the substrings.
Or we can use the same approach with like (as suggested by @Tensibal), pasting the pattern vector and collapsing it with |:
X[like(H, pattern = paste(Vec, collapse="|"))]
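The alternation pattern can be checked on a small table before running it on the full data. The sketch below is only illustrative: the four-row table, its id column, and the H values are made up for the example, and it uses data.table's %like% operator instead of the like() call.

```r
library(data.table)

# Tiny illustrative table; the values are invented for this example.
X <- data.table(id = 1:4,
                H  = c("abgxyz", "QWFrty", "zzzzzz", "12d456"))
Vec <- c("g", "F", "d")

# Collapse the substrings into one regular expression: "g|F|d".
pattern <- paste(Vec, collapse = "|")

# Keep the rows where H matches at least one alternative (case-sensitive).
res <- X[H %like% pattern]
print(res)
```

Because the collapsed string is interpreted as a regular expression, substrings containing characters such as . or + would need escaping first.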
Answer 1: (score: 1)
I think you could also use this:
NewTable <- X[grepl("g",H) | grepl("F",H) | grepl("d",H)]
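If the list of substrings grows, writing one grepl call per substring by hand becomes tedious. One possible generalisation, offered as a sketch and not as part of the original answer, builds one logical vector per substring and combines them with Reduce. The small table and its id column are invented for illustration, and fixed = TRUE is an added assumption so that each substring is matched literally.

```r
library(data.table)

# Tiny illustrative table; the values are invented for this example.
X <- data.table(id = 1:5,
                H  = c("abgxyz", "QWFrty", "zzzzzz", "a.c123", "12d456"))
Vec <- c("g", "F", "d")

# One logical vector per substring; fixed = TRUE disables regex interpretation.
matches <- lapply(Vec, function(s) grepl(s, X$H, fixed = TRUE))

# Element-wise OR across all substrings.
hits <- Reduce(`|`, matches)

NewTable <- X[hits]
print(NewTable)
```

This scans H once per substring, so for a long Vec the single collapsed pattern may be faster. The literal matching can still be handy when the substrings contain regex metacharacters.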