I have the following data tables:
> measures
source measure
1: my123 0.08130182
2: 123my -1.45285168
3: your123 -0.30460771
4: 123your 0.94670380
5: 12your3 -0.54728546
> sources
name pattern
1: My Source my
2: Your Source your
created with:
library(data.table)
measures <- data.table(source = c('my123', '123my', 'your123', '123your', '12your3'), measure = rnorm(5))
sources <- data.table(name = c('My Source', 'Your Source'), pattern = c('my', 'your'))
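I would like to be able to join on like(measures.source, sources.pattern). Is there a good way to do this without a cross join followed by filtering out the non-matching rows? That is impractical for my dataset.
I can do this in SQL (PostgreSQL, see below), but I'd like to know whether there is a way to do it in R's data.table,
or whether there are any plans to introduce more custom join functionality in the future.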
drop table if exists measures;
create table measures as (select * from (values
('my123', 0.08130182),
('123my', -1.45285168),
('your123', -0.30460771),
('123your', 0.94670380),
('your123', 0.94670380)
)t(source, measure));
drop table if exists sources;
create table sources as (select * from (values
('My Source', 'my'),
('Your Sources', 'your')
)t(name, pattern));
select * from measures join sources on measures.source ~ sources.pattern;
which returns the desired result:
source | measure | name | pattern
--------+-------------+--------------+---------
my123 | 0.08130182 | My Source | my
123my | -1.45285168 | My Source | my
your123 | -0.30460771 | Your Sources | your
123your | 0.94670380 | Your Sources | your
your123 | 0.94670380 | Your Sources | your
Answer 0 (score: 0)
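I'm not sure whether this counts as "impractical" or not, but this does it. For more complex pattern matching, stringi will handle things more neatly for your purposes.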
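library(stringi)
do.call(rbind, lapply(seq_len(nrow(measures)), function(i) {
  # indices of the patterns that match this row's source string
  matched_slice <- which(stri_detect_regex(measures$source[i], sources$pattern))
  if (length(matched_slice) == 0L) return(NULL)  # no match: drop the row
  data.frame(measures[i, ], sources[matched_slice, ])
}))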
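If sources holds only a handful of patterns, a variation that may scale better is to loop over the patterns instead of over the rows of measures, so the number of R-level iterations equals the number of patterns. This is a minimal sketch, assuming data.table's exported like() (a grepl wrapper that takes one pattern at a time); pattern_join is a hypothetical helper name. It is still not a true join, since every source string is tested against every pattern, but it never materializes a cross join.

```r
library(data.table)

set.seed(42)
measures <- data.table(source = c('my123', '123my', 'your123', '123your', '12your3'),
                       measure = rnorm(5))
sources <- data.table(name = c('My Source', 'Your Source'),
                      pattern = c('my', 'your'))

# Hypothetical helper: one pass per pattern instead of one pass per measures row.
pattern_join <- function(measures, sources) {
  pieces <- lapply(seq_len(nrow(sources)), function(k) {
    pat <- sources$pattern[k]
    nm  <- sources$name[k]
    # rows of measures whose source matches this pattern (regex match via grepl)
    hits <- measures[like(source, pat)]
    if (nrow(hits) == 0L) return(NULL)
    # add the matching sources columns by reference to the subset copy
    hits[, c("name", "pattern") := list(nm, pat)]
    hits
  })
  rbindlist(pieces)  # rbindlist skips NULL pieces
}

res <- pattern_join(measures, sources)
res
```

Rows come out grouped by pattern rather than in the original order of measures, and a source that matches several patterns appears once per match, as with the SQL join.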
source measure name pattern
1 my123 0.75119183 My Source my
2 123my 0.55344334 My Source my
3 your123 -0.03498414 Your Source your
4 123your 0.09364795 Your Source your
5 12your3 0.47537732 Your Source your
For larger datasets, use parallel::mclapply, or do it the data.table way:
rbindlist(lapply(seq_len(nrow(measures)), function(i) {
  matched_slice <- which(stri_detect_regex(measures$source[i], sources$pattern))
  if (length(matched_slice) == 0L) return(NULL)  # rbindlist skips NULL pieces
  cbind(measures[i], sources[matched_slice])
}))
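The parallel::mclapply option is mentioned but not shown. A minimal sketch of it, assuming a Unix-alike system (mclapply relies on forking, so on Windows it only runs with mc.cores = 1) and an arbitrary choice of 2 worker processes:

```r
library(data.table)
library(stringi)
library(parallel)

set.seed(42)
measures <- data.table(source = c('my123', '123my', 'your123', '123your', '12your3'),
                       measure = rnorm(5))
sources <- data.table(name = c('My Source', 'Your Source'),
                      pattern = c('my', 'your'))

# Forking is unavailable on Windows, where mclapply requires mc.cores = 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same per-row matching as the data.table version, with rows split across forked workers.
pieces <- mclapply(seq_len(nrow(measures)), function(i) {
  matched_slice <- which(stri_detect_regex(measures$source[i], sources$pattern))
  if (length(matched_slice) == 0L) return(NULL)
  cbind(measures[i], sources[matched_slice])
}, mc.cores = n_cores)

res <- rbindlist(pieces)  # stack the per-row pieces in input order, skipping NULLs
res
```

On a table this small the forking overhead outweighs any gain; a speed-up, if any, only appears when each worker receives a large share of the rows.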