Split text strings in a data.table column by position

Time: 2017-07-24 20:54:24

Tags: r split data.table strsplit

I am trying to figure out how to use data.table's tstrsplit() function to split a column of text strings at a character position.

DT2 <- data.table(a = paste0(LETTERS[1:5], seq(10, 15)), b = runif(6))
DT2
     a         b
1: A10 0.4153622
2: B11 0.1567381
3: C12 0.5361883
4: D13 0.5920144
5: E14 0.3376648
6: A15 0.5503773

I tried the following, none of which worked:

DT2[, c("L", "D") := tstrsplit(a, "")][]
DT2[, c("L", "D") := tstrsplit(a, "[A-Z]")][]
DT2[, c("L", "D") := tstrsplit(a, "[0-9]{1}")][]

I know of Q1, Q2 & Q3, but these do not solve my problem.

Desired output:

     a         b    L   D
1: A10 0.4153622    A   10
2: B11 0.1567381    B   11
3: C12 0.5361883    C   12
4: D13 0.5920144    D   13
5: E14 0.3376648    E   14
6: A15 0.5503773    A   15

Any help with an explanation is much appreciated.

1 answer:

Answer 0 (score: 1)

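If you want to split between the letter and the digits, you can split on the regular expression "(?<=[A-Za-z])(?=[0-9])", which restricts the split to a position that is preceded by a letter and followed by a digit.

The regex has two parts: the lookbehind (?<=[A-Za-z]), which means "after a letter", and the lookahead (?=[0-9]), which means "before a digit". Both are zero-width, so no characters are consumed by the split. In R, you need to specify perl=TRUE to use Perl-compatible regular expressions, which support these lookaround constructs: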

DT2[, c("L", "D") := tstrsplit(a, "(?<=[A-Za-z])(?=[0-9])", perl=TRUE)][]

#     a          b L  D
#1: A10 0.01487372 A 10
#2: B11 0.95035709 B 11
#3: C12 0.49230300 C 12
#4: D13 0.67183871 D 13
#5: E14 0.40076579 E 14
#6: A15 0.27871477 A 15
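
Note that the split above leaves D as a character column. As a minimal sketch of two possible variations (not part of the original answer): tstrsplit() accepts a type.convert argument that converts the resulting pieces, and, if the letter part is always exactly one character (an assumption that holds for the example data but may not hold for yours), the split can be done purely by position with substr()/substring(). The column names L2 and D2 are made up for illustration.

```r
library(data.table)

# Same toy data as in the question
# (paste0 recycles the 5 letters over the 6 numbers, hence the final "A15")
DT2 <- data.table(a = paste0(LETTERS[1:5], seq(10, 15)), b = runif(6))

# Variant 1: same lookaround split, but let tstrsplit() convert column types,
# so that the digit part becomes an integer column
DT2[, c("L", "D") := tstrsplit(a, "(?<=[A-Za-z])(?=[0-9])",
                               perl = TRUE, type.convert = TRUE)]

# Variant 2 (assumes exactly one leading letter): split by character position,
# no regex involved; L2 and D2 are hypothetical column names
DT2[, c("L2", "D2") := list(substr(a, 1, 1), as.integer(substring(a, 2)))]

print(DT2)
```

The regex variant is the more general one, since it still works when the letter prefix varies in length; the positional variant is simpler but breaks as soon as a value has two leading letters.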