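I am trying to figure out how to use the tstrsplit() function from data.table to split a column such as a in the following example:
library(data.table)
DT2 <- data.table(a = paste0(LETTERS[1:5],seq(10,15)), b = runif(6))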
DT2
a b
1: A10 0.4153622
2: B11 0.1567381
3: C12 0.5361883
4: D13 0.5920144
5: E14 0.3376648
6: A15 0.5503773
I tried the following which did not work:
DT2[, c("L", "D") := tstrsplit(a, "")][]
DT2[, c("L", "D") := tstrsplit(a, "[A-Z]")][]
DT2[, c("L", "D") := tstrsplit(a, "[0-9]{1}")][]
What I need is a split by character position. I am aware of Q1, Q2 & Q3, but these do not address my problem.
The desired result:
a b L D
1: A10 0.4153622 A 10
2: B11 0.1567381 B 11
3: C12 0.5361883 C 12
4: D13 0.5920144 D 13
5: E14 0.3376648 E 14
6: A15 0.5503773 A 15
Any help with an explanation is greatly appreciated.
Answer 0: (score: 1)
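If you want to split between letters and digits, you can split on the regular expression "(?<=[A-Za-z])(?=[0-9])", which restricts the split to a position that is preceded by a letter and followed by a digit.
The regex has two parts: a lookbehind, (?<=[A-Za-z]), which means "after a letter", and a lookahead, (?=[0-9]), which means "before a digit". Both are zero-width assertions, so the split consumes no characters. In R, you need to specify perl=TRUE to use Perl-compatible regular expressions, which these constructs require: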
DT2[, c("L", "D") := tstrsplit(a, "(?<=[A-Za-z])(?=[0-9])", perl=TRUE)][]
# a b L D
#1: A10 0.01487372 A 10
#2: B11 0.95035709 B 11
#3: C12 0.49230300 C 12
#4: D13 0.67183871 D 13
#5: E14 0.40076579 E 14
#6: A15 0.27871477 A 15
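To see the lookaround split in isolation, here is a minimal base R sketch that does not depend on data.table. The vector x and the variable names are made up for illustration; the pattern is the one from the answer above. The last two lines show a different technique, a fixed-position split with substr() and substring(), which only applies under the assumption that the letter part is always exactly one character long.

```r
# Hypothetical sample values shaped like column a
x <- c("A10", "B11", "C12")

# Split at the zero-width position that follows a letter and precedes a digit.
# perl = TRUE is needed because the default regex engine has no lookarounds.
parts <- strsplit(x, "(?<=[A-Za-z])(?=[0-9])", perl = TRUE)

# Pull the first and second piece out of every element,
# roughly what tstrsplit() does when it transposes the strsplit() result
letter_part <- vapply(parts, `[`, character(1), 1)
digit_part  <- vapply(parts, `[`, character(1), 2)

# Alternative (assumption: the letter part is always a single character):
# split by position instead of by pattern
letter_fixed <- substr(x, 1, 1)
digit_fixed  <- substring(x, 2)
```

The position-based variant avoids regular expressions entirely, but it would give wrong pieces for values with a longer letter prefix, so the lookaround pattern is the more general choice.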