I am trying to add two columns to a data.table. The original table looks like this:
> aTable
word freq
1: thanks for the follow 612
2: the end of the 491
3: the rest of the 462
4: at the end of 409
5: is going to be 359
6: for the first time 355
7: at the same time 346
8: cant wait to see 338
9: thank you for the 334
10: thanks for the rt 321
My code is as follows:
myKeyValfun <- function(line) {
ret1 = paste(head(strsplit(line,split=" ")[[1]],3), collapse=" ")
ret2 = tail(strsplit(line,split=" ")[[1]],1)
return(list(key = ret1, value = ret2))
}
aTable[, c("key","value") := myKeyValfun(word)]
After running this, only the first row gets the correct key and value. Every other row gets the same values as the first row.
See below:
> aTable
word freq key value
1: thanks for the follow 612 thanks for the follow
2: the end of the 491 thanks for the follow
3: the rest of the 462 thanks for the follow
4: at the end of 409 thanks for the follow
5: is going to be 359 thanks for the follow
6: for the first time 355 thanks for the follow
7: at the same time 346 thanks for the follow
8: cant wait to see 338 thanks for the follow
9: thank you for the 334 thanks for the follow
10: thanks for the rt 321 thanks for the follow
Any ideas?
Adding the expected result, as requested by akrun:
> aTable
word freq key value
1: thanks for the follow 612 thanks for the follow
2: the end of the 491 the end of the
3: the rest of the 462 the rest of the
4: at the end of 409 at the end of
5: is going to be 359 is going to be
6: for the first time 355 for the first time
7: at the same time 346 at the same time
8: cant wait to see 338 cant wait to see
9: thank you for the 334 thank you for the
10: thanks for the rt 321 thanks for the rt
Answer (score: 3)
If we need to extract the first three words into 'key' and the last word into 'value', one option is sub:
aTable[, c('key', 'value') := list(sub('(.*)\\s+.*', '\\1', word), sub('.*\\s+', '', word))]
aTable
# word freq key value
# 1: thanks for the follow 612 thanks for the follow
# 2: the end of the 491 the end of the
# 3: the rest of the 462 the rest of the
# 4: at the end of 409 at the end of
# 5: is going to be 359 is going to be
# 6: for the first time 355 for the first time
# 7: at the same time 346 at the same time
# 8: cant wait to see 338 cant wait to see
# 9: thank you for the 334 thank you for the
#10: thanks for the rt 321 thanks for the rt
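To make the two regular expressions in the sub call easier to follow, here is a minimal base-R sketch that applies them to a plain character vector (three phrases taken from the table above; no data.table needed). The greedy `.*` is what makes both patterns split at the last whitespace:

```r
word <- c("thanks for the follow", "the end of the", "cant wait to see")

# '(.*)\\s+.*': the greedy (.*) runs up to the LAST whitespace run,
# and '\\1' keeps only that captured prefix, i.e. everything but the last word
key <- sub("(.*)\\s+.*", "\\1", word)

# '.*\\s+': greedily matches everything through the last whitespace run;
# replacing that with "" leaves only the last word
value <- sub(".*\\s+", "", word)

key    # values: "thanks for the", "the end of", "cant wait to"
value  # values: "follow", "the", "see"
```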
Or we can use tstrsplit:
aTable[, c('key', 'value') := {
tmp <- tstrsplit(word, ' ')
list(do.call(paste, tmp[1:3]), tmp[[4]])}]
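As for why the original myKeyValfun copies the first row everywhere: strsplit returns one character vector per element of its input, and `[[1]]` keeps only the first of them. The function therefore returns a length-1 key and value, and := recycles those single values to every row. The following is a minimal sketch of a vectorized variant that keeps the question's function shape. It assumes each phrase has at least three words, as in the sample data, and rebuilds only the first three rows of aTable for illustration:

```r
library(data.table)

# A three-row excerpt of the question's aTable
aTable <- data.table(
  word = c("thanks for the follow", "the end of the", "the rest of the"),
  freq = c(612L, 491L, 462L)
)

# Vectorized variant of myKeyValfun: strsplit() yields one vector of words
# per phrase, and vapply() processes all of them instead of only [[1]]
myKeyValfun <- function(line) {
  parts <- strsplit(line, split = " ")
  key   <- vapply(parts, function(p) paste(head(p, 3), collapse = " "), character(1))
  value <- vapply(parts, function(p) tail(p, 1), character(1))
  list(key = key, value = value)
}

aTable[, c("key", "value") := myKeyValfun(word)]
aTable
```

Because key and value now have one element per row, := assigns them row by row instead of recycling a single value.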