New columns added to a data.table do not contain the correctly computed values

Asked: 2015-12-23 02:13:57

Tags: r data.table

I am trying to add two columns to a data.table. The original structure is as follows:

> aTable
                    word freq
 1: thanks for the follow  612
 2:        the end of the  491
 3:       the rest of the  462
 4:         at the end of  409
 5:        is going to be  359
 6:    for the first time  355
 7:      at the same time  346
 8:      cant wait to see  338
 9:     thank you for the  334
10:     thanks for the rt  321

My code is as follows:

myKeyValfun <- function(line) {
  ret1 = paste(head(strsplit(dtable4G$word,split=" ")[[1]],3), collapse=" ")
  ret2 = tail(strsplit(line,split=" ")[[1]],1)
  return(list(key = ret1, value = ret2))
}

aTable[, c("key","value") := myKeyValfun(word)]

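After running this, I noticed that the new columns are not populated correctly. Only the first row has the right values; every other row gets the same values as the first row.

See below: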

> aTable
                     word freq            key  value
 1: thanks for the follow  612 thanks for the follow
 2:        the end of the  491 thanks for the follow
 3:       the rest of the  462 thanks for the follow
 4:         at the end of  409 thanks for the follow
 5:        is going to be  359 thanks for the follow
 6:    for the first time  355 thanks for the follow
 7:      at the same time  346 thanks for the follow
 8:      cant wait to see  338 thanks for the follow
 9:     thank you for the  334 thanks for the follow
10:     thanks for the rt  321 thanks for the follow

Any ideas?

Adding the expected result, as requested by akrun:

> aTable
                     word freq            key  value
 1: thanks for the follow  612 thanks for the follow
 2:        the end of the  491     the end of    the
 3:       the rest of the  462    the rest of    the
 4:         at the end of  409     at the end     of
 5:        is going to be  359    is going to     be
 6:    for the first time  355  for the first   time
 7:      at the same time  346    at the same   time
 8:      cant wait to see  338   cant wait to    see
 9:     thank you for the  334   thank you for   the
10:     thanks for the rt  321  thanks for the    rt

1 Answer:

Answer 0 (score: 3)

If we need to extract the first three words into 'key' and the last word into 'value', one option is sub:

aTable[, c('key', 'value') := list(sub('(.*)\\s+.*', '\\1', word), sub('.*\\s+', '', word))]
aTable
#                     word freq            key  value
# 1: thanks for the follow  612 thanks for the follow
# 2:        the end of the  491     the end of    the
# 3:       the rest of the  462    the rest of    the
# 4:         at the end of  409     at the end     of
# 5:        is going to be  359    is going to     be
# 6:    for the first time  355  for the first   time
# 7:      at the same time  346    at the same   time
# 8:      cant wait to see  338   cant wait to    see
# 9:     thank you for the  334  thank you for    the
#10:     thanks for the rt  321 thanks for the     rt
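To make the behaviour of the two regular expressions easier to see, here is a minimal base R sketch on a plain character vector. The vector `words` and its five-word and one-word entries are made up for illustration and are not part of the question's data. Note that the first pattern keeps everything before the last run of whitespace, which equals "the first three words" only when the string has exactly four words.

```r
# Illustrative input (not from the question's table)
words <- c("thanks for the follow",
           "one two three four five",
           "hello")

# Greedy (.*) captures everything up to the last run of whitespace
key <- sub("(.*)\\s+.*", "\\1", words)

# Remove everything up to and including the last run of whitespace
value <- sub(".*\\s+", "", words)

print(key)    # "thanks for the" "one two three four" "hello"
print(value)  # "follow" "five" "hello"
```

When a string contains no whitespace, neither pattern matches, so sub returns the input unchanged in both columns.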

Or we can use tstrsplit:

aTable[, c('key', 'value') := {
             tmp <- tstrsplit(word, ' ')
             list(do.call(paste, tmp[1:3]), tmp[[4]])}]
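
As for why the function in the question repeats the first row: inside `j`, `word` is the whole column, so `strsplit(...)[[1]]` keeps only the split of the first element, and `:=` recycles the resulting length-one values down every row. The following is a sketch of a vectorised rewrite, assuming the data.table package is installed; the helper name `splitKeyValue` and its `n` argument are hypothetical, and the table is a three-row excerpt of the question's data.

```r
library(data.table)

# Three-row excerpt of the table from the question
aTable <- data.table(
  word = c("thanks for the follow", "the end of the", "at the end of"),
  freq = c(612L, 491L, 409L)
)

# Hypothetical helper: processes every element of the column,
# instead of only the first one as strsplit(...)[[1]] does
splitKeyValue <- function(words, n = 3L) {
  parts <- strsplit(words, split = " ", fixed = TRUE)
  list(
    # first n words of each row, joined back with spaces
    key = vapply(parts, function(p) paste(head(p, n), collapse = " "),
                 character(1)),
    # last word of each row
    value = vapply(parts, function(p) tail(p, 1L), character(1))
  )
}

aTable[, c("key", "value") := splitKeyValue(word)]
print(aTable)
```

Unlike the tstrsplit version, this does not assume that every row has exactly four words, although it would fail on an empty string, where there is no last word to return.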