Randomly split a dictionary into n parts

Date: 2019-10-16 09:10:13

Tags: r split quanteda

I have a quanteda dictionary that I want to split randomly into n parts.

library("quanteda")

dict <- dictionary(list(positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
            negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")))

I have tried functions such as split, e.g. split(dict, f=factor(3)), but without success.

I want to get back three dictionaries, but instead I get

$`3`
Dictionary object with 2 key entries.
- [positive]:
  - good, amazing, best, outstanding, beautiful, wonderf*
- [negative]:
  - bad, worst, awful, atrocious, deplorable, horrendous
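The single `$`3`` element above is what base R's `split()` is expected to return here: its `f` argument is a grouping vector that is recycled over the elements being split (the two keys), not a number of parts, so `factor(3)` puts both keys into one group labelled "3". A minimal base-R illustration, using a plain named list as a stand-in for the dictionary (an assumption made so the example runs without quanteda), plus a grouping vector that does yield three groups from one key's entries:

```r
# A plain named list standing in for the dictionary's two keys
toy <- list(
  positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
  negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
)

# f = factor(3) has length 1 and is recycled across both keys,
# so every key lands in the single group "3"
one_group <- split(toy, factor(3))
names(one_group)

# To get three groups, f needs one label per element being split,
# e.g. labels 1, 2, 3, 1, 2, 3 for the six entries of one key
groups <- split(toy$positive, rep_len(1:3, length(toy$positive)))
groups
```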

EDIT

I added another entry containing * to the dictionary. The solution suggested by Ken Benoit throws an error in this case, but works fine otherwise.

The desired output is something like this:

> dict_1
Dictionary object with 2 key entries.
- [positive]:
  - good, wonderf*
- [negative]:
  - deplorable, horrendous

> dict_2
Dictionary object with 2 key entries.
- [positive]:
  - amazing, best
- [negative]:
  - bad, worst

> dict_3
Dictionary object with 2 key entries.
- [positive]:
  - outstanding, beautiful
- [negative]:
  - awful, atrocious

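I haven't specified what should happen when the number of entries is not divisible by n, but ideally I could choose whether I want (i) a "remainder" or (ii) all values to be distributed (which makes some splits slightly larger).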
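Both remainder options can be sketched in base R as below. This is not from the question or the answer: `split_dict_random()` and its `remainder` argument are hypothetical names, not part of quanteda, and the helper works on a plain named list of character vectors (for a quanteda dictionary, `as.list(dict)` should give that form, and `lapply(parts, dictionary)` should convert the results back, as the answer's code does). Each key is shuffled independently with `sample.int()` and its entries are then dealt out to the n parts in turn. Keys with fewer than n entries leave some parts empty, which may need handling before calling `dictionary()`.

```r
# Hypothetical helper: randomly split each key of a named list into n parts.
# remainder = "spread": keep every entry; the first parts get one extra entry
# remainder = "drop":   discard leftover entries so all parts are equal size
split_dict_random <- function(x, n, remainder = c("spread", "drop")) {
  remainder <- match.arg(remainder)
  per_key <- lapply(x, function(values) {
    # random permutation; sample.int avoids sample()'s length-1 numeric gotcha
    values <- values[sample.int(length(values))]
    if (remainder == "drop") {
      values <- values[seq_len((length(values) %/% n) * n)]
    }
    # deal entries to parts 1..n in turn; fixed levels keep empty parts
    split(values, factor(rep_len(seq_len(n), length(values)), levels = seq_len(n)))
  })
  # regroup: part i collects the i-th chunk of every key
  parts <- lapply(seq_len(n), function(i) lapply(per_key, `[[`, i))
  names(parts) <- paste0("dict_", seq_len(n))
  parts
}

set.seed(1)
d <- list(
  positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
  negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
)
parts <- split_dict_random(d, 3)
str(parts)
```

Dealing entries round-robin after a shuffle keeps part sizes within one of each other, so option (ii) never makes any part more than one entry larger than another.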

1 answer:

Answer 0 (score: 2)

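A lot is unspecified in this question: it is unclear how dictionary keys of different lengths should be handled, and the expected answer shows no consistent pattern in how the entries are paired.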

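Here I assume that your keys have equal length and can be divided by the split size without a remainder, and that you want to split each dictionary key into runs of adjacent entries.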

This should do it.

library("quanteda")
## Package version: 1.5.1

dict <- dictionary(
  list(
    positive = c("good", "amazing", "best", "outstanding", "beautiful", "delightful"),
    negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
  )
)

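# split every key into consecutive chunks of `len` entries
# (len is the number of entries per part, not the number of parts)
dictionary_split <- function(x, len) {
  maxlen <- max(lengths(x)) # shorter keys get NA where indices run past their end; change to min() to truncate longer keys instead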
  subindex <- split(seq_len(maxlen), ceiling(seq_len(maxlen) / len))
  splitlist <- lapply(subindex, function(y) lapply(x, "[", y))
  names(splitlist) <- paste0("dict_", seq_along(splitlist))
  lapply(splitlist, dictionary)
}

dictionary_split(dict, 2)
## $dict_1
## Dictionary object with 2 key entries.
## - [positive]:
##   - good, amazing
## - [negative]:
##   - bad, worst
## 
## $dict_2
## Dictionary object with 2 key entries.
## - [positive]:
##   - best, outstanding
## - [negative]:
##   - awful, atrocious
## 
## $dict_3
## Dictionary object with 2 key entries.
## - [positive]:
##   - beautiful, delightful
## - [negative]:
##   - deplorable, horrendous
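If the goal is n parts (rather than parts of `len` entries) and keys may differ in length, the chunk boundaries can instead be computed from each key's own length, which avoids the NA padding that a shared `maxlen` produces. A sketch under that assumption, again on a plain named list rather than a quanteda dictionary; `split_dict_contiguous()` is a hypothetical helper, not part of quanteda:

```r
# Hypothetical helper: split each key into n contiguous parts,
# using that key's own length so no NA padding is introduced
split_dict_contiguous <- function(x, n) {
  per_key <- lapply(x, function(values) {
    # part label for entry i is ceiling(i * n / length); later parts absorb any remainder
    part <- ceiling(seq_along(values) * n / length(values))
    split(values, factor(part, levels = seq_len(n)))
  })
  parts <- lapply(seq_len(n), function(i) lapply(per_key, `[[`, i))
  names(parts) <- paste0("dict_", seq_len(n))
  parts
}

d <- list(
  positive = c("good", "amazing", "best", "outstanding", "beautiful", "delightful"),
  negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
)
split_dict_contiguous(d, 3)
```

For six entries and n = 3 the labels are 1, 1, 2, 2, 3, 3, so `dict_1` holds good, amazing and bad, worst, matching `dictionary_split(dict, 2)` above; for seven entries they are 1, 1, 2, 2, 3, 3, 3, so all values are kept and the last part is one larger.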