I have a quanteda dictionary that I want to split randomly into n parts.
dict <- dictionary(list(positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*"),
negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")))
I have already tried using the split function, like this: split(dict, f=factor(3)), but without success.
I would like to get three dictionaries back, but instead I get
$`3`
Dictionary object with 2 key entries.
- [positive]:
- good, amazing, best, outstanding, beautiful, wonderf*
- [negative]:
- bad, worst, awful, atrocious, deplorable, horrendous
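Edit
I have added another entry, one containing *, to the dictionary. The solution suggested by Ken Benoit throws an error in this case but works fine otherwise.
The desired output looks like this: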
> dict_1
Dictionary object with 2 key entries.
- [positive]:
- good, wonderf*
- [negative]:
- deplorable, horrendous
> dict_2
Dictionary object with 2 key entries.
- [positive]:
- amazing, best
- [negative]:
- bad, worst
> dict_3
Dictionary object with 2 key entries.
- [positive]:
- outstanding, beautiful
- [negative]:
- awful, atrocious
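In case the number of entries cannot be divided by n without a remainder, I have no strict requirement, but ideally I could decide whether I want (i) the "remainder" or (ii) all values to be distributed (which makes some splits slightly larger).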
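Answer 0 (score: 2)
A lot is left unspecified in this question: with dictionary keys of different lengths it is unclear how the split should be handled, and there is no pattern to the pairs in your expected answer.
Here I have assumed that you have keys of equal length, that this length is divisible by the split size without a remainder, and that you want to split each dictionary key into runs of adjacent values.
This should do it.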
library("quanteda")
## Package version: 1.5.1
dict <- dictionary(
  list(
    positive = c("good", "amazing", "best", "outstanding", "beautiful", "delightful"),
    negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
  )
)
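dictionary_split <- function(x, len) {
  # longest key; use min() instead if key lengths differ, to avoid indexing past shorter keys
  maxlen <- max(lengths(x))
  # chunk the positions 1..maxlen into consecutive runs of `len`
  subindex <- split(seq_len(maxlen), ceiling(seq_len(maxlen) / len))
  # take the same positions from every key, once per chunk
  splitlist <- lapply(subindex, function(y) lapply(x, "[", y))
  names(splitlist) <- paste0("dict_", seq_along(splitlist))
  lapply(splitlist, dictionary)
}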
dictionary_split(dict, 2)
## $dict_1
## Dictionary object with 2 key entries.
## - [positive]:
## - good, amazing
## - [negative]:
## - bad, worst
##
## $dict_2
## Dictionary object with 2 key entries.
## - [positive]:
## - best, outstanding
## - [negative]:
## - awful, atrocious
##
## $dict_3
## Dictionary object with 2 key entries.
## - [positive]:
## - beautiful, delightful
## - [negative]:
## - deplorable, horrendous
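
The function above splits in adjacent runs and assumes evenly divisible keys. For the random split with a remainder that the question describes, a minimal base R sketch could look like the following. The helper name split_keys_random and its keep_remainder argument are hypothetical and not part of quanteda, the value "great" is added only to create a remainder, and dropping the leftover values is just one possible reading of option (i). The helper works on a plain named list of character vectors, so a flat (non-nested) dictionary would first need to be converted with as.list().

```r
# Hypothetical helper (not part of quanteda): randomly split each key's values
# into n parts. `keys` is a plain named list of character vectors.
split_keys_random <- function(keys, n, keep_remainder = TRUE) {
  groups <- lapply(keys, function(vals) {
    vals <- sample(vals)  # shuffle the values of this key
    if (!keep_remainder) {
      # drop leftover values so that every part has the same size
      vals <- vals[seq_len(n * (length(vals) %/% n))]
    }
    # deal the shuffled values round-robin into parts 1..n
    part <- factor(rep_len(seq_len(n), length(vals)), levels = seq_len(n))
    unname(split(vals, part))
  })
  # regroup: one named list per part, holding that part of every key
  out <- lapply(seq_len(n), function(i) lapply(groups, function(g) g[[i]]))
  names(out) <- paste0("dict_", seq_len(n))
  out
}

keys <- list(
  # "great" is an illustrative extra value, added to produce a remainder
  positive = c("good", "amazing", "best", "outstanding", "beautiful", "wonderf*", "great"),
  negative = c("bad", "worst", "awful", "atrocious", "deplorable", "horrendous")
)
set.seed(1)  # only to make the random split reproducible
parts <- split_keys_random(keys, 3)                         # all values distributed
even  <- split_keys_random(keys, 3, keep_remainder = FALSE) # equal sizes, leftovers dropped
```

Each element of the result should then be usable as input to dictionary(), for instance lapply(split_keys_random(as.list(dict), 3), dictionary). This assumes every key has at least n values, since a part with no values for some key may not be accepted by dictionary().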