Consider this interesting example:
library(tibble)
library(quanteda)
library(magrittr)  # provides %>%
mytib <- tibble(text = c('i can see clearly now',
                         'the rain is gone'),
                myweight = c(1.7, 0.005))
# A tibble: 2 x 2
  text                  myweight
  <chr>                    <dbl>
1 i can see clearly now    1.7
2 the rain is gone         0.005
I know how to create a dfm weighted by the docvar myweight. I proceed as follows:
dftest <- mytib %>%
  corpus() %>%
  tokens() %>%
  dfm()
dftest * mytib$myweight
Document-feature matrix of: 2 documents, 9 features (50.0% sparse).
2 x 9 sparse Matrix of class "dfm"
       features
docs      i can see clearly now   the  rain    is  gone
  text1 1.7 1.7 1.7     1.7 1.7 0     0     0     0
  text2 0   0   0       0   0   0.005 0.005 0.005 0.005
The problem, however, is that I cannot get either topfeatures or colSums to work on the result.
How can I sum the values in each column, then?
> dftest*mytib$myweight %>% Matrix::colSums(.)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be an array of at least two dimensions
Thanks!
Answer 0 (score: 3)
Sometimes the %>% operator hurts more than it helps. This works:
colSums(dftest * mytib$myweight)
## i can see clearly now the rain is gone
## 1.700 1.700 1.700 1.700 1.700 0.005 0.005 0.005 0.005
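If you have a vector of weights for each feature, also consider using dfm_weight(x, weights = ...). The operation above recycles your weights in a way that happens to do what you want, but you should understand why: R stores matrices in column-major order and recycles the shorter vector down each column, so with one weight per document, every row ends up multiplied by its own weight.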
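To make the recycling point concrete, here is a minimal base R sketch. It uses a small dense matrix as a stand-in for the dfm (the object `counts` and its three feature columns are made up for illustration, not taken from quanteda), on the assumption that the element-wise recycling rule is the same.

```r
# Toy stand-in for the dfm: 2 documents (rows) x 3 features (columns).
# matrix() fills column by column, i.e. in column-major order.
counts <- matrix(c(1, 0,   # column "i"
                   1, 0,   # column "can"
                   0, 1),  # column "the"
                 nrow = 2,
                 dimnames = list(c("text1", "text2"), c("i", "can", "the")))

# One weight per document, as in the question.
w <- c(1.7, 0.005)

# w is recycled down each column: w[1] always meets row 1 and w[2] row 2,
# because the vector length equals the number of rows.
weighted <- counts * w

# Column totals of the weighted counts.
totals <- colSums(weighted)
print(weighted)
print(totals)
```

This only lines up because the weight vector has exactly one entry per row. A vector with one entry per column would be recycled down the columns in the same way and would not scale each feature as a whole, which is why a dedicated per-feature weighting function is the safer choice in that case.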
Answer 1 (score: 1)
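This is because of operator precedence. If we check ?Syntax, the special operators have higher precedence than multiplication (*), so the original expression is parsed as dftest * (mytib$myweight %>% Matrix::colSums(.)) and colSums receives the bare numeric vector: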
...
%any% special operators (including %% and %/%) ###
* / multiply, divide ###
...
Wrap the expression in parentheses and it should work:
(dftest*mytib$myweight) %>%
colSums
# i can see clearly now the rain is gone
# 1.700 1.700 1.700 1.700 1.700 0.005 0.005 0.005 0.005
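The precedence rule can be checked without quanteda or magrittr. The sketch below defines a hypothetical `%then%` operator as a stand-in for `%>%` (any `%any%` operator sits at the same precedence level) and uses a plain matrix instead of a dfm, so treat it as an illustration of the parsing rule rather than of the packages themselves.

```r
# Hypothetical stand-in for magrittr's %>%: pass the left side to a function.
`%then%` <- function(lhs, f) f(lhs)

m <- matrix(1:4, nrow = 2)  # columns are (1, 2) and (3, 4)
w <- c(10, 100)

# The parse tree shows which operator is applied last (the top-level call).
top_unparenthesised <- as.character(quote(m * w %then% colSums)[[1]])
top_parenthesised   <- as.character(quote((m * w) %then% colSums)[[1]])

# Without parentheses, w %then% colSums runs first, so colSums gets a
# plain vector and fails; the error message is captured here as a string.
failure <- tryCatch(m * w %then% colSums,
                    error = function(e) conditionMessage(e))

# With parentheses, the multiplication runs first and colSums gets a matrix.
success <- (m * w) %then% colSums

print(top_unparenthesised)
print(top_parenthesised)
print(failure)
print(success)
```

In the unparenthesised form the top-level call is `*`, meaning the pipe-like operator has already consumed `w` on its own; with parentheses the top-level call is `%then%`, which is what the question intended.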