I have a data frame like the one below:
structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1), bc = c(1,
1, 1, 1, 0, 0, 0, 1, 0, 1), de = c(0, 0, 1, 1, 1, 0, 1, 1, 0,
1), cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)), .Names = c("ab", "bc",
"de", "cl"), row.names = c(NA, -10L), class = "data.frame")
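The column cl indicates cluster membership, and the variables ab, bc and de hold binary answers, where 1 means yes and 0 means no.
I am trying to create a table that cross-tabulates the cluster against all the other columns in the data frame (ab, bc and de), with the clusters as the column variable. The desired output looks like this: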
   1 2 3
ab 1 3 2
bc 2 3 1
de 2 3 1
I tried the following code:
with(newdf, tapply(newdf[,c(3)], cl, sum))
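This only gives me the cross-tabulated values for one column at a time. My data frame has more than 1,600 columns plus the one cluster column. Can anyone help?
Answer 0 (score: 7)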
One way to do this with dplyr is:
library(dplyr)
df %>%
#group by the variable cl
group_by(cl) %>%
#sum every column
summarize_each(funs(sum)) %>%
#select the three needed columns
select(ab, bc, de) %>%
#transpose the df
t
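In more recent dplyr releases, summarize_each() and funs() are deprecated in favour of across(). A minimal sketch of the same pipeline in that style, assuming dplyr 1.0 or later and the question's data frame stored as df (the name res is only illustrative):

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

res <- df %>%
  group_by(cl) %>%
  # sum every non-grouping column within each cluster
  summarise(across(everything(), sum)) %>%
  # drop the cluster id so only the summed columns remain
  select(-cl) %>%
  # transpose: variables become rows, clusters become columns
  t()

res
```

Using select(-cl) instead of naming ab, bc and de avoids listing columns by hand, which may matter with 1,600+ columns.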
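Output:
   [,1] [,2] [,3]
ab    1    3    2
bc    2    3    1
de    2    3    1
Answer 1 (score: 6)
Your data is in a half-long, half-wide format, and you want it in a fully wide format. This is easiest if we first convert it to a fully long format:
library(reshape2)
df_long = melt(df, id.vars = "cl")
head(df_long)
#   cl variable value
# 1  1       ab     0
# 2  2       ab     1
# 3  3       ab     1
# 4  1       ab     1
# 5  2       ab     1
# 6  3       ab     0
Then we can convert it to wide format, using sum as the aggregating function.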
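The code for the long-to-wide step did not survive in this copy of the answer. A plausible reconstruction, assuming reshape2's dcast() with sum as the aggregating function (the exact call and the name df_wide are assumptions, not the answerer's verbatim code):

```r
library(reshape2)

# Data frame from the question
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

# Fully long format: one row per (cluster, variable, value)
df_long <- melt(df, id.vars = "cl")

# Wide format: one row per variable, one column per cluster,
# with the values in each cell summed
df_wide <- dcast(df_long, variable ~ cl,
                 fun.aggregate = sum, value.var = "value")

df_wide
```

The result is a data frame with a variable column followed by one column per cluster (named "1", "2" and "3").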
Answer 2 (score: 4)
In base R:
t(sapply(data[,1:3],function(x) tapply(x,data[,4],sum)))
# 1 2 3
#ab 1 3 2
#bc 2 3 1
#de 2 3 1
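Another base R option, offered as a sketch rather than as part of the original answer, is rowsum(), which sums all columns by group in a single call and does not depend on column positions. It assumes the question's data frame is stored as df; the name res is illustrative:

```r
# Data frame from the question
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

# Sum every column except cl within each cluster, then transpose
# so that variables are rows and clusters are columns
value_cols <- setdiff(names(df), "cl")
res <- t(rowsum(df[, value_cols], group = df$cl))

res
```

Selecting columns with setdiff() rather than 1:3 keeps the call unchanged when the data frame has many more columns.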
Answer 3 (score: 2)
You can also combine tidyr::gather (or reshape2::melt) with xtabs to get your contingency table:
library(tidyr)
xtabs(value ~ key + cl, data = gather(df, key, value, -cl))
##     cl
## key  1 2 3
##   ab 1 3 2
##   bc 2 3 1
##   de 2 3 1
The same call can also be written with the pipe, if you prefer.
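The piped version of this answer is missing from this copy. A minimal sketch of what it might look like, assuming the magrittr pipe and the question's data frame stored as df (a reconstruction; the name tab is illustrative):

```r
library(tidyr)
library(magrittr)  # provides %>%

# Data frame from the question
df <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

tab <- df %>%
  # long format: every column except cl becomes a key/value pair
  gather(key, value, -cl) %>%
  # sum value for each combination of key and cluster
  xtabs(value ~ key + cl, data = .)

tab
```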
Answer 4 (score: 0)
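This simply follows the code written by dickoa, updated to use tidyr's pivot_longer (instead of gather):
library(dplyr)
library(tidyr) # pivot_longer() comes from tidyr, not dplyr
df %>%
  # long format: one row per (cl, key, value)
  pivot_longer(cols = ab:de,
               names_to = "key",
               values_to = "value") %>%
  # sum value for each combination of key and cluster
  xtabs(value ~ key + cl, data = .)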