Suppose I have the following R data.table
(though I'm happy to use base R and a data.frame):
library(data.table)
dt = data.table(Category=c("First","First","First","Second","Third", "Third", "Second"), Frequency=c(10,15,5,2,14,20,3), times = c(0, 0, 0, 3, 3, 1, 0))
> dt
   Category Frequency times
1:    First        10     0
2:    First        15     0
3:    First         5     0
4:   Second         2     3
5:    Third        14     3
6:    Third        20     1
7:   Second         3     0
If I wanted to sum Frequency by Category, I would use the following:
dt[, sum(Frequency), by = Category]
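However, suppose I want to sum Frequency by Category if and only if times is non-zero and not equal to NA?
How do I make this sum conditional on the values of a separate column?
EDIT: Apologies for the obvious question. A quick addition: what if the elements of a certain column are strings? e.g.
> dt
   Category Frequency times
1:    First       ten     0
2:    First       ten     0
3:    First      five     0
4:   Second      five     3
5:    Third      five     3
6:    Third      five     1
7:   Second       ten     0
Sum() will not calculate ten versus five.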
Answer 0 (score: 2)
Remember the logic of data.table: dt[i, j, by], that is, take dt, subset rows using i, then calculate j grouped by by.
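The i / j / by logic can be sketched on the question's table as follows. This is a minimal illustration, not necessarily the exact expression the answer originally showed: the filter `!is.na(times) & times != 0` is one way to write "non-zero and not NA", the column name `total` is made up for this example, and the seventh `times` value (0) is taken from the printed table.

```r
library(data.table)

# The question's table; the last value of times (0) comes from the printed output
dt <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  times     = c(0, 0, 0, 3, 3, 1, 0)
)

# i:  keep only rows where times is neither NA nor zero
# j:  sum Frequency (named `total` here purely for illustration)
# by: one result row per Category
res <- dt[!is.na(times) & times != 0, .(total = sum(Frequency)), by = Category]
print(res)
```

Categories with no qualifying rows (here "First", whose times values are all 0) do not appear in the result at all, because the rows are dropped in i before the grouping happens.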
Answer 1 (score: 1)
You can use bracket subsetting to select only the rows where times is non-zero and not NA, then run the grouping operation.
dt[which(dt$times > 0)][, sum(Frequency), by = Category]
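Since the question also allows base R, the same subset-then-group idea might look like the sketch below on a plain data.frame. The NA in `times` is an assumption added here to show why `which()` is useful: it silently drops NA comparisons, whereas indexing a data.frame with a logical vector containing NA would produce rows of NA. Note that `times > 0` also excludes negative values, which is slightly stricter than "non-zero".

```r
# Same data as the question, except that one times value is replaced by NA
# (an assumption for illustration only)
df <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  times     = c(0, 0, 0, 3, NA, 1, 0)
)

# which() returns only the positions where the comparison is TRUE,
# so rows with times == NA are dropped along with the zeros
idx <- which(df$times > 0)

# Group the remaining rows by Category and sum Frequency
res <- aggregate(Frequency ~ Category, data = df[idx, ], FUN = sum)
print(res)
```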
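Answer 2 (score: 1)
You can use rowsum() for this.
rowsum
Give Column Sums of a Matrix or Data Frame, Based on a Grouping Variable
Compute column sums across rows of a numeric matrix-like object for each level of a grouping variable. rowsum is generic, with a method for data frames and a default method for vectors and matrices.
Keywords: manip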
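As a rough sketch of how rowsum() could be applied to the question's data: the answer does not say how to handle the condition on times, so filtering the rows first with `which()` is an assumption made here, as is the variable name `keep`.

```r
# The question's data as a plain data.frame (rowsum() is part of base R)
df <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  times     = c(0, 0, 0, 3, 3, 1, 0)
)

# Positions of rows whose times is positive; which() also drops any NA
keep <- which(df$times > 0)

# Sum Frequency within each Category over the kept rows only.
# For a vector input, the result is a one-column matrix whose row names
# are the group labels.
res <- rowsum(df$Frequency[keep], group = df$Category[keep])
print(res)
```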
rowsum(x, group, reorder = TRUE, …)
# S3 method for data.frame
rowsum(x, group, reorder = TRUE, na.rm = FALSE, …)
# S3 method for default
rowsum(x, group, reorder = TRUE, na.rm = FALSE, …)
Arguments
x
a matrix, data frame or vector of numeric data. Missing values are allowed. A numeric vector will be treated as a column vector.
group
a vector or factor giving the grouping, with one element per row of x. Missing values will be treated as another group and a warning will be given.
reorder
if TRUE, then the result will be in order of sort(unique(group)), if FALSE, it will be in the order that groups were encountered.
na.rm
logical (TRUE or FALSE). Should NA (including NaN) values be discarded?
…
other arguments to be passed to or from methods
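The default is to reorder the rows to agree with tapply, as in the example below. Reordering should not add noticeably to the time except when there are very many distinct values of group and x has few columns.
The original function was written by Terry Therneau, but this is a new implementation using hashing that is much faster for large matrices.
To sum over all the rows of a matrix (i.e., a single group) use colSums, which should be even faster.
For integer arguments, over/underflow in forming the sum results in NA.
A matrix or data frame containing the sums. Each unique value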