Hi, as the title suggests, I need to summarize data by section:
can.id status qid marks
001 section 1 question 1 112 3
001 section 1 question 2 117 3
001 section 1 question 3 116 3
001 section 2 question 1 115 3
001 section 2 question 2 114 -1
001 section 2 question 3 111 3
001 section 3 question 1 112 -1
001 section 3 question 2 116 3
002 section 1 question 1 114 3
002 section 1 question 2 111 3
002 section 2 question 2 111 -1
002 section 3 question 1 111 -1
I want to show the sum of marks for each can.id within each section. Thanks for any help.
Answer 0 (score: 1)
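In R, we can use dplyr. We use extract (from tidyr) to pull the leading substring out of 'status' into a new 'section' column, then group by 'can.id' and 'section' and take the sum of 'marks'.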
library(dplyr)
library(tidyr)
df1 %>%
extract(status, into = "section", "(.*\\d+)\\s+[[:alpha:]].*") %>%
group_by(can.id, section) %>%
summarise(SumMarks = sum(marks))
#  can.id   section SumMarks
#   <int>     <chr>    <int>
#1      1 section 1        9
#2      1 section 2        5
#3      1 section 3        2
#4      2 section 1        6
#5      2 section 2       -1
#6      2 section 3       -1
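The same two steps (derive a section label, then sum within groups) can also be sketched in base R without extra packages. This is only an illustration: the small `df` below is a hypothetical miniature of the question's data, and the regular expression assumes every status starts with a label followed by a number.

```r
# Hypothetical miniature of the question's data (not the full df1)
df <- data.frame(
  can.id = c(1L, 1L, 1L, 2L),
  status = c("section 1 question 1", "section 1 question 2",
             "section 2 question 1", "section 1 question 1"),
  marks  = c(3L, 3L, -1L, 3L),
  stringsAsFactors = FALSE
)

# Keep everything up to and including the first number,
# e.g. "section 1 question 2" becomes "section 1"
df$section <- sub("^([^0-9]*[0-9]+).*$", "\\1", df$status)

# Sum marks within each can.id / section pair
res <- aggregate(marks ~ can.id + section, data = df, FUN = sum)

# Sort for readability: by candidate, then by section
res <- res[order(res$can.id, res$section), ]
print(res)
```

Note that aggregate keeps the summed column under its original name (marks); rename it afterwards if a name like SumMarks is preferred.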
Or using data.table:
library(data.table)
setDT(df1)[,.(SumMarks = sum(marks)), .(can.id,
section = sub("\\s+[[:alpha:]].*", "", status))]
Data used above:
df1 <- structure(list(can.id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), status = c("section 1 question 1", "section 1 question 2",
"section 1 question 3", "section 2 question 1", "section 2 question 2",
"section 2 question 3", "section 3 question 1", "section 3 question 2",
"section 1 question 1", "section 1 question 2", "section 2 question 2",
"section 3 question 1"), qid = c(112L, 117L, 116L, 115L, 114L,
111L, 112L, 116L, 114L, 111L, 111L, 111L), marks = c(3L, 3L,
3L, 3L, -1L, 3L, -1L, 3L, 3L, 3L, -1L, -1L)), .Names = c("can.id",
"status", "qid", "marks"), class = "data.frame",
row.names = c(NA, -12L))
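Answer 1 (score: 0)
I've answered this with some SQL.
The problem you seem to be having is that the section needs to be split out of the status field. You can do the following: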
-- 'section N' is 9 characters long in the sample data, so take the first 9
SELECT
[can.id]
,SUBSTRING([status],1,9) Section
,SUM(marks) Total
FROM samp_data
GROUP BY
[can.id]
,SUBSTRING([status],1,9)
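A quick way to sanity-check a fixed-width prefix like this is to try it on a couple of sample strings. The sketch below uses R's substr, which takes start and stop positions, rather than SQL. The two status values are copied from the question's data. For that data the label "section 1" spans nine characters, and any fixed width assumes single-digit section numbers.

```r
# Two status values taken from the question's sample data
status <- c("section 1 question 1", "section 2 question 3")

# Characters 1 to 8 stop just before the section number
prefix8 <- substr(status, 1, 8)

# Characters 1 to 9 include the single-digit section number
prefix9 <- substr(status, 1, 9)

print(prefix8)
print(prefix9)
```

If section numbers can reach two digits, a pattern-based split (as in the R answer) is the safer choice.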
If you only want the top three from each group, see the related link below:
How to select top 3 values from each group in a table with SQL which have duplicates