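I'm new to R and am trying to solve this problem: in the table below, I need to compute each participant's average up to a specific date in the Date column. For example, up to 2015-08-30 Peter got 4 of his 5 entries right, so a new column on the right should contain 4/5 in that row, and so on.
I did some calculations with aggregate, but that only gives me a single mean per participant group.
Thanks in advance!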
Date Participant Right/Wrong
2013-01-02 Peter 1
2015-01-05 Caroline 1
2015-02-03 Jack 0
2015-03-05 Jennifer 0
2015-03-09 Peter 1
2016-04-14 Jennifer 0
2015-04-16 Caroline 1
2015-06-02 Jennifer 1
2015-06-05 Peter 1
2015-06-10 Caroline 0
2015-07-10 Jack 1
2015-08-01 Jennifer 0
2015-08-05 Peter 0
2015-07-14 Jack 1
2015-08-30 Peter 1
2015-12-14 Jennifer 1
2015-12-24 Jack 1
2015-12-27 Peter 1
2015-12-30 Caroline 1
Answer 0 (score: 2)
Note: I have added the HTML table data below, since it has now been removed from your question.
library('XML')
doc <- htmlParse(xml_content)
df1 <- readHTMLTable(doc)
df1 <- df1[[1]]
df1$Date <- as.Date(as.character(df1$Date))
df1$Participant <- as.character(df1$Participant)
df1$`Right/Wrong` <- as.numeric(as.character(df1$`Right/Wrong`))
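Using base R (no packages needed)
# Cumulative statistics depend on row order, so sort by date first
df1 <- df1[order(df1$Date), ]
a1 <- with(df1,
           by(data = df1,
              INDICES = Participant,
              FUN = function(x) list(Participant = x$Participant,
                                     Date = x$Date,
                                     cumsum = cumsum(x$`Right/Wrong`),
                                     # running mean = correct answers so far / entries so far
                                     cummean = cumsum(x$`Right/Wrong`) / seq_along(x$`Right/Wrong`))))
rownames(a1) <- NULL # remove row names
do.call("rbind", lapply(a1, function(x) data.frame(x)))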
Using the data.table library
library('data.table')
# order(Date) sorts the rows first; seq_len(.N) counts the entries so far within each participant
setDT(df1)[order(Date), .(cumsum = cumsum(`Right/Wrong`), cummean = cumsum(`Right/Wrong`) / seq_len(.N), Date), by = c('Participant')]
#     Participant cumsum   cummean       Date
#  1:       Peter      1 1.0000000 2013-01-02
#  2:       Peter      2 1.0000000 2015-03-09
#  3:       Peter      3 1.0000000 2015-06-05
#  4:       Peter      3 0.7500000 2015-08-05
#  5:       Peter      4 0.8000000 2015-08-30
#  6:       Peter      5 0.8333333 2015-12-27
#  7:    Caroline      1 1.0000000 2015-01-05
#  8:    Caroline      2 1.0000000 2015-04-16
#  9:    Caroline      2 0.6666667 2015-06-10
# 10:    Caroline      3 0.7500000 2015-12-30
# 11:        Jack      0 0.0000000 2015-02-03
# 12:        Jack      1 0.5000000 2015-07-10
# 13:        Jack      2 0.6666667 2015-07-14
# 14:        Jack      3 0.7500000 2015-12-24
# 15:    Jennifer      0 0.0000000 2015-03-05
# 16:    Jennifer      1 0.5000000 2015-06-02
# 17:    Jennifer      1 0.3333333 2015-08-01
# 18:    Jennifer      2 0.5000000 2015-12-14
# 19:    Jennifer      2 0.4000000 2016-04-14
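If the goal is to add the running mean as a new column on the original table, rather than build a separate result, base R's ave() may be a convenient alternative. The following is only a minimal sketch: the data frame `scores` and its column `Right` are hypothetical names, and the rows are a hand-copied subset (Peter and Jack) of the question's table.

```r
# Hypothetical stand-in data: Peter's and Jack's rows up to 2015-08-30,
# copied from the question's table. `scores` and `Right` are illustrative names.
scores <- data.frame(
  Date = as.Date(c("2013-01-02", "2015-02-03", "2015-03-09", "2015-06-05",
                   "2015-07-10", "2015-08-05", "2015-07-14", "2015-08-30")),
  Participant = c("Peter", "Jack", "Peter", "Peter",
                  "Jack", "Peter", "Jack", "Peter"),
  Right = c(1, 0, 1, 1, 1, 0, 1, 1)
)

# Sort by date so that "cumulative" means "up to and including this date".
scores <- scores[order(scores$Date), ]

# ave() applies FUN within each participant and returns a vector aligned
# with the rows, so it can be assigned directly as a new column.
scores$cummean <- ave(scores$Right, scores$Participant,
                      FUN = function(x) cumsum(x) / seq_along(x))

scores[scores$Participant == "Peter", ]
```

In this sketch, Peter's row for 2015-08-30 carries 0.8, i.e. the 4/5 asked for in the question.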
Data:
xml_content <- '<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
<tr>
<th class="tg-031e">Date</th>
<th class="tg-031e">Participant</th>
<th class="tg-031e">Right/Wrong</th>
</tr>
<tr>
<td class="tg-031e">2013-01-02</td>
<td class="tg-031e">Peter</td>
<td class="tg-031e">1</td>
</tr>
<tr>
<td class="tg-031e">2015-01-05</td>
<td class="tg-031e">Caroline</td>
<td class="tg-031e">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-02-03</td>
<td class="tg-yw4l">Jack</td>
<td class="tg-yw4l">0</td>
</tr>
<tr>
<td class="tg-yw4l">2015-03-05</td>
<td class="tg-yw4l">Jennifer</td>
<td class="tg-yw4l">0</td>
</tr>
<tr>
<td class="tg-yw4l">2015-03-09</td>
<td class="tg-yw4l">Peter</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2016-04-14</td>
<td class="tg-yw4l">Jennifer</td>
<td class="tg-yw4l">0</td>
</tr>
<tr>
<td class="tg-yw4l">2015-04-16</td>
<td class="tg-yw4l">Caroline</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-06-02</td>
<td class="tg-yw4l">Jennifer</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-06-05</td>
<td class="tg-yw4l">Peter</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-06-10</td>
<td class="tg-yw4l">Caroline</td>
<td class="tg-yw4l">0</td>
</tr>
<tr>
<td class="tg-yw4l">2015-07-10</td>
<td class="tg-yw4l">Jack</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-08-01</td>
<td class="tg-yw4l">Jennifer</td>
<td class="tg-yw4l">0</td>
</tr>
<tr>
<td class="tg-yw4l">2015-08-05</td>
<td class="tg-yw4l">Peter</td>
<td class="tg-yw4l">0</td>
</tr>
<tr>
<td class="tg-yw4l">2015-07-14</td>
<td class="tg-yw4l">Jack</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-08-30</td>
<td class="tg-yw4l">Peter</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-12-14</td>
<td class="tg-yw4l">Jennifer</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-12-24</td>
<td class="tg-yw4l">Jack</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-12-27</td>
<td class="tg-yw4l">Peter</td>
<td class="tg-yw4l">1</td>
</tr>
<tr>
<td class="tg-yw4l">2015-12-30</td>
<td class="tg-yw4l">Caroline</td>
<td class="tg-yw4l">1</td>
</tr>
</table>'
Answer 1 (score: 0)
You can try:
participants <- structure(list(Date = structure(c(1L, 2L, 3L, 4L, 5L, 19L, 6L,
7L, 8L, 9L, 10L, 12L, 13L, 11L, 14L, 15L, 16L, 17L, 18L), .Label = c("2013-01-02",
"2015-01-05", "2015-02-03", "2015-03-05", "2015-03-09", "2015-04-16",
"2015-06-02", "2015-06-05", "2015-06-10", "2015-07-10", "2015-07-14",
"2015-08-01", "2015-08-05", "2015-08-30", "2015-12-14", "2015-12-24",
"2015-12-27", "2015-12-30", "2016-04-14"), class = "factor"),
Participant = structure(c(4L, 1L, 2L, 3L, 4L, 3L, 1L, 3L,
4L, 1L, 2L, 3L, 4L, 2L, 4L, 3L, 2L, 4L, 1L), .Label = c("Caroline",
"Jack", "Jennifer", "Peter"), class = "factor"), Right.Wrong = c(1L,
1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 1L)), .Names = c("Date", "Participant", "Right.Wrong"
), class = "data.frame", row.names = c(NA, -19L))
# dplyr
# install.packages('dplyr')
library(dplyr)
participants %>%
  mutate(Date = as.POSIXct(Date, "%Y-%m-%d", tz = Sys.timezone())) %>%
  group_by(Participant) %>%
  dplyr::filter(Date <= as.POSIXct('2015-08-30', "%Y-%m-%d", tz = Sys.timezone())) %>%
  summarise(Right.Wrong = mean(Right.Wrong))
# Or base R
participants$Date <- as.POSIXct(participants$Date, "%Y-%m-%d", tz = Sys.timezone())
aggregate(Right.Wrong ~ Participant, data = participants,
          subset = participants$Date <= as.POSIXct('2015-08-30', "%Y-%m-%d", tz = Sys.timezone()),
          FUN = mean)
Both of these should produce something like:
  Participant Right.Wrong
     Caroline   0.6666667
         Jack   0.6666667
     Jennifer   0.3333333
        Peter   0.8000000
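If the cutoff date needs to vary, the same filter-then-aggregate idea could be wrapped in a small function. This is only a sketch: `mean_until` and `toy` are hypothetical names introduced for illustration, and the function assumes columns named Date, Participant and Right.Wrong as in the data above.

```r
# Hypothetical helper: mean score per participant over all entries dated
# on or before `cutoff`.
mean_until <- function(data, cutoff) {
  dates <- as.Date(as.character(data$Date))  # also handles factor or character dates
  keep <- dates <= as.Date(cutoff)
  aggregate(Right.Wrong ~ Participant, data = data[keep, ], FUN = mean)
}

# Small stand-in data set (a few rows copied from the question, for illustration).
toy <- data.frame(
  Date = c("2015-01-05", "2015-04-16", "2015-06-10", "2015-12-30",
           "2015-02-03", "2015-07-10"),
  Participant = c("Caroline", "Caroline", "Caroline", "Caroline",
                  "Jack", "Jack"),
  Right.Wrong = c(1, 1, 0, 1, 0, 1)
)

res <- mean_until(toy, "2015-08-30")
res
```

Using as.Date here instead of as.POSIXct sidesteps time zones, which should not matter for whole-day comparisons.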
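Answer 2 (score: 0)
You can use the subset and aggregate functions. For your data:
First, subset the data frame to the dates you want (this assumes the Date column is of class Date; <= keeps the entries on the cutoff date itself):
df2 <- subset(yourData, yourData$Date <= as.Date("2015-08-30"))
Second, you can see how many points each participant has up to this date:
Points <- aggregate(df2$`Right/Wrong`, by = list(df2$Participant), sum)
Or, if you want the mean:
Points <- aggregate(df2$`Right/Wrong`, by = list(df2$Participant), mean)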
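The comparison operator in the subset step decides whether an entry dated exactly on the cutoff counts. A minimal sketch of the difference, using a hand-copied stand-in for `yourData` that contains only Peter's rows from the question (the names `cutoff`, `strict` and `inclusive` are illustrative):

```r
# Assumed stand-in for `yourData`: Peter's rows from the question.
yourData <- data.frame(
  Date = as.Date(c("2013-01-02", "2015-03-09", "2015-06-05",
                   "2015-08-05", "2015-08-30", "2015-12-27")),
  Participant = "Peter",
  `Right/Wrong` = c(1, 1, 1, 0, 1, 1),
  check.names = FALSE  # keep the column name "Right/Wrong" as is
)

cutoff <- as.Date("2015-08-30")
strict    <- subset(yourData, Date <  cutoff)  # drops the entry dated on the cutoff
inclusive <- subset(yourData, Date <= cutoff)  # keeps it

mean(strict$`Right/Wrong`)     # 3 correct out of 4 entries
mean(inclusive$`Right/Wrong`)  # 4 correct out of 5 entries
```

Only the inclusive version reproduces the 4/5 for Peter described in the question.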