I have been trying out some commands in RStudio on ChickWeight, one of R's built-in datasets. The data looks like this.
   weight Time Chick Diet
1      42    0     1    1
2      51    2     1    1
3      59    4     1    1
4      64    6     1    1
5      76    8     1    1
6      93   10     1    1
7     106   12     1    1
8     125   14     1    1
9     149   16     1    1
10    171   18     1    1
11    199   20     1    1
12    205   21     1    1
13     40    0     2    1
14     49    2     2    1
15     58    4     2    1
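What I want to do now is simply output, for each value in the "Chick" column, the difference in weight between Time 0 and Time 21 (the last time value), i.e. how much weight each chick gained.
I have been trying tapply(ChickWeight$weight, ChickWeight$Chick, function(x) x[length(x)] - x[1])
. But this of course applies the value to all rows.
How can I make it so that it is applied only once for each unique Chick value?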
Answer 0 (score: 3)
If we need one value per combination of the factor columns (assuming 'Chick' and 'Diet' are the factor columns):
library(data.table)
setDT(df1)[, list(Diff= abs(weight[Time==21]-weight[Time==0])) ,.(Chick, Diet)]
If we need to create a column instead:
setDT(df1)[, Diff:= abs(weight[Time==21]-weight[Time==0]) ,.(Chick, Diet)]
I noticed that chick no. 2 has no Time = 21 row in the example,
so in that case we may need one of the following:
setDT(df1)[, {tmp <- Time %in% c(0,21)
list(Diff= if(sum(tmp)>1) abs(diff(weight[tmp])) else weight[tmp]) } ,
by = .(Chick, Diet)]
# Chick Diet Diff
#1: 1 1 163
#2: 2 1 40
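Note that the 40 reported for chick 2 above is its Time 0 weight rather than a gain, because only one of the two time points exists. As a minimal sketch of an alternative, assuming `df1` is just the 15 rows shown in the question (a hypothetical reconstruction, not the full dataset), the fallback can return `NA` so that incomplete chicks are easy to spot:

```r
library(data.table)

# Hypothetical reconstruction of df1 from the 15 rows shown in the question
df1 <- data.frame(
  weight = c(42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205, 40, 49, 58),
  Time   = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21, 0, 2, 4),
  Chick  = factor(rep(c(1, 2), times = c(12, 3))),
  Diet   = factor(rep(1, 15))
)

# as.data.table() copies, so df1 itself is left unchanged
res <- as.data.table(df1)[, {
  tmp <- Time %in% c(0, 21)            # rows at the two time points of interest
  list(Diff = if (sum(tmp) > 1) abs(diff(weight[tmp])) else NA_real_)
}, by = .(Chick, Diet)]

print(res)
```

Whether `NA` or the single available weight is the better fallback depends on what the result is used for; `NA` at least cannot be mistaken for a weight gain.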
If we are taking the difference in 'weight' between the max and min 'Time' for each group:
setDT(df1)[, list(Diff=weight[which.max(Time)]-
weight[which.min(Time)]), .(Chick, Diet)]
# Chick Diet Diff
#1: 1 1 163
#2: 2 1 18
Also, if 'Time' is already ordered within each group:
setDT(df1)[, list(Diff= abs(diff(weight[c(1L,.N)]))), by =.(Chick, Diet)]
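Taking the first and last row of each group only works if the rows are sorted by 'Time' within each chick. A minimal base R sketch of making that assumption explicit, again using a hypothetical reconstruction of `df1` from the 15 rows in the question (the reversed row order is there only to simulate unsorted input):

```r
# Hypothetical reconstruction of df1 from the 15 rows shown in the question
df1 <- data.frame(
  weight = c(42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205, 40, 49, 58),
  Time   = c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21, 0, 2, 4),
  Chick  = factor(rep(c(1, 2), times = c(12, 3))),
  Diet   = factor(rep(1, 15))
)

# Simulate unsorted input by reversing the rows
shuffled <- df1[rev(seq_len(nrow(df1))), ]

# Sort by chick, then by time, before relying on first/last positions
sorted <- shuffled[order(shuffled$Chick, shuffled$Time), ]

# Last weight minus first weight, one value per chick
gain <- tapply(sorted$weight, sorted$Chick, function(w) w[length(w)] - w[1])
print(gain)
```

Without the `order()` step, the same `tapply()` call on `shuffled` would subtract the last measurement from the first and flip the sign of every result.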
Using by
from base R:
by(df1[1:2], df1[3:4], FUN= function(x) with(x,
abs(weight[which.max(Time)]-weight[which.min(Time)])))
#Chick: 1
#Diet: 1
#[1] 163
#------------------------------------------------------------
#Chick: 2
#Diet: 1
#[1] 18
Answer 1 (score: 2)
Here is a solution using dplyr
:
library(dplyr)
ChickWeight %>%
group_by(Chick = as.numeric(as.character(Chick))) %>%
summarise(weight_gain = last(weight) - first(weight), final_time = last(Time))
(Using first and last was suggested by @ulfelder.)
Note that ChickWeight$Chick
is an ordered factor, so unless it is coerced to numeric, the order of the result looks odd.
Using base R:
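To illustrate the point about the ordered factor, here is a small sketch of why the coercion goes through `as.character()` first. It uses only the built-in ChickWeight data; the variable names `chick`, `codes` and `chick_id` are made up for illustration:

```r
# ChickWeight ships with R (datasets package)
chick <- ChickWeight$Chick

is.ordered(chick)        # TRUE: the factor is ordered
head(levels(chick))      # inspect the level order, which is not 1, 2, 3, ...

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(chick)

# Going through character first recovers the chick numbers as printed
chick_id <- as.numeric(as.character(chick))

# Compare the two side by side for the first few rows
head(data.frame(codes, chick_id))
```

Grouping by `chick_id` then sorts the summary by chick number, whereas grouping by the factor itself follows the level order.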
ChickWeight$Chick <- as.numeric(as.character(ChickWeight$Chick))
tapply(ChickWeight$weight, ChickWeight$Chick, function(x) x[length(x)] - x[1])
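Since not every chick necessarily has a measurement at Time 21, it can help to report the final time next to the gain, as the dplyr version does. A base R sketch of the same idea, where the names `cw`, `per_chick` and `incomplete` are assumptions made up for this example:

```r
# Work on a plain data.frame copy so the built-in dataset is left untouched
cw <- as.data.frame(ChickWeight)

# One row per chick: gain between first and last measurement, plus final time
per_chick <- do.call(rbind, lapply(split(cw, cw$Chick), function(d) {
  d <- d[order(d$Time), ]                       # make sure rows are in time order
  data.frame(
    Chick       = as.numeric(as.character(d$Chick[1])),
    weight_gain = d$weight[nrow(d)] - d$weight[1],
    final_time  = d$Time[nrow(d)]
  )
}))

# Sort by chick number and drop the row names inherited from split()
per_chick <- per_chick[order(per_chick$Chick), ]
rownames(per_chick) <- NULL

# Chicks whose last measurement was taken before Time 21
incomplete <- per_chick[per_chick$final_time < 21, ]

head(per_chick)
print(incomplete)
```

For the chicks listed in `incomplete`, the gain covers a shorter period, so they may need to be excluded or handled separately depending on the analysis.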