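As the title says, I am trying to take information from a subset within a category and apply it to a new column for that category only, rather than to the whole column. My attempts so far are df1 and df2 below, but neither gives the result I want (see the inline comments for the problem with each).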
library(plyr)  # provides ddply(); ToothGrowth ships with base R's datasets package
df <- ToothGrowth
df$dose <- as.factor(df$dose)
#takes the minimum by category 'supp' and subtracts it within each subset (only half of what I want)
df1 <- ddply(df, .(supp), transform, min1 = len - min(len))
#takes the entire minimum for dose=1 (which is 13.6) and applies a subtraction to the entire column
df2 <- ddply(df, .(supp), transform, min1 = len - min(subset(df,df$dose==1)$len))
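What I really want is to find the minimum at dose = 1 within each of the two categories and subtract it per category.
So in the supp = 'OJ' category, the new min1 column should subtract 14.5 from every value (for every dose), because 14.5 is the minimum len at dose = 1.
Likewise, in the supp = 'VC' category, the new min1 column should subtract 13.6 from every value (for every dose), because 13.6 is the minimum len at dose = 1. The result I want looks like this: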
> df2
len supp dose min1
1 15.2 OJ 0.5 0.7
2 21.5 OJ 0.5 7.0
3 17.6 OJ 0.5 3.1
4 9.7 OJ 0.5 -4.8
5 14.5 OJ 0.5 0.0
...
31 4.2 VC 0.5 -9.4
32 11.5 VC 0.5 -2.1
33 7.3 VC 0.5 -6.3
34 5.8 VC 0.5 -7.8
35 6.4 VC 0.5 -7.2
Answer 0 (score: 1)
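min(subset(df, dose==1)$len) takes the min of 'len' where 'dose' is 1 across the whole dataset, so it is a single value that is the same for every group. (Also, we don't need to convert 'dose' to a factor.) Instead, we need the min of 'len' within each 'supp'. To get that, drop the subset(df, ...) call and use dose==1 directly: inside each group it returns a logical vector, which selects the corresponding 'len' values; take the min of those and subtract it from 'len'.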
library(plyr)
ddply(df, .(supp), transform, min1 = len - min(len[dose==1]))
# len supp dose min1
#1 15.2 OJ 0.5 0.7
#2 21.5 OJ 0.5 7.0
#3 17.6 OJ 0.5 3.1
#4 9.7 OJ 0.5 -4.8
#5 14.5 OJ 0.5 0.0
#6 10.0 OJ 0.5 -4.5
#7 8.2 OJ 0.5 -6.3
#8 9.4 OJ 0.5 -5.1
#9 16.5 OJ 0.5 2.0
#10 9.7 OJ 0.5 -4.8
#11 19.7 OJ 1.0 5.2
#12 23.3 OJ 1.0 8.8
#13 23.6 OJ 1.0 9.1
#14 26.4 OJ 1.0 11.9
#15 20.0 OJ 1.0 5.5
#16 25.2 OJ 1.0 10.7
#17 25.8 OJ 1.0 11.3
#18 21.2 OJ 1.0 6.7
#19 14.5 OJ 1.0 0.0
#20 27.3 OJ 1.0 12.8
#21 25.5 OJ 2.0 11.0
#22 26.4 OJ 2.0 11.9
#23 22.4 OJ 2.0 7.9
#24 24.5 OJ 2.0 10.0
#25 24.8 OJ 2.0 10.3
#26 30.9 OJ 2.0 16.4
#27 26.4 OJ 2.0 11.9
#28 27.3 OJ 2.0 12.8
#29 29.4 OJ 2.0 14.9
#30 23.0 OJ 2.0 8.5
#31 4.2 VC 0.5 -9.4
#32 11.5 VC 0.5 -2.1
#33 7.3 VC 0.5 -6.3
#34 5.8 VC 0.5 -7.8
#35 6.4 VC 0.5 -7.2
#36 10.0 VC 0.5 -3.6
#37 11.2 VC 0.5 -2.4
#38 11.2 VC 0.5 -2.4
#39 5.2 VC 0.5 -8.4
#40 7.0 VC 0.5 -6.6
#41 16.5 VC 1.0 2.9
#42 16.5 VC 1.0 2.9
#43 15.2 VC 1.0 1.6
#44 17.3 VC 1.0 3.7
#45 22.5 VC 1.0 8.9
#46 17.3 VC 1.0 3.7
#47 13.6 VC 1.0 0.0
#48 14.5 VC 1.0 0.9
#49 18.8 VC 1.0 5.2
#50 15.5 VC 1.0 1.9
#51 23.6 VC 2.0 10.0
#52 18.5 VC 2.0 4.9
#53 33.9 VC 2.0 20.3
#54 25.5 VC 2.0 11.9
#55 26.4 VC 2.0 12.8
#56 32.5 VC 2.0 18.9
#57 26.7 VC 2.0 13.1
#58 21.5 VC 2.0 7.9
#59 23.3 VC 2.0 9.7
#60 29.5 VC 2.0 15.9
Or we can use dplyr:
library(dplyr)
df %>%
  group_by(supp) %>%
  mutate(min1 = len - min(len[dose==1]))
Or data.table:
library(data.table)
setDT(df)[, min1:= len - min(len[dose==1]), by = supp]
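For comparison, the same idea can be sketched in base R without any packages. This is only an illustration, not part of the original answer: the names at_dose1 and mins are made up here. It first computes the per-'supp' minimum of 'len' at dose 1 (which should match the 14.5 for OJ and 13.6 for VC mentioned in the question), then looks that value up for every row and subtracts it.

```r
# Base R sketch (no packages); ToothGrowth ships with R's datasets package.
df <- ToothGrowth

# Minimum 'len' at dose == 1, computed separately for each 'supp' level.
# 'at_dose1' and 'mins' are illustrative names, not from the original post.
at_dose1 <- df[df$dose == 1, ]
mins <- tapply(at_dose1$len, at_dose1$supp, min)
print(mins)

# Look up each row's group minimum by 'supp' and subtract it from 'len'.
# as.numeric() drops the names and array attributes left by tapply().
df$min1 <- df$len - as.numeric(mins[as.character(df$supp)])
head(df)
```

Unlike the ddply result above, this keeps the rows in the original ToothGrowth order rather than sorting them by 'supp', so the row numbers will not line up with the printed output.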