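I'm trying to use ggplot to plot substrate composition at 6 different sites and 7 different times. The problem is that I have a different sample size for each sampling period and site. I basically want to code y = Freq/(number of samples taken at that site in that time period). Here is a sample of my dataset: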
Substrate Time Site Freq
1 Floc July 11 P1 4
2 Fine July 11 P1 2
3 Medium July 11 P1 12
4 Coarse July 11 P1 0
5 Bedrock July 11 P1 3
6 Floc Aug 11 P1 7
7 Fine Aug 11 P1 1
8 Medium Aug 11 P1 7
9 Coarse Aug 11 P1 1
10 Bedrock Aug 11 P1 4
So I would like:
Var1 Var2 Var3 Freq
1 Floc July 11 P1 4/21 (21 being the number of samples taken in July)
Any ideas on how to code this and then plot the result?
Answer 0 (score: 5)
Using data.table (from the package of the same name)...
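library(data.table)
DT <- data.table(dat)  # dat is the data frame from the question
# Divide each Freq by the total Freq within its Var2 (time period) group
DT[, Freq2 := Freq/sum(Freq), by = Var2]
which gives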
Var1 Var2 Var3 Freq Freq2
1: Floc July 11 P1 4 0.1904762
2: Fine July 11 P1 2 0.0952381
3: Medium July 11 P1 12 0.5714286
4: Coarse July 11 P1 0 0.0000000
5: Bedrock July 11 P1 3 0.1428571
6: Floc Aug 11 P1 7 0.3500000
7: Fine Aug 11 P1 1 0.0500000
8: Medium Aug 11 P1 7 0.3500000
9: Coarse Aug 11 P1 1 0.0500000
10: Bedrock Aug 11 P1 4 0.2000000
Edit: The question now has better column names, so it is clearer what "for ... period and site" means. As @DWin wrote in the comments, the answer is now:
DT[,Freq2:=Freq/sum(Freq),by='Time,Site']
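Neither answer covers the plotting half of the question, so here is a minimal sketch of one way it could go. It assumes the question's column names (Substrate, Time, Site, Freq), rebuilds the ten sample rows by hand, and picks a stacked bar chart faceted by site as the plot type; that choice and the axis label are assumptions, not something the question specifies.

```r
library(data.table)
library(ggplot2)

# Sample rows from the question, typed in by hand with its column names
DT <- data.table(
  Substrate = rep(c("Floc", "Fine", "Medium", "Coarse", "Bedrock"), times = 2),
  Time      = rep(c("July 11", "Aug 11"), each = 5),
  Site      = "P1",
  Freq      = c(4, 2, 12, 0, 3, 7, 1, 7, 1, 4)
)

# Proportion of each substrate within its Time/Site group
DT[, Freq2 := Freq / sum(Freq), by = .(Time, Site)]

# Stacked bars: one bar per sampling period, one panel per site
p <- ggplot(DT, aes(x = Time, y = Freq2, fill = Substrate)) +
  geom_col() +
  facet_wrap(~ Site) +
  labs(y = "Proportion of samples")
print(p)
```

Because Freq2 sums to 1 within every Time/Site group, each bar has the same height and the differing sample sizes no longer distort the comparison. Note that Time is plotted in alphabetical order unless it is converted to a factor with the levels in chronological order.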
Answer 1 (score: 3)
Take a look at ?ave:
df <- read.table(textConnection("
Var0 Var1 Var2 Var3 Freq
1 Floc July 11 P1 4
2 Fine July 11 P1 2
3 Medium July 11 P1 12
4 Coarse July 11 P1 0
5 Bedrock July 11 P1 3
6 Floc Aug 11 P1 7
7 Fine Aug 11 P1 1
8 Medium Aug 11 P1 7
9 Coarse Aug 11 P1 1
10 Bedrock Aug 11 P1 4"), header=TRUE, row.names=1)
df$freq <- ave(df$Freq, df$Var1, FUN=function(x)x/sum(x))
df
# Var0 Var1 Var2 Var3 Freq freq
#1 Floc July 11 P1 4 0.1904762
#2 Fine July 11 P1 2 0.0952381
#3 Medium July 11 P1 12 0.5714286
#4 Coarse July 11 P1 0 0.0000000
#5 Bedrock July 11 P1 3 0.1428571
#6 Floc Aug 11 P1 7 0.3500000
#7 Fine Aug 11 P1 1 0.0500000
#8 Medium Aug 11 P1 7 0.3500000
#9 Coarse Aug 11 P1 1 0.0500000
#10 Bedrock Aug 11 P1 4 0.2000000
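The call above groups by the month only. If the denominator should be the total for each sampling period and site, as the question describes, ave accepts more than one grouping vector. The following is a sketch under the assumption that the data uses the question's column names (Time, Site); the data frame df2 is built by hand here for illustration.

```r
# Sample rows from the question, typed in by hand with its column names
df2 <- data.frame(
  Substrate = rep(c("Floc", "Fine", "Medium", "Coarse", "Bedrock"), times = 2),
  Time      = rep(c("July 11", "Aug 11"), each = 5),
  Site      = "P1",
  Freq      = c(4, 2, 12, 0, 3, 7, 1, 7, 1, 4)
)

# Grouping vectors are passed positionally; FUN must be named because it
# comes after the ... argument in ave's signature
df2$prop <- ave(df2$Freq, df2$Time, df2$Site, FUN = function(x) x / sum(x))
df2
```

A group whose Freq values sum to zero would yield NaN, so such groups may need separate handling.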