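I am trying to loop over the columns of a data frame and store the results of a calculation in a matrix.
The setup can be reproduced with the following example data: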
df = data.frame(replicate(10,sample(0:20,10,rep=TRUE))) # the columns to be calculated on
M1 = as.data.frame(matrix(0, nrow = 10, ncol = 10)) # a matrix to hold the results.
rownames(M1) = colnames(df)
colnames(M1) = colnames(df)
These look like this:
> df # Frame with columns of data, X1 to X10
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1   1 19  2  6  6  5  0  2  5  10
2  16  7 14 16 16 18 11  2 18  11
3   7  6 11  4  4  1 18 11 10  16
4  20  2  4 20  4  6 10  5 16   7
5   9  8 16 19 11  2 14  7 13   7
6   5 16  6  8 20 15  5 11  4   0
7  11 16 12  8 18 20 20 20 10  14
8  17 14 10  4  3 10 13 11  5   1
9   9 20 10  5  1  7 12 10  5   6
10  8 14  3 14 20 10 17 20  9  14
> M1 # Output frame to hold results
    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
X1   0  0  0  0  0  0  0  0  0   0
X2   0  0  0  0  0  0  0  0  0   0
X3   0  0  0  0  0  0  0  0  0   0
X4   0  0  0  0  0  0  0  0  0   0
X5   0  0  0  0  0  0  0  0  0   0
X6   0  0  0  0  0  0  0  0  0   0
X7   0  0  0  0  0  0  0  0  0   0
X8   0  0  0  0  0  0  0  0  0   0
X9   0  0  0  0  0  0  0  0  0   0
X10  0  0  0  0  0  0  0  0  0   0
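The calculation runs pairwise over the columns of df: X1 and X2, then X1 and X3, then X1 and X4, and so on. The loop then moves on to X2 and X3, then X2 and X4, etc.
Columns n and m are fed into the calculation, and the result should be placed in the matrix at the position corresponding to columns n x m. The calculation itself simply determines the area between Xn and Xm when they are plotted as lines. I am not sure how to structure the loop correctly to do this: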
# The first iteration of the calculation, column X1 and X2 (X1 and X1 would = 0)
y = seq(1,10,1)
f1 = approxfun(y, df[,1] - df[,2]) # takes two columns as inputs
f2 = function(x) abs(f1(x))
area1 = integrate(f2, 1, 10, subdivisions = 500)
M1[2,1] = area1$value
The resulting frame would be a "half matrix" (which is all that is needed, since the mirrored half is identical):
> M1
    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
X1   0  0  0  0  0  0  0  0  0   0
X2   A  0  0  0  0  0  0  0  0   0
X3   A  A  0  0  0  0  0  0  0   0
X4   A  A  A  0  0  0  0  0  0   0
X5   A  A  A  A  0  0  0  0  0   0
X6   A  A  A  A  A  0  0  0  0   0
X7   A  A  A  A  A  A  0  0  0   0
X8   A  A  A  A  A  A  A  0  0   0
X9   A  A  A  A  A  A  A  A  0   0
X10  A  A  A  A  A  A  A  A  A   0
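I started building a for loop, but I am stumbling on how to use i and j so that X1 stays fixed until it has been paired with X2 through X10, before moving on to X2, and so on.
Thanks!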
Answer 0 (score: 1)
I could not get your function to work. So, using an arbitrary substitute function instead, this loop works for me:
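area = list() # placeholder, because the actual function doesn't work
M = matrix(0, nrow = ncol(df), ncol = ncol(df)) # result matrix, same shape as M1 in the question
for (i in 1:ncol(df)) {
  for (j in 1:ncol(df)) {
    if (i == j) { M[i, i] = 0; next } # a column compared with itself gives 0
    selection = df[, c(i, j)]
    #area=integrate(f2, 1, 200, subdivisions = 500)
    area$value = mean(colSums(selection)) # something random to check
    M[i, j] = area$value # fill both halves so the matrix is symmetric
    M[j, i] = area$value
  }
}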
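For reference, here is a minimal sketch of the same loop with the area calculation from the question plugged in, filling only the lower half. It assumes the integration limits stay inside 1..nrow(df), the range approxfun was built on. Outside that range approxfun returns NA by default, which integrate rejects, so the 1-to-200 limits in the commented-out line above may be why that call failed. The helper name pair_area and the set.seed call are illustrative and not from the original posts.

```r
set.seed(1)  # assumption: only here to make the sketch reproducible
df <- data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))

# Area between two columns drawn as lines over x = 1..length(a).
# pair_area is a hypothetical helper wrapping the question's own steps.
pair_area <- function(a, b) {
  x  <- seq_along(a)
  f1 <- approxfun(x, a - b)        # linear interpolation of the difference
  f2 <- function(t) abs(f1(t))     # absolute gap between the two lines
  integrate(f2, 1, length(a), subdivisions = 500)$value
}

n <- ncol(df)
M <- matrix(0, nrow = n, ncol = n,
            dimnames = list(colnames(df), colnames(df)))

# i is held fixed while j runs over the columns after it,
# so each pair is visited exactly once
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    M[j, i] <- pair_area(df[[i]], df[[j]])  # row = later column, lower half
  }
}
M
```

Starting the inner loop at i + 1 is what keeps X1 fixed while it is paired with X2 through X10 before the outer loop advances to X2.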
But loops are often not the most efficient way of doing things. So you might prefer this option:
df = data.frame(replicate(10,sample(0:20,10,rep=TRUE))) # the columns to be calculated on
my.f = function(x) abs(x[,1]-x[,2]) # absolute difference between the two selected columns
#y = t(as.matrix(combn(ncol(df), 2L, function(y) integrate(my.f(df[y]), 1, 200, subdivisions = 500),simplify=F))) # This doesn't work, but should be close to what you want to do
y = combn(ncol(df), 2L, function(idx) mean(my.f(df[idx]))) # this works, but is just an example
nams = colnames(df)
M = matrix(NA_real_, ncol = length(nams), nrow = length(nams))
M[lower.tri(M)] = y # combn order matches column-wise filling of the lower triangle
M = t(M)
M[lower.tri(M)] = y # fill the other half the same way
M = t(M)
diag(M) = 0
rownames(M) = colnames(M) = nams
M
     X1  X2   X3  X4  X5  X6  X7  X8   X9 X10
X1  0.0 8.6  6.4 8.8 7.1 6.6 7.0 4.0  7.0 3.7
X2  8.6 0.0  5.0 4.4 5.5 5.4 4.4 9.2  8.0 7.7
X3  6.4 5.0  0.0 7.2 5.9 5.8 7.6 7.0 10.4 6.5
X4  8.8 4.4  7.2 0.0 5.9 4.4 5.4 9.6  8.4 7.3
X5  7.1 5.5  5.9 5.9 0.0 7.3 5.3 9.1  8.5 8.0
X6  6.6 5.4  5.8 4.4 7.3 0.0 6.0 8.4  5.6 3.7
X7  7.0 4.4  7.6 5.4 5.3 6.0 0.0 8.8  4.4 5.7
X8  4.0 9.2  7.0 9.6 9.1 8.4 8.8 0.0  9.6 6.9
X9  7.0 8.0 10.4 8.4 8.5 5.6 4.4 9.6  0.0 5.5
X10 3.7 7.7  6.5 7.3 8.0 3.7 5.7 6.9  5.5 0.0
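The same combn pattern can also carry the actual area calculation. The sketch below is one possible way, not the method from the question: it swaps integrate for an exact segment-by-segment formula, which is valid only under the assumption that the rows are one unit apart (as in y = seq(1,10,1)) and that the lines are straight between points. The name seg_area and the set.seed call are hypothetical additions.

```r
# Exact area under |a - b| for lines sampled at unit-spaced points.
# seg_area is a hypothetical helper name, not from the original posts.
seg_area <- function(a, b) {
  d  <- a - b
  d0 <- head(d, -1)            # difference at the left end of each segment
  d1 <- tail(d, -1)            # difference at the right end of each segment
  same_sign <- d0 * d1 >= 0
  area <- ifelse(same_sign,
                 (abs(d0) + abs(d1)) / 2,                     # no crossing: trapezoid
                 (d0^2 + d1^2) / (2 * (abs(d0) + abs(d1))))   # crossing: two triangles
  sum(area)
}

set.seed(1)  # assumption: only here to make the sketch reproducible
df <- data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))

# One value per column pair, in the order (1,2), (1,3), ..., (9,10)
y <- combn(ncol(df), 2L, function(idx) seg_area(df[[idx[1]]], df[[idx[2]]]))

# That order matches column-wise filling of the lower triangle,
# which yields the "half matrix" directly
M <- matrix(0, ncol(df), ncol(df),
            dimnames = list(colnames(df), colnames(df)))
M[lower.tri(M)] <- y
M
```

Because only the lower triangle is assigned, the transposing steps used above to mirror the values are not needed when the half matrix is all that is wanted.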