以下是一些代码,探讨了分配给数组中的单元格所导致的额外复制(在本例中使用for
循环)。
# populate a vector with a million random numbers
n = 10^6
v=runif(n)
# vectorized version: fast
vv<-v*v;
m<-mean(vv); m
# for loop: slow
tracemem(vv)
for(i in 1:length(v)) { vv[i]<-v[i]*v[i] };
m<-mean(vv); m
输出
> vv<-v*v;
> m<-mean(vv); m
[1] 0.3329162
> # for loop: slow
> tracemem(vv)
[1] "<0x000007ffff560010"
> for(i in 1:length(v)) { vv[i]<-v[i]*v[i] };
tracemem[0x000007ffff560010 -> 0x000007fffe570010]:
> m<-mean(vv); m
[1] 0.3329162
这似乎表明在循环的第一次迭代中有一个向量的副本。
注意:这是对我之前问题Why is vectorization faster,this answer to it和this comment on the answer的跟进。
为了确认复制,我在循环体外进行了第一次迭代
v=runif(n)
# vectorized version: fast
vv<-v*v;
m<-mean(vv); m
# for loop: slow
tracemem(vv)
vv[1]<-v[1]*v[1]
tracemem(vv)
for(i in 2:length(v)) { vv[i]<-v[i]*v[i] };
m<-mean(vv); m
给出了这个输出
> vv<-v*v;
> m<-mean(vv); m
[1] 0.33385
> # for loop: slow
> tracemem(vv)
[1] "<0x000007fffef80010"
> vv[1]<-v[1]*v[1]
tracemem[0x000007fffef80010 -> 0x000007fffddc0010]:
> tracemem(vv)
[1] "<0x000007fffddc0010"
> for(i in 2:length(v)) { vv[i]<-v[i]*v[i] };
> m<-mean(vv); m
[1] 0.33385 # (different as I generated the random nos again)
在阅读了joran的答案和this nabble discussion thread后,我开始熟悉R潜在复制载体的想法,例如:当您更改类型
时> x = 1:10
> tracemem(x)
[1] "<0x00000000118ba4e0"
> x[5] = 6
tracemem[0x00000000118ba4e0 -> 0x0000000010d03568]:
> x = 1:10 # starts off as integer
> tracemem(x)
[1] "<0x00000000118ba538"
> x[5] = 6L # setting integer ok
> x[5] = 6 # setting floating point changes type
tracemem[0x00000000118ba538 -> 0x0000000010d03568]:
> x[6] = 7 # it's now floating point, setting floating point again ok
> x[7] = "asdf" # setting string changes type once more, this tanks on a large array
tracemem[0x0000000010d03568 -> 0x0000000010d03610]:
所以我粗略地了解发生了什么,但为什么在我的第一个例子中有vv
的副本(或者我在解释中犯了什么错误),当vv
已经是浮点数组?
答案 0 :(得分:4)
复制是因为R认为可能存在对该对象的另一个引用:
x <- 1:10
.Internal(inspect(x))
## @5a27838 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
# NAM(1) means that there is one reference to the object.
tracemem(x)
## [1] "<0x05a27838>"
.Internal(inspect(x))
## @5a27838 13 INTSXP g0c4 [NAM(1),TR] (len=10, tl=0) 1,2,3,4,5,...
# Still one reference
mean(x)
## [1] 5.5
.Internal(inspect(x))
## @5a27838 13 INTSXP g0c4 [NAM(2),TR] (len=10, tl=0) 1,2,3,4,5,...
# NAM(2) means "more than one" reference.
# A copy of the "pointer" was taken to pass to "mean", which bumped the count.
# The count starts at (essentially) 1, and is set to 2 if a copy is made. Never back to 1 though.
x[1] <- 0
tracemem[0x05a27838 -> 0x05a278c8]:
tracemem[0x05a278c8 -> 0x05a0d6f0]:
分配实际上不会复制数据(直到进行修改)。相反,它会复制指针并指示没有单例:
x <- 1
y <- x
.Internal(inspect(x))
## @5a61848 14 REALSXP g0c1 [NAM(2)] (len=1, tl=0) 1
.Internal(inspect(y))
## @5a61848 14 REALSXP g0c1 [NAM(2)] (len=1, tl=0) 1
y[1] <- 1
.Internal(inspect(y))
## @5a61948 14 REALSXP g0c1 [NAM(1)] (len=1, tl=0) 1
# Note, a new memory address, and NAM(1).