R programming: avoid copying the whole data frame when modifying it?

Time: 2013-08-22 15:14:36

Tags: r dataframe

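It seems that R copies the entire data frame when a single entry in it is modified. Is there a way to make R copy only the affected column (for example, the specific INTSXP below rather than the whole VECSXP) while keeping copy-on-modify semantics? And is there a way to modify a data frame in place?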

> x<-data.frame(x=1:1000000,y=1:1000000)
> .Internal(inspect(x))
@62cd2b0 19 VECSXP g0c2 [OBJ,MARK,NAM(2),ATT] (len=2, tl=0)
  @f80d0e0 13 INTSXP g0c7 [MARK] (len=1000000, tl=0) 1,2,3,4,5,...
  @8ed6970 13 INTSXP g0c7 [] (len=1000000, tl=0) 1,2,3,4,5,...
ATTRIB:
  @68f6b40 02 LISTSXP g0c0 []
    TAG: @4e58868 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @613efd0 16 STRSXP g0c2 [] (len=2, tl=0)
      @4e93038 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
      @4fe8bd8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
    TAG: @4e62650 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
    @113bb328 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-1000000
    TAG: @4e58d38 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
    @113aa1d8 16 STRSXP g0c1 [MARK,NAM(2)] (len=1, tl=0)
      @4ee78a0 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.frame"
> x[1,1]<-3L
>  .Internal(inspect(x))
@68eb9f8 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0)
  @6507290 13 INTSXP g0c7 [] (len=1000000, tl=0) 3,2,3,4,5,...
  @7422920 13 INTSXP g0c7 [] (len=1000000, tl=0) 1,2,3,4,5,...
ATTRIB:
  @68ef738 02 LISTSXP g0c0 []
    TAG: @4e58868 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @68ebaa0 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
      @4e93038 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
      @4fe8bd8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
    TAG: @4e62650 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
    @f43c418 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-1000000
    TAG: @4e58d38 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
    @f43c4d8 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @4ee78a0 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.frame"

Thanks!

2 answers:

Answer 0 (score: 7)

You should use data.table for this. Also read this reference post.

R) dt<-data.table(x=1:10,y=1:10)
R) .Internal(inspect(dt))
@0x000000000dce56c0 19 VECSXP g0c7 [OBJ,NAM(1),ATT] (len=2, tl=100)
  @0x000000000ebc4100 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...
  @0x000000000ebc41b0 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...
ATTRIB:
  @0x000000000e6c2d00 02 LISTSXP g0c0 [] 
    TAG: @0x00000000003b0088 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @0x000000000cc99fd0 16 STRSXP g0c7 [NAM(2)] (len=2, tl=100)
      @0x00000000003ddbb8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
      @0x000000000734f4d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
    TAG: @0x00000000003b1d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
    @0x0000000014487f98 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-10
    TAG: @0x00000000003b0558 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
    @0x000000000ead1910 16 STRSXP g0c2 [] (len=2, tl=0)
      @0x000000000753f440 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.table"
      @0x000000000715f398 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame"
    TAG: @0x000000000c3d7cc0 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @0x000000000e6c1e80 22 EXTPTRSXP g0c0 [] 
R) dt[,y:=y+1]
R) .Internal(inspect(dt))
@0x000000000dce56c0 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=2, tl=100)
  @0x000000000ebc4100 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...
  @0x000000000ebc6728 14 REALSXP g0c6 [NAM(1)] (len=10, tl=0) 2,3,4,5,6,...
ATTRIB:
  @0x000000000e6c2d00 02 LISTSXP g0c0 [] 
    TAG: @0x00000000003b0088 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
    @0x000000000cc99fd0 16 STRSXP g0c7 [NAM(2)] (len=2, tl=100)
      @0x00000000003ddbb8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
      @0x000000000734f4d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
    TAG: @0x00000000003b1d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
    @0x0000000014487f98 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-10
    TAG: @0x00000000003b0558 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
    @0x000000000ead1910 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
      @0x000000000753f440 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.table"
      @0x000000000715f398 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame"
    TAG: @0x000000000c3d7cc0 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @0x000000000e6c1e80 22 EXTPTRSXP g0c0 [] 
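
A more compact way to check the same behaviour is to compare object addresses before and after the update instead of reading the full inspect() output. This is only a sketch: it assumes data.table's address() helper, and it uses 1L (a change from the transcript above) so that y stays an integer column; with y + 1 the column became a REALSXP, as the output above shows.

```r
library(data.table)

dt <- data.table(x = 1:10, y = 1:10)

# Address of the list (VECSXP) that holds the columns, before the update.
before <- address(dt)

# := updates the column by reference; 1L keeps y an integer column.
dt[, y := y + 1L]

# Address of the same object after the update.
after <- address(dt)

# If this is TRUE, the table itself was not copied by the update.
same_object <- identical(before, after)
print(same_object)
```

If same_object is TRUE, the update did not allocate a new table, which matches the unchanged VECSXP address in the transcript.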

Answer 1 (score: 4)

@stat_quant is correct, data.table is the way to go.

However, data.table also has a set function, which modifies data.frames by reference as well.

Using a smaller example:

library(data.table)  # provides set()
x <- data.frame(x = 1:10, y = 1:10)
.Internal(inspect(x))
# @0x00000000120dbca8 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0)
#   @0x0000000011631328 13 INTSXP g0c4 [] (len=10, tl=0) 1,2,3,4,5,...
#   @0x0000000011631380 13 INTSXP g0c4 [] (len=10, tl=0) 1,2,3,4,5,...
# ATTRIB:
#   @0x0000000020964420 02 LISTSXP g0c0 [] 
#   TAG: @0x0000000000330088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "names" (has value)
#   @0x00000000120dafe8 16 STRSXP g0c2 [NAM(1)] (len=2, tl=0)
#     @0x000000000037dc60 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
#     @0x0000000008dc35c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
#   TAG: @0x0000000000331d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
#   @0x000000001165e700 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-10
#   TAG: @0x0000000000330558 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
#   @0x000000001165e730 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
#     @0x00000000003fc4a0 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame"
set(x,i=1L,j=1L,value = 3L)
.Internal(inspect(x))
# @0x00000000120dbca8 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0)
#   @0x0000000011631328 13 INTSXP g0c4 [] (len=10, tl=0) 3,2,3,4,5,...
#   @0x0000000011631380 13 INTSXP g0c4 [] (len=10, tl=0) 1,2,3,4,5,...
# ATTRIB:
#   @0x0000000020964420 02 LISTSXP g0c0 [] 
#   TAG: @0x0000000000330088 01 SYMSXP g1c0 [MARK,NAM(2),LCK,gp=0x4000] "names" (has value)
#   @0x00000000120dafe8 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
#     @0x000000000037dc60 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
#     @0x0000000008dc35c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
#   TAG: @0x0000000000331d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
#   @0x000000001165e700 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-10
#   TAG: @0x0000000000330558 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
#   @0x000000001165e730 16 STRSXP g0c1 [NAM(2)] (len=1, tl=0)
#     @0x00000000003fc4a0 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame"
head(x)
#   x y
# 1 3 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# 6 6 6

Using set, you can change values in existing columns of a data.frame; you cannot add columns (as := does for a data.table).
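
The limitation described above can be sketched as follows. This is an illustration rather than part of the original answer: the names df, same_object, and added are made up for the example, address() is data.table's helper for reading an object's memory address, and the failure when adding a column is what I would expect from current data.table versions, not something the transcript above demonstrates.

```r
library(data.table)

df <- data.frame(x = 1:10, y = 1:10)
before <- address(df)

# Overwrite existing cells by reference, one row at a time.
for (i in 1:3) {
  set(df, i = i, j = "y", value = 0L)
}

# TRUE here would mean df is still the same object in memory.
same_object <- identical(before, address(df))

# Adding a new column with set() is expected to fail on a plain data.frame,
# so the attempt is wrapped in tryCatch() and recorded as TRUE/FALSE.
added <- tryCatch({
  set(df, j = "z", value = 1:10)
  TRUE
}, error = function(e) FALSE)

print(same_object)
print(added)
print(head(df, 4))
```

If new columns are needed, working with a data.table and := (as in the first answer) is the route the answers here point to.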