Pivoting in R: reshaping a data frame with melt/cast

Date: 2014-02-02 01:48:30

Tags: r csv pivot reshape melt

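I'm new to R and have been trying to pivot a data frame that I read from a CSV file. The original CSV contains 5,000 item numbers; my example uses the first five. The pivoted result should list each item number once per payment, alongside the payment amount and payment type. For example, the original table looks like this: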

ITEM NUMBER  P1    P2    P3    P4   PType1  PType2  PType3  PType4
697884       270   255   170   0    CASH    CA      VI
697885       100   1160  310   580  CASH    AX      VI      CA
697886       1515  1455  1765  970  CASH    AX      VI      CA
697887       0     0     0     0
697888       1755  3610  1950  0    AX      VI      CA

By pivoting, I would like to get a table like this:

ITEM NUMBER Payment    PaymentType  
697884           270         CASH
697884           255         CA
697884           170         VI

...... (next item)

My current data frame has 9 variables: the item number is num, the payment amounts are int, and the payment types are Factor. Thanks!

structure(list(ITEM.NUMBER = 697884:697888, Payment1 = c(270L, 
100L, 1515L, 0L, 1755L), Payment2 = c(255L, 1160L, 1455L, 0L, 
3610L), Payment3 = c(170L, 310L, 1765L, 0L, 1950L), Payment4 = c(0L, 
580L, 970L, 0L, 0L), PaymentType1 = structure(c(3L, 3L, 3L, 1L, 
2L), .Label = c("", "AX", "CASH"), class = "factor"), PaymentType2 = structure(c(3L, 
2L, 2L, 1L, 4L), .Label = c("", "AX", "CA", "VI"), class = "factor"), 
    PaymentType3 = structure(c(3L, 3L, 3L, 1L, 2L), .Label = c("", 
    "CA", "VI"), class = "factor"), PaymentType4 = structure(c(1L, 
    2L, 2L, 1L, 1L), .Label = c("", "CA"), class = "factor")), .Names = c("ITEM.NUMBER", 
"Payment1", "Payment2", "Payment3", "Payment4", "PaymentType1", 
"PaymentType2", "PaymentType3", "PaymentType4"), row.names = c(NA, 
-5L), class = "data.frame")

1 answer:

Answer 0 (score: 0)

You can use reshape from base R. Assuming your data is called "mydf":

reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
        varying=2:ncol(mydf), sep = "")
#          ITEM.NUMBER time Payment PaymentType
# 697884.1      697884    1     270        CASH
# 697885.1      697885    1     100        CASH
# 697886.1      697886    1    1515        CASH
# 697887.1      697887    1       0            
# 697888.1      697888    1    1755          AX
# 697884.2      697884    2     255          CA
# 697885.2      697885    2    1160          AX
# 697886.2      697886    2    1455          AX
# 697887.2      697887    2       0            
# 697888.2      697888    2    3610          VI
# 697884.3      697884    3     170          VI
# 697885.3      697885    3     310          VI
# 697886.3      697886    3    1765          VI
# 697887.3      697887    3       0            
# 697888.3      697888    3    1950          CA
# 697884.4      697884    4       0            
# 697885.4      697885    4     580          CA
# 697886.4      697886    4     970          CA
# 697887.4      697887    4       0            
# 697888.4      697888    4       0

If you want the rows ordered by "ITEM.NUMBER", you can use order:

out <- reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
               varying=2:ncol(mydf), sep = "")
out[order(out$ITEM.NUMBER), ]
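
The table requested in the question lists only the payments that were actually made (three rows for 697884, none for an item without payments). A minimal sketch of one way to get there from the reshape output, assuming that an empty payment type means "no payment"; that rule, and the name res, are assumptions made here for illustration and are not part of the original answer:

```r
# Rebuild the question's sample data (payment types as factors, as in the dput output).
mydf <- data.frame(
  ITEM.NUMBER  = 697884:697888,
  Payment1     = c(270L, 100L, 1515L, 0L, 1755L),
  Payment2     = c(255L, 1160L, 1455L, 0L, 3610L),
  Payment3     = c(170L, 310L, 1765L, 0L, 1950L),
  Payment4     = c(0L, 580L, 970L, 0L, 0L),
  PaymentType1 = factor(c("CASH", "CASH", "CASH", "", "AX")),
  PaymentType2 = factor(c("CA", "AX", "AX", "", "VI")),
  PaymentType3 = factor(c("VI", "VI", "VI", "", "CA")),
  PaymentType4 = factor(c("", "CA", "CA", "", ""))
)

# Same long-format reshape as in the answer.
out <- reshape(mydf, direction = "long", idvar = "ITEM.NUMBER",
               varying = 2:ncol(mydf), sep = "")

# Assumption: a blank payment type marks a payment slot that was never used.
res <- out[nzchar(as.character(out$PaymentType)), ]

# Sort by item, then by payment slot, and keep only the requested columns.
res <- res[order(res$ITEM.NUMBER, res$time),
           c("ITEM.NUMBER", "Payment", "PaymentType")]
rownames(res) <- NULL
res
```

If a payment of 0 with a non-empty type should also be dropped, the filter could test res$Payment > 0 instead; which rule is right depends on what the source CSV means by an empty cell.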

Update

For completeness, here is the reshape2 approach I came up with:

First, melt the data (as shown in the comments):

library(reshape2)
mydfL <- melt(mydf, id.vars="ITEM.NUMBER")
head(mydfL)
#   ITEM.NUMBER variable value
# 1      697884 Payment1   270
# 2      697885 Payment1   100
# 3      697886 Payment1  1515
# 4      697887 Payment1     0
# 5      697888 Payment1  1755
# 6      697884 Payment2   255

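Second, split the "variable" column into its name part and its number part. There may be a better way to do this, but this is what I came up with.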

mydfL <- cbind(mydfL, do.call(rbind, strsplit(
  as.character(mydfL$variable), split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)))
head(mydfL)
#   ITEM.NUMBER variable value       1 2
# 1      697884 Payment1   270 Payment 1
# 2      697885 Payment1   100 Payment 1
# 3      697886 Payment1  1515 Payment 1
# 4      697887 Payment1     0 Payment 1
# 5      697888 Payment1  1755 Payment 1
# 6      697884 Payment2   255 Payment 2
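
As one possible alternative for the splitting step, the two parts can be pulled out with sub() in base R, which avoids the lookaround regular expression. This is only a sketch: it assumes every name is a run of letters followed by trailing digits, and the names variable, measure and index are hypothetical, chosen here for illustration.

```r
# A few of the melted variable names, used as a stand-in for mydfL$variable.
variable <- c("Payment1", "Payment2", "PaymentType1", "PaymentType4")

# Drop the trailing digits to get the measure name.
measure <- sub("[0-9]+$", "", variable)

# Drop the leading letters to get the payment slot number.
index <- as.integer(sub("^[A-Za-z]+", "", variable))

data.frame(variable, measure, index)
```

Storing the results under descriptive column names also means the later dcast formula would not need backticks.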

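Third, use dcast to get the output. Because two of the columns are named "1" and "2", you need to quote them with backticks (`) so that R treats them as column names rather than as numeric values.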

dcast(mydfL, ITEM.NUMBER + `2` ~ `1`, value.var="value")
#    ITEM.NUMBER 2 Payment PaymentType
# 1       697884 1     270        CASH
# 2       697884 2     255          CA
# 3       697884 3     170          VI
# 4       697884 4       0            
# 5       697885 1     100        CASH
# 6       697885 2    1160          AX
# 7       697885 3     310          VI
# 8       697885 4     580          CA
# 9       697886 1    1515        CASH
# 10      697886 2    1455          AX
# 11      697886 3    1765          VI
# 12      697886 4     970          CA
# 13      697887 1       0            
# 14      697887 2       0            
# 15      697887 3       0            
# 16      697887 4       0            
# 17      697888 1    1755          AX
# 18      697888 2    3610          VI
# 19      697888 3    1950          CA
# 20      697888 4       0
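
The three steps can also be put together with descriptive column names, so that no backticks are needed in the formula. The sketch below assumes reshape2 is installed. The names long, wide, measure and index are hypothetical. Because melt stacks amounts and type labels in a single value column, Payment is expected to come back as text, so the sketch converts it to integer at the end.

```r
library(reshape2)  # assumed to be installed

# Rebuild the question's sample data (payment types as factors).
mydf <- data.frame(
  ITEM.NUMBER  = 697884:697888,
  Payment1     = c(270L, 100L, 1515L, 0L, 1755L),
  Payment2     = c(255L, 1160L, 1455L, 0L, 3610L),
  Payment3     = c(170L, 310L, 1765L, 0L, 1950L),
  Payment4     = c(0L, 580L, 970L, 0L, 0L),
  PaymentType1 = factor(c("CASH", "CASH", "CASH", "", "AX")),
  PaymentType2 = factor(c("CA", "AX", "AX", "", "VI")),
  PaymentType3 = factor(c("VI", "VI", "VI", "", "CA")),
  PaymentType4 = factor(c("", "CA", "CA", "", ""))
)

# Step 1: melt. This may warn that the measure columns have mixed types.
long <- melt(mydf, id.vars = "ITEM.NUMBER")

# Step 2: split "variable" into a measure name and a payment slot number.
long$variable <- as.character(long$variable)
long$measure  <- sub("[0-9]+$", "", long$variable)
long$index    <- as.integer(sub("^[A-Za-z]+", "", long$variable))

# Step 3: cast back so each (item, slot) pair has its Payment and PaymentType.
wide <- dcast(long, ITEM.NUMBER + index ~ measure, value.var = "value")

# Restore the numeric type of the payment amounts.
wide$Payment <- as.integer(as.character(wide$Payment))
str(wide)
```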