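I'm new to R, and I've been trying to pivot a data frame that I read from a CSV file. The original CSV contains 5,000 item numbers; in my example I use the first five. The result of the pivot should show each item number once per payment, alongside the matching payment type. For example, the original table looks like this: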
ITEM NUMBER P1 P2 P3 P4 PType1 PType2 PType3 PType4
697884 270 255 170 0 CASH CA VI
697885 100 1160 310 580 CASH AX VI CA
697886 1515 1455 1765 970 CASH AX VI CA
697887 0 0 0 0
697888 1755 3610 1950 0 AX VI CA
By pivoting, I'd like to get a table like this:
ITEM NUMBER Payment PaymentType
697884 270 CASH
697884 255 CA
697884 170 VI
... (next item)
My current data frame has 9 variables: the item number is num, the payment amounts are int, and the payment types are Factor. Thanks!
structure(list(ITEM.NUMBER = 697884:697888, Payment1 = c(270L,
100L, 1515L, 0L, 1755L), Payment2 = c(255L, 1160L, 1455L, 0L,
3610L), Payment3 = c(170L, 310L, 1765L, 0L, 1950L), Payment4 = c(0L,
580L, 970L, 0L, 0L), PaymentType1 = structure(c(3L, 3L, 3L, 1L,
2L), .Label = c("", "AX", "CASH"), class = "factor"), PaymentType2 = structure(c(3L,
2L, 2L, 1L, 4L), .Label = c("", "AX", "CA", "VI"), class = "factor"),
PaymentType3 = structure(c(3L, 3L, 3L, 1L, 2L), .Label = c("",
"CA", "VI"), class = "factor"), PaymentType4 = structure(c(1L,
2L, 2L, 1L, 1L), .Label = c("", "CA"), class = "factor")), .Names = c("ITEM.NUMBER",
"Payment1", "Payment2", "Payment3", "Payment4", "PaymentType1",
"PaymentType2", "PaymentType3", "PaymentType4"), row.names = c(NA,
-5L), class = "data.frame")
Answer 0 (score: 0)
You can use reshape from base R. Assuming your data is named "mydf":
reshape(mydf, direction = "long", idvar="ITEM.NUMBER",
varying=2:ncol(mydf), sep = "")
# ITEM.NUMBER time Payment PaymentType
# 697884.1 697884 1 270 CASH
# 697885.1 697885 1 100 CASH
# 697886.1 697886 1 1515 CASH
# 697887.1 697887 1 0
# 697888.1 697888 1 1755 AX
# 697884.2 697884 2 255 CA
# 697885.2 697885 2 1160 AX
# 697886.2 697886 2 1455 AX
# 697887.2 697887 2 0
# 697888.2 697888 2 3610 VI
# 697884.3 697884 3 170 VI
# 697885.3 697885 3 310 VI
# 697886.3 697886 3 1765 VI
# 697887.3 697887 3 0
# 697888.3 697888 3 1950 CA
# 697884.4 697884 4 0
# 697885.4 697885 4 580 CA
# 697886.4 697886 4 970 CA
# 697887.4 697887 4 0
# 697888.4 697888 4 0
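The call above relies on sep = "" to guess the new column names from names like Payment1 and PaymentType1. As a rough sketch of the same reshape with nothing left to guessing, the varying columns can be listed explicitly. The small mydf below is a stand-in (two items, two payments each) so the block runs on its own; it is not the full data from the question.

```r
# Stand-in for the question's data (two items, two payments each).
mydf <- data.frame(
  ITEM.NUMBER  = c(697884L, 697885L),
  Payment1     = c(270L, 100L),
  Payment2     = c(255L, 1160L),
  PaymentType1 = c("CASH", "CASH"),
  PaymentType2 = c("CA", "AX"),
  stringsAsFactors = FALSE
)

# Each list element names the wide columns that stack into one long column;
# v.names gives the long column names in the same order.
long <- reshape(
  mydf,
  direction = "long",
  idvar     = "ITEM.NUMBER",
  varying   = list(c("Payment1", "Payment2"),
                   c("PaymentType1", "PaymentType2")),
  v.names   = c("Payment", "PaymentType"),
  timevar   = "time",
  times     = 1:2
)
long
```

This is more typing, but it keeps working if the column names do not follow the name-plus-digit pattern.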
If you want to order by "ITEM.NUMBER", you can use order:
out <- reshape(mydf, direction = "long", idvar="ITEM.NUMBER",
varying=2:ncol(mydf), sep = "")
out[order(out$ITEM.NUMBER), ]
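The desired output in the question lists only the payments that were actually made, while the reshaped result keeps a row for every slot. One possible way to trim it, assuming a blank payment type means "no payment" (an assumption, not something the question states), is sketched below. The out data frame here is a small stand-in shaped like the reshape result.

```r
# Stand-in shaped like the long result: one item with three real payments,
# one empty slot, plus an item with no payments at all.
out <- data.frame(
  ITEM.NUMBER = c(697884L, 697884L, 697884L, 697884L, 697887L),
  time        = c(1L, 2L, 3L, 4L, 1L),
  Payment     = c(270L, 255L, 170L, 0L, 0L),
  PaymentType = factor(c("CASH", "CA", "VI", "", ""))
)

# Assumption: a blank (or missing) payment type marks an unused slot.
keep <- !is.na(out$PaymentType) & out$PaymentType != ""
paid <- out[keep, c("ITEM.NUMBER", "Payment", "PaymentType")]

# Drop the now-unused "" level and reset the row names.
paid$PaymentType <- droplevels(paid$PaymentType)
rownames(paid) <- NULL
paid
```

Note that an item with no payments at all (697887 here) disappears entirely under this rule.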
For completeness, here is the reshape2 approach I came up with:
First, melt the data (as suggested in the comments):
library(reshape2)
mydfL <- melt(mydf, id.vars="ITEM.NUMBER")
head(mydfL)
# ITEM.NUMBER variable value
# 1 697884 Payment1 270
# 2 697885 Payment1 100
# 3 697886 Payment1 1515
# 4 697887 Payment1 0
# 5 697888 Payment1 1755
# 6 697884 Payment2 255
Second, split the "variable" column. There may be a better way to do this, but this is what I came up with.
mydfL <- cbind(mydfL, do.call(rbind, strsplit(
as.character(mydfL$variable), split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)))
head(mydfL)
# ITEM.NUMBER variable value 1 2
# 1 697884 Payment1 270 Payment 1
# 2 697885 Payment1 100 Payment 1
# 3 697886 Payment1 1515 Payment 1
# 4 697887 Payment1 0 Payment 1
# 5 697888 Payment1 1755 Payment 1
# 6 697884 Payment2 255 Payment 2
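As one possible alternative for the split step, two sub() calls can peel off the trailing digits and the leading name, which also gives the pieces proper column names. This is only a sketch: the variable vector below is a stand-in for mydfL$variable, and the names name and index are made up for illustration.

```r
# Stand-in for as.character(mydfL$variable).
variable <- c("Payment1", "Payment2", "PaymentType1", "PaymentType4")

# Remove the trailing digits to get the measure name.
name <- sub("[0-9]+$", "", variable)

# Remove the leading non-digits to get the slot number.
index <- as.integer(sub("^[^0-9]+", "", variable))

parts <- data.frame(variable, name, index, stringsAsFactors = FALSE)
parts
```

With columns named like this, the later casting formula would refer to name and index directly, so no backticks would be needed.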
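Third, use dcast to get the output. Because some columns are named "1" and "2", you need to quote them with backticks (`) so that R treats them as column names rather than values.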
dcast(mydfL, ITEM.NUMBER + `2` ~ `1`, value.var="value")
# ITEM.NUMBER 2 Payment PaymentType
# 1 697884 1 270 CASH
# 2 697884 2 255 CA
# 3 697884 3 170 VI
# 4 697884 4 0
# 5 697885 1 100 CASH
# 6 697885 2 1160 AX
# 7 697885 3 310 VI
# 8 697885 4 580 CA
# 9 697886 1 1515 CASH
# 10 697886 2 1455 AX
# 11 697886 3 1765 VI
# 12 697886 4 970 CA
# 13 697887 1 0
# 14 697887 2 0
# 15 697887 3 0
# 16 697887 4 0
# 17 697888 1 1755 AX
# 18 697888 2 3610 VI
# 19 697888 3 1950 CA
# 20 697888 4 0