Data transformation with R

Time: 2013-11-12 08:39:38

Tags: r transformation reshape

This is my original output.

BizDate     Description                             TotalAmount TotalQty
2013-09-01 NIN JIOM COUGH SYRUP 75ML                11.8        2
2013-09-01 BREACOL COUGH SYRUP ADULTS 120ML         15.8        2
2013-09-02 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    8.5         1
2013-09-03 AFRICAN SEA COCONUT COUGH SYRUP 177ML    8.2         1
2013-09-01 THREE LEGS COOLING WATER 200ML           21.0        14
2013-09-01 SEAHORSE BRAND COOLING WATER 200ML       4.4         4
2013-09-05 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    8.5         1
2013-09-06 AFRICAN SEA COCONUT COUGH SYRUP 177ML    8.2         1

How can I transform it into something like this, with one row per description and one column per date?

 Description                              2013-09-01   2013-09-02 .......
 NIN JIOM COUGH SYRUP 75ML                some number some number .......
 BREACOL COUGH SYRUP ADULTS 120ML         some number some number .......
 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    some number some number .......
 AFRICAN SEA COCONUT COUGH SYRUP 177ML    some number some number .......
 THREE LEGS COOLING WATER 200ML           some number some number .......
 SEAHORSE BRAND COOLING WATER 200ML       some number some number .......
 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    some number some number .......
 AFRICAN SEA COCONUT COUGH SYRUP 177ML    some number some number .......

Here is some dummy data.

set.seed(1)
dat <- data.frame(product = rep(letters[1:4], each = 4)
    , date = sample(seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = 1), 16, TRUE)
    , amount = sample(1:100, 16, TRUE)
    , qty = sample(1:4, 16, TRUE))

dat
   product       date amount qty
1        a 2013-01-03     72   2
2        a 2013-01-04    100   1
3        a 2013-01-06     39   4
4        a 2013-01-10     78   3
5        b 2013-01-03     94   4
6        b 2013-01-09     22   1
7        b 2013-01-10     66   3
8        b 2013-01-07     13   2
9        c 2013-01-07     27   4
10       c 2013-01-01     39   3
11       c 2013-01-03      2   4
12       c 2013-01-02     39   3
13       d 2013-01-07     87   3
14       d 2013-01-04     35   4
15       d 2013-01-08     49   1
16       d 2013-01-05     60   2

Below is my clumsy way of doing it. Is there a smarter way? I would like to use plyr. Just point me to the right package or approach.

dat1<-subset(dat,date=="2013-01-01")
colnames(dat1)[c(3,4)]<-c("2013-01-01.amount","2013-01-01.qty")
dat1<-dat1[-2]
dat2<-subset(dat,date=="2013-01-02")
colnames(dat2)[c(3,4)]<-c("2013-01-02.amount","2013-01-02.qty")
dat2<-dat2[-2]
dat3<-subset(dat,date=="2013-01-03")
colnames(dat3)[c(3,4)]<-c("2013-01-03.amount","2013-01-03.qty")
dat3<-dat3[-2]

new.dat<-merge(dat1,dat2,by="product",all.x=T,all.y=T)
new.dat<-merge(new.dat,dat3,by="product",all.x=T,all.y=T)
new.dat[is.na(new.dat)]<-0

new.dat
  product 2013-01-01.amount 2013-01-01.qty 2013-01-02.amount 2013-01-02.qty
1       a                 0              0                 0              0
2       b                 0              0                 0              0
3       c                39              3                39              3
  2013-01-03.amount 2013-01-03.qty
1                72              2
2                94              4
3                 2              4

2 answers:

Answer 0: (score: 4)

Given the current form of the data, just use reshape from base R:

> reshape(dat, direction = "wide", idvar="product", timevar="date")
   product amount.2013-01-03 qty.2013-01-03 amount.2013-01-04 qty.2013-01-04 amount.2013-01-06
1        a                72              2               100              1                39
5        b                94              4                NA             NA                NA
9        c                 2              4                NA             NA                NA
13       d                NA             NA                35              4                NA
   qty.2013-01-06 amount.2013-01-10 qty.2013-01-10 amount.2013-01-09 qty.2013-01-09
1               4                78              3                NA             NA
5              NA                66              3                22              1
9              NA                NA             NA                NA             NA
13             NA                NA             NA                NA             NA
   amount.2013-01-07 qty.2013-01-07 amount.2013-01-01 qty.2013-01-01 amount.2013-01-02
1                 NA             NA                NA             NA                NA
5                 13              2                NA             NA                NA
9                 27              4                39              3                39
13                87              3                NA             NA                NA
   qty.2013-01-02 amount.2013-01-08 qty.2013-01-08 amount.2013-01-05 qty.2013-01-05
1              NA                NA             NA                NA             NA
5              NA                NA             NA                NA             NA
9               3                NA             NA                NA             NA
13             NA                49              1                60              2
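
In the output above the date columns appear in order of first appearance rather than chronologically, and missing combinations are NA. As a minimal sketch, assuming you want chronological columns and zeros as in the question, you could sort by date before reshaping and then replace the NAs. The data here is a hand-copied subset of the dummy data printed in the question, and the name `wide` is only illustrative.

```r
# Six rows copied from the dummy data in the question (rows 1, 2, 5, 6, 10, 11).
dat <- data.frame(
  product = c("a", "a", "b", "b", "c", "c"),
  date    = as.Date(c("2013-01-03", "2013-01-04", "2013-01-03",
                      "2013-01-09", "2013-01-01", "2013-01-03")),
  amount  = c(72, 100, 94, 22, 39, 2),
  qty     = c(2, 1, 4, 1, 3, 4),
  stringsAsFactors = FALSE
)

# reshape() creates the wide columns in order of first appearance of each
# date, so sorting by date first yields chronological columns.
dat <- dat[order(dat$date), ]
wide <- reshape(dat, direction = "wide", idvar = "product", timevar = "date")

# Replace missing product/date combinations with 0, as the question does.
wide[is.na(wide)] <- 0

# Restore alphabetical product order and drop the leftover row names.
wide <- wide[order(wide$product), ]
rownames(wide) <- NULL
wide
```

Note that reshape() expects at most one row per product and date; if a product can occur twice on the same day, aggregate first (for example with aggregate()) before reshaping.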

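Alternatively, you could consider the "reshape2" package: melt the data to long form first, then cast it wide with one column per date/variable pair.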

library(reshape2)
datL <- melt(dat, id.vars=c("product", "date"))
dcast(datL, product ~ date + variable, value.var="value")

Answer 1: (score: 3)

Try this:

#Dummy data
df <- read.table(text="
BizDate Description TotalAmount TotalQty
2013-09-01 NIN 11.8 2
2013-09-01 BREACOL 15.8 2
2013-09-02 AFRICAN 8.5 1
2013-09-03 AFRICAN 8.2 1
2013-09-01 THREE 21.0 14
2013-09-01 SEAHORSE 4.4 4
2013-09-05 AFRICAN 8.5 1
2013-09-06 AFRICAN 8.2 1",
                 header=TRUE)

library(reshape2)

# reshape2::dcast() takes the value column via value.var (not value)
dcast(df, Description ~ BizDate, value.var = "TotalAmount")
#  Description 2013-09-01 2013-09-02 2013-09-03 2013-09-05 2013-09-06
#1     AFRICAN         NA        8.5        8.2        8.5        8.2
#2     BREACOL       15.8         NA         NA         NA         NA
#3         NIN       11.8         NA         NA         NA         NA
#4    SEAHORSE        4.4         NA         NA         NA         NA
#5       THREE       21.0         NA         NA         NA         NA
dcast(df, Description ~ BizDate, value.var = "TotalQty")
#  Description 2013-09-01 2013-09-02 2013-09-03 2013-09-05 2013-09-06
#1     AFRICAN         NA          1          1          1          1
#2     BREACOL          2         NA         NA         NA         NA
#3         NIN          2         NA         NA         NA         NA
#4    SEAHORSE          4         NA         NA         NA         NA
#5       THREE         14         NA         NA         NA         NA
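
If reshape2 is not available, roughly the same table can be sketched in base R with xtabs(). This is an alternative technique, not what the answer above uses, and it behaves differently in two ways: xtabs() sums the values when a Description occurs more than once on the same BizDate, and it fills missing combinations with 0 rather than NA. The object names `amount_by_day` and `amount_df` are only illustrative.

```r
# Same dummy data as in the answer above.
df <- read.table(text = "
BizDate Description TotalAmount TotalQty
2013-09-01 NIN 11.8 2
2013-09-01 BREACOL 15.8 2
2013-09-02 AFRICAN 8.5 1
2013-09-03 AFRICAN 8.2 1
2013-09-01 THREE 21.0 14
2013-09-01 SEAHORSE 4.4 4
2013-09-05 AFRICAN 8.5 1
2013-09-06 AFRICAN 8.2 1",
                 header = TRUE)

# Cross-tabulate: rows are descriptions, columns are dates,
# cells hold the summed TotalAmount (0 where there were no sales).
amount_by_day <- xtabs(TotalAmount ~ Description + BizDate, data = df)
amount_by_day

# Convert the contingency table into an ordinary data frame,
# keeping one column per date and the descriptions as row names.
amount_df <- as.data.frame.matrix(amount_by_day)
amount_df
```

Swapping TotalAmount for TotalQty on the left-hand side of the formula gives the corresponding quantity table.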