Melting a data.table several times using patterns

Time: 2016-03-05 16:12:42

Tags: r data.table

I would like to reshape the following data.table:

library(data.table)
  myfun <- function()  sample(c(NA,round(runif(9)*10)),prob=c(0.2,rep(0.1,9)))
  cheeze <-  myfun()
  bottle <-  myfun()

  df <-  as.data.table(data.frame(ID=LETTERS[1:10],
     bottle_qty=bottle,
     bottle_price=bottle*c(1,3,5),
     cheeze_qty=cheeze,
     cheeze_price=cheeze*c(5,4,2),
     cheeze_cam = 1*(cheeze>4) ,
     cheeze_brie = 1*(cheeze<=4),
     bottle_wine = 1*(bottle>5),
     bottle_beer = 1*(bottle<=5))
     )
#     ID bottle_qty bottle_price cheeze_qty cheeze_price cheeze_cam cheeze_brie
#  1:  A          7            7          9           45          1           0
#  2:  B          4           12          6           24          1           0
#  3:  C         NA           NA         NA           NA         NA          NA
#  4:  D          7            7          2           10          0           1
#  5:  E          3            9          9           36          1           0
#  6:  F          9           45          4            8          0           1
#  7:  G          6            6          3           15          0           1
#  8:  H          2            6          6           24          1           0
#  9:  I          5           25          8           16          1           0
# 10:  J          7            7          3           15          0           1
#     bottle_wine bottle_beer
#  1:           1           0
#  2:           0           1
#  3:          NA          NA
#  4:           1           0
#  5:           0           1
#  6:           1           0
#  7:           1           0
#  8:           0           1
#  9:           0           1
# 10:           1           0

into the following:

| ID | type        | qty | price |
|----+-------------+-----+-------|
| A  | cheeze_cam  |   9 |    45 |
| A  | bottle_wine |   7 |     7 |
| B  | bottle_beer |   4 |    12 |
| B  | cheeze_cam  |   6 |    24 |

Edit: here is the full expected output.

| ID | type        | qty | price |
|----+-------------+-----+-------|
| A  | bottle_wine |   7 |     7 |
| A  | cheeze_cam  |   9 |    45 |
| B  | bottle_beer |   4 |    12 |
| B  | cheeze_cam  |   6 |    24 |
| C  | bottle_wine |  NA |    NA |
| C  | cheeze_brie |  NA |    NA |
| D  | bottle_wine |   7 |     7 |
| D  | cheeze_brie |   2 |    10 |
| E  | bottle_beer |   3 |     9 |
| E  | cheeze_cam  |   9 |    36 |
| F  | bottle_wine |   9 |    45 |
| F  | cheeze_brie |   4 |     8 |
| G  | bottle_wine |   6 |     6 |
| G  | cheeze_brie |   3 |    15 |
| H  | bottle_beer |   2 |     6 |
| H  | cheeze_cam  |   6 |    24 |
| I  | bottle_beer |   5 |    25 |
| I  | cheeze_cam  |   8 |    16 |
| J  | bottle_wine |   7 |     7 |
| J  | cheeze_brie |   3 |    15 |

But I get an error saying that object x cannot be found. Any help?

1 answer:

Answer 0: (score: 1)

Thanks to everyone for the help. Credit goes to @akrun; I just built on his suggestion.

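The first melt stacks all the quantity and price columns without any other consideration, so we end up with two value columns, qty and price. The result has at most twice as many rows as the original table (exactly twice when na.rm=TRUE has nothing to drop). Along the way we create the variable var, which numbers the stacked column pairs in the order they appear in df: with the table above, var 1 stands for bottle and var 2 for cheeze.

Once the first melt is done, the rest is simple. We only need to melt the specification columns (cheeze_cam, cheeze_brie, bottle_wine, bottle_beer) and use var to clean up the table, so that each row keeps only the specification that belongs to its product.

# first melt: var is 1 for bottle_* and 2 for cheeze_* (column order of df)
m1 <- melt(df, measure=patterns("qty$", "price$"), value.name=c('qty', 'price'), variable.name="var", na.rm=TRUE)
# second melt, then keep the flagged specification that matches each product
melt(m1, id.var=c('ID','var', 'qty', 'price'), na.rm=TRUE)[order(ID)][value==1,][like(variable,"bottle")&var==1|like(variable,"cheeze")&var==2,]

Applied to the table printed in the question, this gives the following. Row C is missing because na.rm=TRUE drops its NA quantities and prices.

##     ID var qty price    variable value
##  1:  A   2   9    45  cheeze_cam     1
##  2:  A   1   7     7 bottle_wine     1
##  3:  B   2   6    24  cheeze_cam     1
##  4:  B   1   4    12 bottle_beer     1
##  5:  D   2   2    10 cheeze_brie     1
##  6:  D   1   7     7 bottle_wine     1
##  7:  E   2   9    36  cheeze_cam     1
##  8:  E   1   3     9 bottle_beer     1
##  9:  F   2   4     8 cheeze_brie     1
## 10:  F   1   9    45 bottle_wine     1
## 11:  G   2   3    15 cheeze_brie     1
## 12:  G   1   6     6 bottle_wine     1
## 13:  H   2   6    24  cheeze_cam     1
## 14:  H   1   2     6 bottle_beer     1
## 15:  I   2   8    16  cheeze_cam     1
## 16:  I   1   5    25 bottle_beer     1
## 17:  J   2   3    15 cheeze_brie     1
## 18:  J   1   7     7 bottle_wine     1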
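
For anyone who wants to check the two-step idea on data that does not change between runs, here is a minimal sketch on a small fixed table. The names toy, products, product, step1, step2 and res are only illustrative and are not part of the original post. Instead of hard-coding which number of var stands for which product, the sketch derives the product name from the qty column names, which I assume follow the prefix_qty naming used in the question.

```r
library(data.table)

# Hypothetical three-row table with the same column layout as df in the
# question, but with fixed values so the result is reproducible.
toy <- data.table(
  ID           = c("A", "B", "C"),
  bottle_qty   = c(7, 4, 2),
  bottle_price = c(7, 12, 10),
  cheeze_qty   = c(9, 6, 3),
  cheeze_price = c(45, 24, 6),
  cheeze_cam   = c(1, 1, 0),
  cheeze_brie  = c(0, 0, 1),
  bottle_wine  = c(1, 0, 0),
  bottle_beer  = c(0, 1, 1)
)

# Step 1: stack the qty/price pairs. var numbers the pairs in the order
# in which the matching columns appear in toy.
step1 <- melt(toy, measure.vars = patterns("qty$", "price$"),
              value.name = c("qty", "price"), variable.name = "var")

# Turn the index into a product name taken from the qty column names.
products <- sub("_qty$", "", grep("qty$", names(toy), value = TRUE))
step1[, product := products[as.integer(var)]]

# Step 2: stack the 0/1 indicator columns, then keep only the flagged
# type whose prefix matches the product of that row.
step2 <- melt(step1, id.vars = c("ID", "var", "product", "qty", "price"),
              variable.name = "type", variable.factor = FALSE)
res <- step2[value == 1 & startsWith(type, product),
             .(ID, type, qty, price)][order(ID, type)]
print(res)
```

For this table res has six rows, two per ID, for example A / bottle_wine / 7 / 7 and A / cheeze_cam / 9 / 45. Matching on the prefix keeps working if the columns of the input are reordered, which a comparison against fixed var numbers would not.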