How to convert cross-sectional data into transaction data in R

Time: 2017-03-10 18:35:56

Tags: r

I have cross-sectional data like this:

**Type  Component_ID**
767 801307-00
767 468K29-2
777 263BA101-2
777 964-0453-011
320 6740B050000
320 305-439-401-0
320 1386M56P03
320 2131M81G02
320 2290B020000
319 1588M89G03
319 305-136-803-0
319 9238M66P08
767 801307-00
767 468K29-2
321 M20101-01
320 ACP2788AB04

I want to convert this into transaction data, like this:

Type    Component_ID                
767 801307-00   468K29-2            
777 263BA101-2  964-0453-011            
320 6740B050000 305-439-401-0   1386M56P03  2131M81G02  2290B020000
319 1588M89G03  305-136-803-0   9238M66P08      
767 801307-00   468K29-2            
321 M20101-01               
320 ACP2788AB04             

I tried dcast from the reshape2 package:

dcast(data1, Fleet_Type ~ Component_ID)

My result looks like this:

  Fleet_Type 020-739-0 020-807-0 071-50001-8102 121664-10 121666-17 1386M56P03 1460M52P03 1498M96G01 1520M27P07
1        310         0         0              0         0         0          0          0          0          0
2        319         0         0              0         0         0          0          1          0          0
3        320         0         0              0         1         2          1          0          1          0
4        321         0         0              0         0         0          0          0          0          1

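However, I don't want counts like this; I want a "wide" format with the component IDs themselves in the cells.

I also tried base R's reshape: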

> reshape(data1, idvar = "Fleet_Type", timevar = "Component_ID", direction = "wide")
   Fleet_Type
1         767
3         777
5         320
10        319
15        321
50        330
63        310
Warning messages:
1: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,  :
  multiple rows match for Component_ID=801307-00: first taken
2: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,  :
  multiple rows match for Component_ID=468K29-2: first taken
3: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,  :
  multiple rows match for Component_ID=9238M66P08: first taken
4: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying,  :

But all I get is these warning messages, and the result contains no component columns.

Please help me.

1 Answer:

Answer 0: (score: 0)

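Split df1 by Type, loop over the resulting pieces, and collapse the Component_ID values of each piece into a single string. Finally, create a new data frame df2 from the values in a1.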

a1 <- lapply( with(df1, split(df1, Type)), function( x ) paste(x$Component_ID, collapse = ', ') )
df2 <- data.frame( Type = as.numeric(names(a1)), Component_ID = unlist(a1))
df2
#     Type                                                                 Component_ID
# 319  319                                        1588M89G03, 305-136-803-0, 9238M66P08
# 320  320 6740B050000, 305-439-401-0, 1386M56P03, 2131M81G02, 2290B020000, ACP2788AB04
# 321  321                                                                    M20101-01
# 767  767                                     801307-00, 468K29-2, 801307-00, 468K29-2
# 777  777                                                     263BA101-2, 964-0453-011
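Note that the result above merges both blocks of Type 767 into one row, whereas the desired output in the question keeps each consecutive block as its own transaction. A minimal base R sketch of that variant follows, assuming a transaction is a run of adjacent rows with the same Type; the names run_id, items, and df3 are made up for illustration.

```r
# Sample data, repeated from the "Data" section so the snippet runs on its own
df1 <- data.frame(
  Type = c(767L, 767L, 777L, 777L, 320L, 320L, 320L, 320L, 320L,
           319L, 319L, 319L, 767L, 767L, 321L, 320L),
  Component_ID = c("801307-00", "468K29-2", "263BA101-2", "964-0453-011",
                   "6740B050000", "305-439-401-0", "1386M56P03", "2131M81G02",
                   "2290B020000", "1588M89G03", "305-136-803-0", "9238M66P08",
                   "801307-00", "468K29-2", "M20101-01", "ACP2788AB04"),
  stringsAsFactors = FALSE
)

# run_id (hypothetical name) increases by one every time Type changes
# from one row to the next, so each consecutive block gets its own id
run_id <- cumsum(c(TRUE, diff(df1$Type) != 0))

# One list element per run, holding that run's Component_ID values
items <- split(df1$Component_ID, run_id)

# One row per run: the Type of the run's first row plus the collapsed IDs
df3 <- data.frame(
  Type = df1$Type[!duplicated(run_id)],
  Component_ID = vapply(items, paste, character(1), collapse = ", "),
  stringsAsFactors = FALSE
)
df3
```

Grouping by run rather than by Type only makes sense if the row order of the input is meaningful; if it is not, the split by Type shown above is the safer choice.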

Data:

df1 <- structure(list(Type = c(767L, 767L, 777L, 777L, 320L, 320L, 320L, 320L, 320L, 319L, 319L, 319L, 767L, 767L, 321L, 320L), 
                      Component_ID = c("801307-00", "468K29-2", "263BA101-2", "964-0453-011", "6740B050000", "305-439-401-0", 
                                       "1386M56P03", "2131M81G02", "2290B020000", "1588M89G03", "305-136-803-0", "9238M66P08",
                                       "801307-00", "468K29-2", "M20101-01", "ACP2788AB04")),
                 .Names = c("Type", "Component_ID"), row.names = c(NA, -16L),
                 class = "data.frame")
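If separate columns are preferred over one comma-separated string, as in the "wide" layout the question asks for, the same split can be padded with NA and bound into a table. This is only a sketch in base R; the Item_1, Item_2, ... column names and the object names items, width, padded, and df_wide are assumptions chosen for illustration.

```r
# Sample data, repeated so the snippet runs on its own
df1 <- data.frame(
  Type = c(767L, 767L, 777L, 777L, 320L, 320L, 320L, 320L, 320L,
           319L, 319L, 319L, 767L, 767L, 321L, 320L),
  Component_ID = c("801307-00", "468K29-2", "263BA101-2", "964-0453-011",
                   "6740B050000", "305-439-401-0", "1386M56P03", "2131M81G02",
                   "2290B020000", "1588M89G03", "305-136-803-0", "9238M66P08",
                   "801307-00", "468K29-2", "M20101-01", "ACP2788AB04"),
  stringsAsFactors = FALSE
)

# One character vector of component IDs per Type
items <- split(df1$Component_ID, df1$Type)

# Length of the longest transaction; shorter ones are padded up to it
width <- max(lengths(items))

# `length<-` extends each vector to `width` elements, filling with NA;
# rbind then stacks the padded vectors into a character matrix
padded <- do.call(rbind, lapply(items, `length<-`, width))
colnames(padded) <- paste0("Item_", seq_len(width))

df_wide <- data.frame(
  Type = as.integer(rownames(padded)),
  padded,
  stringsAsFactors = FALSE
)
df_wide
```

The padding step relies on `length<-` filling the new positions with NA, so the empty cells in the result are missing values rather than empty strings.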