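Making data wide

Time: 2017-12-12 19:08:42

Tags: r

I am trying to reshape my data to wide format. I know there are several pages devoted to making data wide, but I have tried all of their suggestions and none of them worked.

Here is my data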

Officer              Company

Robert Abernathy     Goldman Sachs

Robert Abernathy     Walmart

Robert Abernathy     CVS

Rex Adams            Goldman Sachs

Rex Adams            Dell

Marc Abramowitz      Samsung

I would like the data to look like this

Officer             Company1       Company2    Company3

Robert Abernathy    Goldman Sachs  Walmart     CVS

Rex Adams           Goldman Sachs  Dell        NA

Marc Abramowitz     Samsung        NA          NA

I thought I could use the tidyr package, so I tried

> library(tidyr)

> ppn_wide<-spread(data=ppn1, key=Officer, value=Company)
Error: Duplicate identifiers for rows (12, 13), (20, 21), (36, 37), (40, 41), (75, 76), (116, 117), (141, 142), (149, 150), (158, 159), (189, 190), (207, 208), (244, 245), (249, 250), (264, 265), (267, 268), (273, 274), (328, 329), (339, 340), (346, 347, 348), (366, 367), (378, 379), (397, 398), (407, 408), (417, 418), (422, 423), (425, 426), (430, 431), (436, 437, 438), (450, 451), (461, 462), (481, 482), (486, 487), (491, 492), (496, 497, 498), (504, 505), (546, 547), (553, 554), (566, 567), (577, 578), (594, 595), (632, 633)'

So I also tried this

> reshape(ppn1, idvar="Officer", timevar="Company", direction="wide")

But only the Officer column remains, and Company disappears completely.

I also tried the reshape and reshape2 packages, but they did not work.

> ppn_wide<-cast(ppn1, officer~PPN.org)
Using officer as value column.  Use the value argument to cast to override this choice
Error in `[.data.frame`(data, , variables, drop = FALSE) : 
  undefined columns selected

> ppn_wide<-dcast(ppn1, officer~PPN.org)
Using officer as value column: use value.var to override.

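The reshape2 package did create a data frame called ppn_wide, but it does not look like the dataset I want. It uses the officers' names to indicate whether they hold a position at each company. Something like this: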

officer            Goldman Sachs      Walmart            Dell
Robert Abernathy   Robert Abernathy   Robert Abernathy   NA

What is going on here?

2 answers:

Answer 0: (score: 2)

data.table's dcast method works on this example:

ppn1 = read.table(text='Officer,Company
Robert Abernathy,Goldman Sachs
Robert Abernathy,Walmart
Robert Abernathy,CVS
Rex Adams,Goldman Sachs
Rex Adams,Dell
Marc Abramowitz,Samsung', header=T, sep=',')

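Thanks to @Frank, the following works:

library(data.table)  # provides dcast() and rowid()
dcast(ppn1, Officer ~ rowid(Officer, prefix = "Company"), value.var = "Company")

which gives: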

           Officer      Company1 Company2 Company3
1  Marc Abramowitz       Samsung     <NA>     <NA>
2        Rex Adams Goldman Sachs     Dell     <NA>
3 Robert Abernathy Goldman Sachs  Walmart      CVS
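
The same idea, numbering each officer's rows and using that number as the new column suffix, can also be sketched in base R. This is only an illustration, not part of the original answer: the helper column `idx` is a hypothetical name, and the data frame is rebuilt inline so the block runs on its own. It also suggests why the `reshape()` call in the question dropped Company: with `timevar="Company"` there was no remaining column left to hold the values.

```r
# Rebuild the example data (same values as in the question)
ppn1 <- data.frame(
  Officer = c("Robert Abernathy", "Robert Abernathy", "Robert Abernathy",
              "Rex Adams", "Rex Adams", "Marc Abramowitz"),
  Company = c("Goldman Sachs", "Walmart", "CVS",
              "Goldman Sachs", "Dell", "Samsung"),
  stringsAsFactors = FALSE
)

# idx (hypothetical name): running count of rows within each Officer
ppn1$idx <- ave(seq_along(ppn1$Officer), ppn1$Officer, FUN = seq_along)

# Use idx as the "time" variable so Company becomes Company1, Company2, ...
ppn_wide <- reshape(ppn1, idvar = "Officer", timevar = "idx",
                    direction = "wide", sep = "")
ppn_wide
```

Officers with fewer companies than the maximum get NA in the unused columns.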

Answer 1: (score: 0)

You can first add the column headers as a new column, then reshape:

df <- readr::read_csv('Officer,Company
Robert Abernathy,Goldman Sachs
Robert Abernathy,Walmart
Robert Abernathy,CVS
Rex Adams,Goldman Sachs
Rex Adams,Dell
Marc Abramowitz,Samsung')
df
#> # A tibble: 6 x 2
#>            Officer       Company
#>              <chr>         <chr>
#> 1 Robert Abernathy Goldman Sachs
#> 2 Robert Abernathy       Walmart
#> 3 Robert Abernathy           CVS
#> 4        Rex Adams Goldman Sachs
#> 5        Rex Adams          Dell
#> 6  Marc Abramowitz       Samsung

# Add column headers as new column (using grouped row number)
library(dplyr)
df %>% 
  group_by(Officer) %>% 
  mutate(ColName = paste0('Company', row_number())) %>% 
  tidyr::spread(ColName, Company)
#> # A tibble: 3 x 4
#> # Groups:   Officer [3]
#>            Officer      Company1 Company2 Company3
#> *            <chr>         <chr>    <chr>    <chr>
#> 1  Marc Abramowitz       Samsung     <NA>     <NA>
#> 2        Rex Adams Goldman Sachs     Dell     <NA>
#> 3 Robert Abernathy Goldman Sachs  Walmart      CVS
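
The `spread()` call in the question failed because several rows share the same Officer and nothing else tells them apart, so tidyr cannot decide which cell each Company belongs in. The grouped row number above supplies that missing identifier. In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`; a minimal sketch of the same approach, assuming a recent tidyr and dplyr are installed and using an inline copy of the example data, might look like this:

```r
library(dplyr)
library(tidyr)

# Inline copy of the example data from the question
df <- data.frame(
  Officer = c("Robert Abernathy", "Robert Abernathy", "Robert Abernathy",
              "Rex Adams", "Rex Adams", "Marc Abramowitz"),
  Company = c("Goldman Sachs", "Walmart", "CVS",
              "Goldman Sachs", "Dell", "Samsung"),
  stringsAsFactors = FALSE
)

df_wide <- df %>%
  group_by(Officer) %>%
  # Number each officer's companies to build the new column names
  mutate(ColName = paste0("Company", row_number())) %>%
  ungroup() %>%
  # One row per Officer, one column per ColName value
  pivot_wider(names_from = ColName, values_from = Company)

df_wide
```

Calling `ungroup()` before reshaping keeps the result from carrying the grouping along, which is a design choice here rather than a requirement.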