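I am trying to reshape my data from long to wide. I know there are several pages dedicated to this, but I have tried all of their suggestions and none of them worked.
Here is my data: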
Officer Company
Robert Abernathy Goldman Sachs
Robert Abernathy Walmart
Robert Abernathy CVS
Rex Adams Goldman Sachs
Rex Adams Dell
Marc Abramowitz Samsung
I want the data to look like this:
Officer Company1 Company2 Company3
Robert Abernathy Goldman Sachs Walmart CVS
Rex Adams Goldman Sachs Dell NA
Marc Abramowitz Samsung NA NA
I thought I could use the tidyr package, so I tried:
> library(tidyr)
> ppn_wide<-spread(data=ppn1, key=Officer, value=Company)
Error: Duplicate identifiers for rows (12, 13), (20, 21), (36, 37), (40, 41), (75, 76), (116, 117), (141, 142), (149, 150), (158, 159), (189, 190), (207, 208), (244, 245), (249, 250), (264, 265), (267, 268), (273, 274), (328, 329), (339, 340), (346, 347, 348), (366, 367), (378, 379), (397, 398), (407, 408), (417, 418), (422, 423), (425, 426), (430, 431), (436, 437, 438), (450, 451), (461, 462), (481, 482), (486, 487), (491, 492), (496, 497, 498), (504, 505), (546, 547), (553, 554), (566, 567), (577, 578), (594, 595), (632, 633)'
So I also tried this:
> reshape(ppn1, idvar="Officer", timevar="Company", direction="wide")
But only the Officer column remained, and Company disappeared entirely.
I also tried the reshape and reshape2 packages, but they did not work either.
> ppn_wide<-cast(ppn1, officer~PPN.org)
Using officer as value column. Use the value argument to cast to override this choice
Error in `[.data.frame`(data, , variables, drop = FALSE) :
undefined columns selected
> ppn_wide<-dcast(ppn1, officer~PPN.org)
Using officer as value column: use value.var to override.
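The reshape2 package did create a data frame named ppn_wide, but it does not look like the dataset I want. It uses the officers' names to indicate whether they hold a position at each company. Something like this: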
officer Goldman Sachs Walmart Dell
Robert Abernathy Robert Abernathy Robert Abernathy NA
What is going on here?
Answer 0 (score: 2)
The dcast method from data.table works fine for this example:
ppn1 = read.table(text='Officer,Company
Robert Abernathy,Goldman Sachs
Robert Abernathy,Walmart
Robert Abernathy,CVS
Rex Adams,Goldman Sachs
Rex Adams,Dell
Marc Abramowitz,Samsung', header=T, sep=',')
Thanks to @Frank, the following works:
library(data.table)
dcast(setDT(ppn1), Officer ~ rowid(Officer, prefix = "Company"), value.var = "Company")
which gives:
Officer Company1 Company2 Company3
1 Marc Abramowitz Samsung <NA> <NA>
2 Rex Adams Goldman Sachs Dell <NA>
3 Robert Abernathy Goldman Sachs Walmart CVS
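To see why this works, it may help to compute the `rowid()` column explicitly before reshaping. The sketch below assumes the data.table package is installed and rebuilds the sample data inline; the column name `idx` and the object name `wide` are hypothetical, introduced only for illustration. The counter gives every (Officer, idx) pair a unique value, which is what `spread()` was missing when it complained about duplicate identifiers.

```r
library(data.table)

# Sample data from the question, rebuilt inline so the snippet is self-contained
ppn1 <- data.table(
  Officer = c("Robert Abernathy", "Robert Abernathy", "Robert Abernathy",
              "Rex Adams", "Rex Adams", "Marc Abramowitz"),
  Company = c("Goldman Sachs", "Walmart", "CVS",
              "Goldman Sachs", "Dell", "Samsung")
)

# rowid() numbers the rows within each Officer; prefix turns 1, 2, 3 into
# Company1, Company2, Company3 (idx is a hypothetical helper column)
ppn1[, idx := rowid(Officer, prefix = "Company")]

# One row per Officer, one column per idx value, filled with Company
wide <- dcast(ppn1, Officer ~ idx, value.var = "Company")
print(wide)
```

Officers with fewer companies than the maximum get `NA` in the unused columns.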
Answer 1 (score: 0)
You can first add the target column headers as a new column, then reshape:
df <- readr::read_csv('Officer,Company
Robert Abernathy,Goldman Sachs
Robert Abernathy,Walmart
Robert Abernathy,CVS
Rex Adams,Goldman Sachs
Rex Adams,Dell
Marc Abramowitz,Samsung')
df
#> # A tibble: 6 x 2
#> Officer Company
#> <chr> <chr>
#> 1 Robert Abernathy Goldman Sachs
#> 2 Robert Abernathy Walmart
#> 3 Robert Abernathy CVS
#> 4 Rex Adams Goldman Sachs
#> 5 Rex Adams Dell
#> 6 Marc Abramowitz Samsung
# Add column headers as new column (using grouped row number)
library(dplyr)
df %>%
  group_by(Officer) %>%
  mutate(ColName = paste0('Company', row_number())) %>%
  tidyr::spread(ColName, Company)
#> # A tibble: 3 x 4
#> # Groups: Officer [3]
#> Officer Company1 Company2 Company3
#> * <chr> <chr> <chr> <chr>
#> 1 Marc Abramowitz Samsung <NA> <NA>
#> 2 Rex Adams Goldman Sachs Dell <NA>
#> 3 Robert Abernathy Goldman Sachs Walmart CVS
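In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. A minimal sketch of the same approach with the newer function follows, assuming dplyr and tidyr (>= 1.0.0) are installed; the sample data is rebuilt inline and the object name `wide` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt inline so the snippet is self-contained
df <- data.frame(
  Officer = c("Robert Abernathy", "Robert Abernathy", "Robert Abernathy",
              "Rex Adams", "Rex Adams", "Marc Abramowitz"),
  Company = c("Goldman Sachs", "Walmart", "CVS",
              "Goldman Sachs", "Dell", "Samsung"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Officer) %>%
  # Grouped row number becomes the future column name: Company1, Company2, ...
  mutate(ColName = paste0("Company", row_number())) %>%
  ungroup() %>%
  # Spread Company across the new ColName columns, one row per Officer
  pivot_wider(names_from = ColName, values_from = Company)

print(wide)
```

Unlike `spread()`, `pivot_wider()` keeps the officers in order of first appearance rather than sorting them alphabetically.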