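R - How do I reshape a data frame to gather the values of two columns?

Time: 2017-10-29 01:29:48

Tags: r dataframe reshape

I have a data frame that I need to reshape so that it is easier to use in a viz application. Here is a stripped-down version of the data frame: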

Carrier <- c("Mesa", "United", "JetBlue", "ExpressJet", "SkyWest")
Flight_Num <- c(7124, 7177, 334, 1223, 6380)
Origin <- c("ORD", "EWR", "SFO", "BOS", "BDL")
Dest <- c("PIT", "BOI", "DSM", "CWA", "CMH")
Sched_Depr <- c(1955, 1900, 1845, 1253, 1755)

df <- data.frame(Carrier, Flight_Num, Origin, Dest, Sched_Depr)

     Carrier Flight_Num Origin Dest Sched_Depr
1       Mesa       7124    ORD  PIT       1955
2     United       7177    EWR  BOI       1900
3    JetBlue        334    SFO  DSM       1845
4 ExpressJet       1223    BOS  CWA       1253
5    SkyWest       6380    BDL  CMH       1755

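Origin and Dest are interpreted by the viz application as geographic data (i.e., coordinates). I need to gather them into a single column called Coords. I also need to create a new variable, Order_Points, that marks the order of the points. The reshaped data frame would therefore look like this: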

      Carrier Flight_Num Coords Sched_Depr Order_Points
1        Mesa       7124    ORD       1955            1
2        Mesa       7124    PIT       1955            2
3      United       7177    EWR       1900            1
4      United       7177    BOI       1900            2
5     JetBlue        334    SFO       1845            1
6     JetBlue        334    DSM       1845            2
7  ExpressJet       1223    BOS       1253            1
8  ExpressJet       1223    CWA       1253            2
9     SkyWest       6380    BDL       1755            1
10    SkyWest       6380    CMH       1755            2

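What is an efficient way to gather two columns like this while keeping (and duplicating) the other variables?

2 Answers:

Answer 0: (score: 2)

Here is an option using tidyverse functions. We use gather to convert the data frame from "wide" format to "long" format. This also adds a column (here called Type) that marks whether each Coords value is an Origin or a Dest.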

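library(tidyverse)

# Stack Origin and Dest into one Coords column; Type records the source column.
# desc(Type) puts "Origin" before "Dest" within each carrier.
df.long <- df %>%
  gather(Type, Coords, Origin, Dest) %>%
  arrange(Carrier, desc(Type))

      Carrier Flight_Num Sched_Depr   Type Coords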
1  ExpressJet       1223       1253 Origin    BOS
2  ExpressJet       1223       1253   Dest    CWA
3     JetBlue        334       1845 Origin    SFO
4     JetBlue        334       1845   Dest    DSM
5        Mesa       7124       1955 Origin    ORD
6        Mesa       7124       1955   Dest    PIT
7     SkyWest       6380       1755 Origin    BDL
8     SkyWest       6380       1755   Dest    CMH
9      United       7177       1900 Origin    EWR
10     United       7177       1900   Dest    BOI
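
The output above does not yet contain the Order_Points column the question asks for, and in tidyr 1.0.0 and later gather is superseded by pivot_longer. The following is a minimal sketch of one way to get the requested layout, assuming tidyr >= 1.0.0 and dplyr are installed; the name df_long and the rule "Origin is point 1, Dest is point 2" are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Carrier    = c("Mesa", "United", "JetBlue", "ExpressJet", "SkyWest"),
  Flight_Num = c(7124, 7177, 334, 1223, 6380),
  Origin     = c("ORD", "EWR", "SFO", "BOS", "BDL"),
  Dest       = c("PIT", "BOI", "DSM", "CWA", "CMH"),
  Sched_Depr = c(1955, 1900, 1845, 1253, 1755),
  stringsAsFactors = FALSE
)

df_long <- df %>%
  # One output row per (flight, endpoint); Type holds "Origin" or "Dest"
  pivot_longer(c(Origin, Dest), names_to = "Type", values_to = "Coords") %>%
  # Assumption: the origin is point 1 and the destination is point 2
  mutate(Order_Points = if_else(Type == "Origin", 1L, 2L)) %>%
  # Drop the helper column and use the column order from the question
  select(Carrier, Flight_Num, Coords, Sched_Depr, Order_Points)

print(df_long)
```

pivot_longer emits the rows flight by flight, so the original row order of df is kept without an extra arrange step.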

Answer 1: (score: 0)

You can also use base R:

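 dat <- data.frame(Carrier, Flight_Num, Origin, Dest, Sched_Depr)
 # Columns 3:4 (Origin, Dest) are stacked; idvar must uniquely identify each row
 df <- reshape(dat, idvar = "Carrier", varying = list(3:4), direction = "long")
 # Sort by Carrier and reset the row names
 `row.names<-`(df[order(df[, 1]), ], NULL)

       Carrier Flight_Num Sched_Depr time Origin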
 1  ExpressJet       1223       1253    1    BOS
 2  ExpressJet       1223       1253    2    CWA
 3     JetBlue        334       1845    1    SFO
 4     JetBlue        334       1845    2    DSM
 5        Mesa       7124       1955    1    ORD
 6        Mesa       7124       1955    2    PIT
 7     SkyWest       6380       1755    1    BDL
 8     SkyWest       6380       1755    2    CMH
 9      United       7177       1900    1    EWR
 10     United       7177       1900    2    BOI

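You can rename the time variable to the name used in your example above (Order_Points), and likewise rename the Origin column to Coords.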
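
As a sketch of that renaming, reshape can also produce the final names directly through its v.names and timevar arguments. The object name long and the final reordering are illustrative choices, not part of the original answer, and the call assumes that Carrier uniquely identifies each row, as it does in this sample.

```r
# Sample data from the question
dat <- data.frame(
  Carrier    = c("Mesa", "United", "JetBlue", "ExpressJet", "SkyWest"),
  Flight_Num = c(7124, 7177, 334, 1223, 6380),
  Origin     = c("ORD", "EWR", "SFO", "BOS", "BDL"),
  Dest       = c("PIT", "BOI", "DSM", "CWA", "CMH"),
  Sched_Depr = c(1955, 1900, 1845, 1253, 1755),
  stringsAsFactors = FALSE
)

long <- reshape(
  dat,
  idvar     = "Carrier",                  # assumption: unique per row in this sample
  varying   = list(c("Origin", "Dest")),  # the two columns to stack
  v.names   = "Coords",                   # name of the stacked value column
  timevar   = "Order_Points",             # name of the order marker column
  times     = 1:2,                        # 1 = Origin, 2 = Dest
  direction = "long"
)

# Restore the original flight order, with origin before destination
long <- long[order(match(long$Carrier, dat$Carrier), long$Order_Points), ]
rownames(long) <- NULL

# Use the column order from the question
long <- long[, c("Carrier", "Flight_Num", "Coords", "Sched_Depr", "Order_Points")]
print(long)
```

If the full data set contains several flights per carrier, idvar would need to be a column, or a combination of columns, that is unique for each row.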