在R中对具有相同名称的列进行分组

时间:2018-03-30 20:29:54

标签: r dataframe data-manipulation

我正在尝试以“可读”的方式格式化我的数据,其中我有多个具有相同名称的列。我尝试使用melt()函数,但是我没能解决问题,这似乎与变量上有不同的值有关。

数据的一个小例子:

obs     m   ti      td        date        class  code   dis       group  status     grade   freq    date              dis     group   status    grade   freq    date             dis    group   status  grade   freq    date
obs_1   A   grad        05/01/2016 00:00         55060  DDE0300  2016101    A        5.7     97   05/01/2016 15:20  MS0230  2016101      A      8.19    100 05/01/2016 15:20    A0301   2016101  A        5.8   100  27/01/2016 13:12
obs_2   A   grad        05/01/2016 00:00         55070  SSE332         0    D                     03/06/2016 14:08   A0804    0          D                  03/06/2016 14:18    SE089   0        D                   26/08/2016 19:31

现在我想通过观察来分割这个数据框:

    melt(df[1,],id.vars=c("obs","m","ti","td","date","class","code"), 
            measure.vars=c("dis","group","status","grade","freq","date"))

我明白了:

    obs  m   ti td             date class  code variable            value
1 obs_1 A  grad NA 05/01/2016 15:20    NA 55060      dis          DDE0300
2 obs_1 A  grad NA 05/01/2016 15:20    NA 55060    group          2016101
3 obs_1 A  grad NA 05/01/2016 15:20    NA 55060   status               A 
4 obs_1 A  grad NA 05/01/2016 15:20    NA 55060    grade              5.7
5 obs_1 A  grad NA 05/01/2016 15:20    NA 55060     freq               97
6 obs_1 A  grad NA 05/01/2016 15:20    NA 55060     date 05/01/2016 15:20
Warning message:
attributes are not identical across measure variables; they will be dropped 

现在,我'缺少'两列,分别是MS0230和A0301以及它们的组,状态等等。我该如何解决这个问题?

请记住,它不必使用melt()函数。

重现数据的代码:

df<-structure(list(obs = structure(1:2, .Label = c("obs_1", "obs_2"
), class = "factor"), m = structure(c(1L, 1L), .Label = "A ", class = "factor"), 
    ti = structure(c(1L, 1L), .Label = "grad", class = "factor"), 
    td = c(NA, NA), datei = structure(c(1L, 1L), .Label = "05/01/2016 00:00", class = "factor"), 
    class = c(NA, NA), code = c(55060L, 55070L), dis = structure(1:2, .Label = c("DDE0300", 
    "SSE332"), class = "factor"), group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ", 
    "D "), class = "factor"), grade = c(5.7, NA), freq = c(97L, 
    NA), date = structure(c(2L, 1L), .Label = c("03/06/2016 14:08", 
    "05/01/2016 15:20"), class = "factor"), dis = structure(c(2L, 
    1L), .Label = c("A0804", "MS0230"), class = "factor"), group = c(2016101L, 
    0L), status = structure(1:2, .Label = c("A ", "D "), class = "factor"), 
    grade = c(8.19, NA), freq = c(100L, NA), date = structure(c(2L, 
    1L), .Label = c("03/06/2016 14:18", "05/01/2016 15:20"), class = "factor"), 
    dis = structure(1:2, .Label = c("A0301", "SE089"), class = "factor"), 
    group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ", 
    "D "), class = "factor"), grade = c(5.8, NA), freq = c(100L, 
    NA), date = structure(c(2L, 1L), .Label = c("26/08/2016 19:31", 
    "27/01/2016 13:12"), class = "factor")), .Names = c("obs", 
"m", "ti", "td", "datei", "class", "code", "dis", "group", "status", 
"grade", "freq", "date", "dis", "group", "status", "grade", "freq", 
"date", "dis", "group", "status", "grade", "freq", "date"), class = "data.frame", row.names = c(NA, 
-2L))

1 个答案:

答案 0 :(得分:0)

感谢Henrik的链接,我设法弄明白了。不确定这是不是最好的解决方案。

但这就是我的所作所为:

melt(setDT(df[1,]), id=1L, id.vars=c("obs","m","ti","td","date","class","code"),
      measure=patterns("dis","group","status","grade","freq","date"),
      value.name=c("Dis","Group","Status","Grade","Freq","Date"))

哪位给了我:

     obs  m   ti td             date class  code variable     Dis   Group Status Grade Freq             Date
1: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        1 DDE0300 2016101     A   5.70   97 05/01/2016 00:00
2: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        2  MS0230 2016101     A   8.19  100 05/01/2016 15:20
3: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        3   A0301 2016101     A   5.80  100 05/01/2016 15:20
4: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        4      NA      NA     NA    NA   NA 27/01/2016 13:12