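I'm trying to get my data into a "readable" format. The data has several columns that share the same name. I tried the melt() function but couldn't solve the problem; it seems to be related to the variables having different values.
A small example of the data: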
obs m ti td date class code dis group status grade freq date dis group status grade freq date dis group status grade freq date
obs_1 A grad NA 05/01/2016 00:00 NA 55060 DDE0300 2016101 A 5.7 97 05/01/2016 15:20 MS0230 2016101 A 8.19 100 05/01/2016 15:20 A0301 2016101 A 5.8 100 27/01/2016 13:12
obs_2 A grad NA 05/01/2016 00:00 NA 55070 SSE332 0 D NA NA 03/06/2016 14:08 A0804 0 D NA NA 03/06/2016 14:18 SE089 0 D NA NA 26/08/2016 19:31
Now I want to reshape this data frame one observation at a time:
library(reshape2)
melt(df[1,], id.vars=c("obs","m","ti","td","date","class","code"),
     measure.vars=c("dis","group","status","grade","freq","date"))
This gives me:
obs m ti td date class code variable value
1 obs_1 A grad NA 05/01/2016 15:20 NA 55060 dis DDE0300
2 obs_1 A grad NA 05/01/2016 15:20 NA 55060 group 2016101
3 obs_1 A grad NA 05/01/2016 15:20 NA 55060 status A
4 obs_1 A grad NA 05/01/2016 15:20 NA 55060 grade 5.7
5 obs_1 A grad NA 05/01/2016 15:20 NA 55060 freq 97
6 obs_1 A grad NA 05/01/2016 15:20 NA 55060 date 05/01/2016 15:20
Warning message:
attributes are not identical across measure variables; they will be dropped
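The warning is probably triggered by the factor columns: in the data below each `dis`, `status` and `date` column is a factor with its own set of levels, and they are stacked together with plain integer and numeric columns, so the measure variables cannot all carry identical attributes. A minimal base R illustration, assuming the check compares `attributes()` of the measure columns (values taken from the first two `dis` columns and the first `group` column of the example data):

```r
# Two factor columns like the first two `dis` columns of the example data:
# both have class "factor", but their level sets differ.
dis1 <- factor(c("DDE0300", "SSE332"))
dis2 <- factor(c("MS0230", "A0804"))
grp1 <- c(2016101L, 0L)  # a plain integer column has no attributes at all

same_factor_attrs <- identical(attributes(dis1), attributes(dis2))  # FALSE
grp_attrs <- attributes(grp1)                                        # NULL

# After converting the factors to character there is nothing left to differ.
same_char_attrs <- identical(attributes(as.character(dis1)),
                             attributes(as.character(dis2)))         # TRUE
```

Converting the factor columns with as.character() before melting should therefore avoid the warning, but on its own it does not bring back the missing column sets.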
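Now I am "missing" the other two column sets, i.e. the MS0230 and A0301 entries together with their group, status, and so on. How can I fix this?
Keep in mind that the solution doesn't have to use melt().
Code to reproduce the data: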
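The missing column sets are most likely a consequence of the duplicated names rather than of the values. Assuming reshape2's melt() resolves `measure.vars` to column positions with a name lookup such as `match()`, every duplicated name resolves to its first occurrence only, so only the first dis/group/status/grade/freq/date block gets stacked. The base R snippet shows that lookup behaviour on a small hypothetical data frame (`toy` is made up for illustration):

```r
# Hypothetical toy data with a repeated `dis`/`freq` block; check.names = FALSE
# keeps the duplicated names, as in the question's data.
toy <- data.frame(obs = "obs_1",
                  dis = "DDE0300", freq = 97L,
                  dis = "MS0230",  freq = 100L,
                  dis = "A0301",   freq = 100L,
                  check.names = FALSE, stringsAsFactors = FALSE)

first_dis <- match("dis", names(toy))     # 2: only the first position
all_dis   <- which(names(toy) == "dis")   # 2 4 6: every position
picked    <- toy[["dis"]]                 # "DDE0300": first column only
```

Selecting the columns by position, or by a pattern that matches every occurrence, sidesteps this.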
df<-structure(list(obs = structure(1:2, .Label = c("obs_1", "obs_2"
), class = "factor"), m = structure(c(1L, 1L), .Label = "A ", class = "factor"),
ti = structure(c(1L, 1L), .Label = "grad", class = "factor"),
td = c(NA, NA), datei = structure(c(1L, 1L), .Label = "05/01/2016 00:00", class = "factor"),
class = c(NA, NA), code = c(55060L, 55070L), dis = structure(1:2, .Label = c("DDE0300",
"SSE332"), class = "factor"), group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ",
"D "), class = "factor"), grade = c(5.7, NA), freq = c(97L,
NA), date = structure(c(2L, 1L), .Label = c("03/06/2016 14:08",
"05/01/2016 15:20"), class = "factor"), dis = structure(c(2L,
1L), .Label = c("A0804", "MS0230"), class = "factor"), group = c(2016101L,
0L), status = structure(1:2, .Label = c("A ", "D "), class = "factor"),
grade = c(8.19, NA), freq = c(100L, NA), date = structure(c(2L,
1L), .Label = c("03/06/2016 14:18", "05/01/2016 15:20"), class = "factor"),
dis = structure(1:2, .Label = c("A0301", "SE089"), class = "factor"),
group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ",
"D "), class = "factor"), grade = c(5.8, NA), freq = c(100L,
NA), date = structure(c(2L, 1L), .Label = c("26/08/2016 19:31",
"27/01/2016 13:12"), class = "factor")), .Names = c("obs",
"m", "ti", "td", "datei", "class", "code", "dis", "group", "status",
"grade", "freq", "date", "dis", "group", "status", "grade", "freq",
"date", "dis", "group", "status", "grade", "freq", "date"), class = "data.frame", row.names = c(NA,
-2L))
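Since the solution doesn't have to use melt(), one base R alternative is to select each repeated block by position instead of by name and stack the blocks with rbind(). A minimal sketch; `stack_blocks()` and its arguments are hypothetical names introduced here, and it assumes the repeated blocks are contiguous and all the same width (six columns in the question's data, starting at column 8):

```r
# Hypothetical helper: stack repeated blocks of columns selected by position,
# so duplicated column names do not matter.
stack_blocks <- function(data, id_cols, first_col, block_size) {
  n_blocks <- (ncol(data) - first_col + 1) %/% block_size
  pieces <- lapply(seq_len(n_blocks), function(k) {
    cols <- first_col + (k - 1) * block_size + seq_len(block_size) - 1
    block <- data[, cols, drop = FALSE]
    # Turn factors into character so values from different blocks stack cleanly.
    block[] <- lapply(block, function(v) if (is.factor(v)) as.character(v) else v)
    # Keep the id columns and record which block each row came from.
    cbind(data[, id_cols, drop = FALSE], set = k, block)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Small example with the same layout: 2 id columns, then 3 blocks of 3 columns.
toy <- data.frame(
  obs = c("obs_1", "obs_2"), code = c(55060L, 55070L),
  dis = c("DDE0300", "SSE332"), grade = c(5.7, NA),  date = c("05/01/2016 15:20", "03/06/2016 14:08"),
  dis = c("MS0230", "A0804"),   grade = c(8.19, NA), date = c("05/01/2016 15:20", "03/06/2016 14:18"),
  dis = c("A0301", "SE089"),    grade = c(5.8, NA),  date = c("27/01/2016 13:12", "26/08/2016 19:31"),
  check.names = FALSE, stringsAsFactors = FALSE
)

long <- stack_blocks(toy, id_cols = 1:2, first_col = 3, block_size = 3)
long[order(long$obs, long$set), ]  # one row per observation and block
```

For the question's data the call would presumably be `stack_blocks(df, id_cols = 1:7, first_col = 8, block_size = 6)` (untested on the real data). Converting factors to character is a design choice so that the stacked columns are plain vectors rather than factors with merged level sets.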
Answer:
Thanks to Henrik's link, I managed to figure it out. I'm not sure whether this is the best solution, but this is what I did:
library(data.table)
melt(setDT(df[1,]), id.vars=c("obs","m","ti","td","date","class","code"),
     measure.vars=patterns("dis","group","status","grade","freq","date"),
     value.name=c("Dis","Group","Status","Grade","Freq","Date"))
Which gave me:
obs m ti td date class code variable Dis Group Status Grade Freq Date
1: obs_1 A grad NA 05/01/2016 15:20 NA 55060 1 DDE0300 2016101 A 5.70 97 05/01/2016 00:00
2: obs_1 A grad NA 05/01/2016 15:20 NA 55060 2 MS0230 2016101 A 8.19 100 05/01/2016 15:20
3: obs_1 A grad NA 05/01/2016 15:20 NA 55060 3 A0301 2016101 A 5.80 100 05/01/2016 15:20
4: obs_1 A grad NA 05/01/2016 15:20 NA 55060 4 NA NA NA NA NA 27/01/2016 13:12
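Note that the Date column in this output appears to be shifted by one row: row 1 shows 05/01/2016 00:00, which is the value of the `datei` column, and the last date, 27/01/2016 13:12, ends up in a fourth row whose other values are NA. patterns() treats its arguments as regular expressions, and "date" also matches "datei", so that pattern picks up four columns while every other pattern picks up three. A base R check of the match counts, using the column names from the dput in the question:

```r
# Column names of the question's data frame, as listed in its .Names vector.
nms <- c("obs", "m", "ti", "td", "datei", "class", "code",
         rep(c("dis", "group", "status", "grade", "freq", "date"), 3))

pats <- c("dis", "group", "status", "grade", "freq", "date")

# Unanchored patterns, as in the answer: "date" also matches "datei".
loose <- sapply(pats, function(p) length(grep(p, nms)))   # 3 3 3 3 3 4
date_cols <- grep("date", nms)                            # 5 13 19 25

# Anchored patterns match the whole name only.
strict <- sapply(paste0("^", pats, "$"), function(p) length(grep(p, nms)))
```

With anchored patterns every group should have three columns. Untested, but the adjusted call would then presumably look like `melt(setDT(df[1,]), id.vars = c("obs","m","ti","td","datei","class","code"), measure.vars = patterns("^dis$", "^group$", "^status$", "^grade$", "^freq$", "^date$"), value.name = c("Dis","Group","Status","Grade","Freq","Date"))`, with `datei` rather than `date` among the id columns.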