I have a data frame like the following:
mydf <- read.csv(text="num,placed,recovered
1,2013-02-22 12:14:00,2013-02-27 15:14:00
1,2013-03-03 17:32:00,2013-03-07 17:32:00
1,2013-04-24 10:13:00,2013-04-26 07:47:00
1,2013-04-15 14:51:00,2013-04-19 09:36:00
1,2013-04-11 11:56:00,2013-04-15 12:52:00
10,2013-02-22 07:30:00,2013-02-27 14:55:00
10,2013-03-03 17:20:00,2013-03-07 17:20:00
10,2013-04-15 15:22:00,2013-04-19 09:48:00
10,2013-02-17 10:38:00,2013-02-22 07:18:00
10,2013-04-11 10:09:00,2013-04-15 13:21:00
10,2013-04-24 10:07:00,2013-04-26 08:23:00
11,2013-02-22 14:23:00,2013-02-27 15:50:00
11,2013-04-11 12:51:00,2013-04-14 09:40:00
11,2013-04-15 14:45:00,2013-04-19 08:28:00
11,2013-04-19 10:13:00,2013-04-23 12:01:00
14,2013-03-01 13:45:00,2013-03-08 14:28:00
14,2013-02-22 13:22:00,2013-02-27 15:24:00
14,2013-04-04 15:36:00,2013-04-17 15:04:00",header=TRUE)
I'd like to rearrange it so that each unique value of num appears once, with all of its placed and recovered values in a single row. Here is an example row:
num placed1 recovered1 placed2 recovered2 placed3 recovered3 placed4 recovered4 placed5 recovered5
1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00 2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00 2013-04-11 11:56:00 2013-04-15 12:52:00
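Some rows will have a different number of placed and recovered values; it is fine for NA to appear in those positions. I tried the reshape function but couldn't get the result I wanted.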
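I'm doing this in order to subset another dataset that I'm cleaning. That dataset records measurements over time, along with the time each one was collected. The device that took each measurement is stored in the num column. I want to subset that data frame so that it keeps only the readings taken while the device was deployed (the time between each pair of placed and recovered values). The other data frame looks like this: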
num temp time
1 5 2013-02-22 12:13:50
1 6 2013-02-22 12:14:00
1 4 2013-02-22 12:14:10
1 9 2013-04-24 09:45:20
1 7 2013-04-24 11:45:50
10 23 2013-03-03 19:23:40
If I could subset it successfully, the result would look like this:
num temp time
1 6 2013-02-22 12:14:00
1 4 2013-02-22 12:14:10
1 7 2013-04-24 11:45:50
10 23 2013-03-03 19:23:40
Answer 0 (score: 2)
You just need to add a "time" variable to your dataset (a running counter within each num), and reshape will work fine:
mydf$time <- with(mydf, ave(num, num, FUN = seq_along))
head(mydf)
# num placed recovered time
# 1 1 2013-02-22 12:14:00 2013-02-27 15:14:00 1
# 2 1 2013-03-03 17:32:00 2013-03-07 17:32:00 2
# 3 1 2013-04-24 10:13:00 2013-04-26 07:47:00 3
# 4 1 2013-04-15 14:51:00 2013-04-19 09:36:00 4
# 5 1 2013-04-11 11:56:00 2013-04-15 12:52:00 5
# 6 10 2013-02-22 07:30:00 2013-02-27 14:55:00 1
reshape(mydf, idvar="num", timevar="time", direction = "wide")
# num placed.1 recovered.1 placed.2 recovered.2
# 1 1 2013-02-22 12:14:00 2013-02-27 15:14:00 2013-03-03 17:32:00 2013-03-07 17:32:00
# 6 10 2013-02-22 07:30:00 2013-02-27 14:55:00 2013-03-03 17:20:00 2013-03-07 17:20:00
# 12 11 2013-02-22 14:23:00 2013-02-27 15:50:00 2013-04-11 12:51:00 2013-04-14 09:40:00
# 16 14 2013-03-01 13:45:00 2013-03-08 14:28:00 2013-02-22 13:22:00 2013-02-27 15:24:00
# placed.3 recovered.3 placed.4 recovered.4
# 1 2013-04-24 10:13:00 2013-04-26 07:47:00 2013-04-15 14:51:00 2013-04-19 09:36:00
# 6 2013-04-15 15:22:00 2013-04-19 09:48:00 2013-02-17 10:38:00 2013-02-22 07:18:00
# 12 2013-04-15 14:45:00 2013-04-19 08:28:00 2013-04-19 10:13:00 2013-04-23 12:01:00
# 16 2013-04-04 15:36:00 2013-04-17 15:04:00 <NA> <NA>
# placed.5 recovered.5 placed.6 recovered.6
# 1 2013-04-11 11:56:00 2013-04-15 12:52:00 <NA> <NA>
# 6 2013-04-11 10:09:00 2013-04-15 13:21:00 2013-04-24 10:07:00 2013-04-26 08:23:00
# 12 <NA> <NA> <NA> <NA>
# 16 <NA> <NA> <NA> <NA>
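The question asked for column names of the form placed1, recovered1 rather than placed.1, recovered.1. A minimal sketch, assuming the sep argument of reshape is what builds the wide-format names (as documented for base R); the data here is a shortened copy of the question's sample, not the full set:

```r
# Shortened copy of the question's data: two rows for device 1, one for device 10.
mydf <- read.csv(text="num,placed,recovered
1,2013-02-22 12:14:00,2013-02-27 15:14:00
1,2013-03-03 17:32:00,2013-03-07 17:32:00
10,2013-02-22 07:30:00,2013-02-27 14:55:00", header=TRUE)

# Running counter within each device: 1, 2, ... per value of num.
mydf$time <- with(mydf, ave(num, num, FUN = seq_along))

# sep = "" drops the "." between the variable name and the counter.
wide <- reshape(mydf, idvar = "num", timevar = "time",
                direction = "wide", sep = "")
names(wide)
```

Devices with fewer deployments are still padded with NA in the extra columns, just as in the output above.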
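If you've added the "time" variable as I did above, you can also use the "reshape2" package, after first melting the data into a longer form. That long dataset (which I've called "mydf.l" below) may actually be more useful for subsetting than the wide one: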
library(reshape2)
mydf.l <- melt(mydf, id.vars=c("num", "time"))
head(mydf.l)
# num time variable value
# 1 1 1 placed 2013-02-22 12:14:00
# 2 1 2 placed 2013-03-03 17:32:00
# 3 1 3 placed 2013-04-24 10:13:00
# 4 1 4 placed 2013-04-15 14:51:00
# 5 1 5 placed 2013-04-11 11:56:00
# 6 10 1 placed 2013-02-22 07:30:00
dcast(mydf.l, num ~ variable + time)
# num placed_1 placed_2 placed_3 placed_4
# 1 1 2013-02-22 12:14:00 2013-03-03 17:32:00 2013-04-24 10:13:00 2013-04-15 14:51:00
# 2 10 2013-02-22 07:30:00 2013-03-03 17:20:00 2013-04-15 15:22:00 2013-02-17 10:38:00
# 3 11 2013-02-22 14:23:00 2013-04-11 12:51:00 2013-04-15 14:45:00 2013-04-19 10:13:00
# 4 14 2013-03-01 13:45:00 2013-02-22 13:22:00 2013-04-04 15:36:00 <NA>
# placed_5 placed_6 recovered_1 recovered_2
# 1 2013-04-11 11:56:00 <NA> 2013-02-27 15:14:00 2013-03-07 17:32:00
# 2 2013-04-11 10:09:00 2013-04-24 10:07:00 2013-02-27 14:55:00 2013-03-07 17:20:00
# 3 <NA> <NA> 2013-02-27 15:50:00 2013-04-14 09:40:00
# 4 <NA> <NA> 2013-03-08 14:28:00 2013-02-27 15:24:00
# recovered_3 recovered_4 recovered_5 recovered_6
# 1 2013-04-26 07:47:00 2013-04-19 09:36:00 2013-04-15 12:52:00 <NA>
# 2 2013-04-19 09:48:00 2013-02-22 07:18:00 2013-04-15 13:21:00 2013-04-26 08:23:00
# 3 2013-04-19 08:28:00 2013-04-23 12:01:00 <NA> <NA>
# 4 2013-04-17 15:04:00 <NA> <NA> <NA>
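For the subsetting goal itself, the wide form may not be needed at all. The following is only a sketch of one possible approach, not something from the original answer: merge the measurements with the one-row-per-deployment data on num, then keep the readings that fall inside an interval. The names intervals, readings, both, and kept are hypothetical, the UTC time zone is an assumption, and the interval bounds are treated as inclusive to match the expected result in the question.

```r
# Deployment intervals: a subset of the question's data, one row per deployment.
intervals <- read.csv(text="num,placed,recovered
1,2013-02-22 12:14:00,2013-02-27 15:14:00
1,2013-04-24 10:13:00,2013-04-26 07:47:00
10,2013-03-03 17:20:00,2013-03-07 17:20:00",
                      header=TRUE, stringsAsFactors=FALSE)

# Measurements from the question's second data frame.
readings <- read.csv(text="num,temp,time
1,5,2013-02-22 12:13:50
1,6,2013-02-22 12:14:00
1,4,2013-02-22 12:14:10
1,9,2013-04-24 09:45:20
1,7,2013-04-24 11:45:50
10,23,2013-03-03 19:23:40",
                     header=TRUE, stringsAsFactors=FALSE)

# Parse the timestamps so they compare as times (UTC is an assumption).
intervals$placed    <- as.POSIXct(intervals$placed, tz = "UTC")
intervals$recovered <- as.POSIXct(intervals$recovered, tz = "UTC")
readings$time       <- as.POSIXct(readings$time, tz = "UTC")

# Pair every reading with every deployment of the same device ...
both <- merge(readings, intervals, by = "num")

# ... then keep only the readings taken during a deployment.
inside <- both$time >= both$placed & both$time <= both$recovered
kept <- both[inside, c("num", "temp", "time")]
kept <- kept[order(kept$num, kept$time), ]
kept
```

Two caveats: if mydf is reused here after the counter column has been added, that column should be dropped or renamed first, because the measurement data also has a column called time; and if a device's deployments overlap, a reading can match more than one interval and would appear more than once unless duplicates are removed.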