Question

I have two large datasets like these:

df1=data.frame(subject = c(rep(1, 12), rep(2, 10)), day =c(1,1,1,1,1,2,3,15,15,15,15,19,1,1,1,1,2,3,15,15,15,15),stime=c('4/16/2012 6:25','4/16/2012 7:01','4/16/2012 17:22','4/16/2012 17:45','4/16/2012 18:13','4/18/2012 6:50','4/19/2012 6:55','5/1/2012 6:28','5/1/2012 7:00','5/1/2012 16:28','5/1/2012 17:00','5/5/2012 17:00','4/23/2012 5:56','4/23/2012 6:30','4/23/2012 16:55','4/23/2012 17:20','4/25/2012 6:32','4/26/2012 6:28','5/8/2012 5:54','5/8/2012 6:30','5/8/2012 15:55','5/8/2012 16:30'))

df2=data.frame(subject = c(rep(1, 10), rep(2, 10)), day=c(1,1,2,2,3,3,9,9,15,15,1,1,2,2,3,3,9,9,15,15),dtime=c('4/16/2012 6:15','4/16/2012 15:16','4/18/2012 7:15','4/18/2012 21:45','4/19/2012 7:05','4/19/2012 23:17','4/28/2012 7:15','4/28/2012 21:12','5/1/2012 7:15','5/1/2012 15:15','4/23/2012 6:45','4/23/2012 16:45','4/25/2012 6:45','4/25/2012 21:30','4/26/2012 6:45','4/26/2012 22:00','5/2/2012 7:00','5/2/2012 22:00','5/8/2012 6:45','5/8/2012 15:45'))

...

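In df2, 'dtime' contains two time points per day. For each subject and each day, I want to take each time point in df1 (i.e. 'stime') and subtract the second time point in df2 from it. If the result is positive, that observation gets the second time point in dtime; otherwise it gets the first. For example, for subject 1 on day 1, ('4/16/2012 6:25' - '4/16/2012 15:16') < 0, so we assign the first time point '4/16/2012 6:15' to this obs; ('4/16/2012 17:22' - '4/16/2012 15:16') > 0, so we assign the second time point '4/16/2012 15:16' to this obs. The expected output should look like this: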

df3=data.frame(subject = c(rep(1, 12), rep(2, 10)), day =c(1,1,1,1,1,2,3,15,15,15,15,19,1,1,1,1,2,3,15,15,15,15),stime=c('4/16/2012 6:25','4/16/2012 7:01','4/16/2012 17:22','4/16/2012 17:45','4/16/2012 18:13','4/18/2012 6:50','4/19/2012 6:55','5/1/2012 6:28','5/1/2012 7:00','5/1/2012 16:28','5/1/2012 17:00','5/5/2012 17:00','4/23/2012 5:56','4/23/2012 6:30','4/23/2012 16:55','4/23/2012 17:20','4/25/2012 6:32','4/26/2012 6:28','5/8/2012 5:54','5/8/2012 6:30','5/8/2012 15:55','5/8/2012 16:30'), dtime=c('4/16/2012 6:15','4/16/2012 6:15','4/16/2012 15:16','4/16/2012 15:16','4/16/2012 15:16','4/18/2012 7:15','4/19/2012 7:05','5/1/2012 7:15','5/1/2012 7:15','5/1/2012 15:15','5/1/2012 15:15','.','4/23/2012 6:45','4/23/2012 6:45','4/23/2012 16:45','4/23/2012 16:45','4/25/2012 6:45','4/26/2012 6:45','5/8/2012 6:45','5/8/2012 6:45','5/8/2012 15:45','5/8/2012 15:45'))

...

I used the code below to do this, but because 'dtime' is missing for day 19, R keeps giving me an error:

df1$dtime <- apply(df1, 1, function(x){  
                  choices <- df2[ df2$subject==as.numeric(x["subject"]) & 
                                       df2$day==as.numeric(x["day"]) , "dtime"]
         if( as.POSIXct(x["stime"], format="%m/%d/%Y %H:%M") < 
                 as.POSIXct(choices[2],format="%m/%d/%Y %H:%M") ) {
            choices[1] 
            }else{ choices[2] } 
                                  } )

Error in if (as.POSIXct(x["stime"], format = "%m/%d/%Y %H:%M") < as.POSIXct(choices[2],  : missing value where TRUE/FALSE needed
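
A minimal sketch of where the missing value comes from, using a hypothetical cut-down `df2_small` in place of the full df2 (the name and the `tz = "UTC"` argument are assumptions for illustration): no df2 row matches subject 1 on day 19, so `choices` is empty, `choices[2]` is NA, and the comparison inside `if` is NA as well.

```r
fmt <- "%m/%d/%Y %H:%M"

# Hypothetical cut-down stand-in for df2: subject 1 has no rows for day 19
df2_small <- data.frame(subject = c(1, 1), day = c(15, 15),
                        dtime = c("5/1/2012 7:15", "5/1/2012 15:15"),
                        stringsAsFactors = FALSE)

# Same lookup as in the apply() call, for subject 1 on day 19
choices <- df2_small[df2_small$subject == 1 & df2_small$day == 19, "dtime"]
length(choices)   # 0: nothing matched
choices[2]        # NA: indexing past the end of an empty vector

# Comparing against NA gives NA, which `if` cannot treat as TRUE or FALSE
cmp <- as.POSIXct("5/5/2012 17:00", format = fmt, tz = "UTC") <
       as.POSIXct(choices[2], format = fmt, tz = "UTC")
is.na(cmp)
```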

Does anyone know how to fix this?

Answer 1

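First, I entered both data frames to try this out. Here is my thinking in terms of a pseudocode approach (I'll let you finish the code). df1, as entered, looks like this: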

   subject day           stime
1        1   1  4/16/2012 6:25
2        1   1  4/16/2012 7:01
3        1   1 4/16/2012 17:22
4        1   1 4/16/2012 17:45
5        1   1 4/16/2012 18:13
6        1   2  4/18/2012 6:50
7        1   3  4/19/2012 6:55
8        1  15   5/1/2012 6:28
9        1  15   5/1/2012 7:00
10       1  15  5/1/2012 16:28
11       1  15  5/1/2012 17:00
12       2   1  4/23/2012 5:56
13       2   1  4/23/2012 6:30
14       2   1 4/23/2012 16:55
15       2   1 4/23/2012 17:20
16       2   2  4/25/2012 6:32
17       2   3  4/26/2012 6:28
18       2  15   5/8/2012 5:54
19       2  15   5/8/2012 6:30
20       2  15  5/8/2012 15:55
21       2  15  5/8/2012 16:30

Why not try the following approach:

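First, write a simple loop that lets you iterate over each value in the time columns of df1 and df2 (stime and dtime). If you like, you can convert the df1 and df2 data frames to matrices (using as.matrix(), which is my preference).
After getting the first value from df1, in row 1, column 3 (i.e. 4/16/2012 6:25), pull out 6:25 and store it in a temporary variable... let's call this variable a
Do exactly the same for df2, for the value you want to compare it with, and store it in a temporary variable, except grab it from the relevant position... let's call this variable b
Subtract the two temporary variables (you may need to write some code to set up both parts so that you can easily do a - b and get a numeric answer. That said, I'll leave that to you).
Use a simple conditional if statement
Depending on the outcome of the conditional check, add the chosen value to a new data table with the corresponding subject and day. You have already called this df3.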
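
The steps above can be sketched roughly as follows. This is only one way to fill in the pseudocode: it compares full date-times with `as.POSIXct` and `difftime` rather than extracting the clock time alone, it uses the rule from the question (compare against the second time point), and `df1_small`, `df2_small`, `df3_small` and `tz = "UTC"` are hypothetical names and settings chosen for illustration.

```r
fmt <- "%m/%d/%Y %H:%M"

# Hypothetical cut-down inputs; the day 19 row has no match in df2_small
df1_small <- data.frame(subject = c(1, 1, 1), day = c(1, 1, 19),
                        stime = c("4/16/2012 6:25", "4/16/2012 17:22",
                                  "5/5/2012 17:00"),
                        stringsAsFactors = FALSE)
df2_small <- data.frame(subject = c(1, 1), day = c(1, 1),
                        dtime = c("4/16/2012 6:15", "4/16/2012 15:16"),
                        stringsAsFactors = FALSE)

df3_small <- df1_small
df3_small$dtime <- NA_character_

for (i in seq_len(nrow(df1_small))) {
  # The dtime values for the same subject and day
  same <- df2_small$subject == df1_small$subject[i] &
          df2_small$day == df1_small$day[i]
  choices <- df2_small$dtime[same]
  if (length(choices) < 2) next  # no pair of time points: leave NA

  a <- as.POSIXct(df1_small$stime[i], format = fmt, tz = "UTC")
  b <- as.POSIXct(choices[2], format = fmt, tz = "UTC")
  diff_mins <- as.numeric(difftime(a, b, units = "mins"))  # a - b as a number

  # Positive difference: second time point; otherwise the first
  df3_small$dtime[i] <- if (diff_mins > 0) choices[2] else choices[1]
}

df3_small
```

Skipping rows without two matching time points leaves NA in dtime, so the unmatched day no longer stops the loop.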

Answer 2

I get a different answer than you do. First, I made a copy of df1 to work on:

df4 <- df1
df4$dtime <- apply(df4, 1, function(x){  
                      choices <- df2[ df2$subject==as.numeric(x["subject"]) & 
                                           df2$day==as.numeric(x["day"]) , "dtime"]
             if( as.POSIXct(x["stime"], format="%m/%d/%Y %H:%M") < 
                     as.POSIXct(choices[1],format="%m/%d/%Y %H:%M") ) {
                choices[1] 
                }else{ choices[2] } 
                                      } )
#----------------------------------------------
   subject day           stime           dtime
1        1   1  4/16/2012 6:25 4/16/2012 15:16
2        1   1  4/16/2012 7:01 4/16/2012 15:16
3        1   1 4/16/2012 17:22 4/16/2012 15:16
4        1   1 4/16/2012 17:45 4/16/2012 15:16
5        1   1 4/16/2012 18:13 4/16/2012 15:16
6        1   2  4/18/2012 6:50  4/18/2012 7:15
7        1   3  4/19/2012 6:55  4/19/2012 7:05
8        1  15   5/1/2012 6:28   5/1/2012 7:15
9        1  15   5/1/2012 7:00   5/1/2012 7:15
10       1  15  5/1/2012 16:28  5/1/2012 15:15
11       1  15  5/1/2012 17:00  5/1/2012 15:15
12       2   1  4/23/2012 5:56  4/23/2012 6:45
13       2   1  4/23/2012 6:30  4/23/2012 6:45
14       2   1 4/23/2012 16:55 4/23/2012 16:45
15       2   1 4/23/2012 17:20 4/23/2012 16:45
16       2   2  4/25/2012 6:32  4/25/2012 6:45
17       2   3  4/26/2012 6:28  4/26/2012 6:45
18       2  15   5/8/2012 5:54   5/8/2012 6:45
19       2  15   5/8/2012 6:30   5/8/2012 6:45
20       2  15  5/8/2012 15:55  5/8/2012 15:45
21       2  15  5/8/2012 16:30  5/8/2012 15:45
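
The difference from the expected output appears to come from the comparison: the code above tests stime against `choices[1]`, whereas the question compares against the second time point. A hedged sketch of a helper that follows the question's rule and returns NA when a subject/day has no pair of time points (`pick_dtime`, `day1` and `tz = "UTC"` are hypothetical, introduced only for this example):

```r
fmt <- "%m/%d/%Y %H:%M"

# Hypothetical helper: choose between the two dtime values for one stime
pick_dtime <- function(stime, choices) {
  # No complete pair for this subject/day (e.g. day 19): return NA
  if (length(choices) < 2 || anyNA(choices[1:2])) return(NA_character_)
  s      <- as.POSIXct(stime, format = fmt, tz = "UTC")
  second <- as.POSIXct(choices[2], format = fmt, tz = "UTC")
  # stime after the second time point: second; otherwise the first
  if (s > second) choices[2] else choices[1]
}

day1 <- c("4/16/2012 6:15", "4/16/2012 15:16")
r1 <- pick_dtime("4/16/2012 6:25", day1)          # "4/16/2012 6:15"
r2 <- pick_dtime("4/16/2012 17:22", day1)         # "4/16/2012 15:16"
r3 <- pick_dtime("5/5/2012 17:00", character(0))  # NA
```

The helper could be called from the same `apply()` call by passing it `x["stime"]` and `choices`.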

Create a new column based on the result of subtracting two columns
