I have a data frame that looks like this:
ID Initialdate Finaldate
1405 2003-12-03 2010-12-07
7044 2004-12-08 2011-10-13
7219 2008-05-16 2009-06-04
18618 2004-06-17 2012-02-13
19900 2005-06-01 2008-06-11
20138 2010-01-20 2010-01-20
29067 2003-04-30 2004-09-10
33546 2003-11-25 2008-10-10
37321 2003-06-07 2006-03-20
43028 2004-09-23 2008-07-25
43591 2005-04-06 2005-11-15
46749 2005-02-28 2005-05-16
48846 2005-08-02 2005-08-02
114353 2002-05-17 2006-10-26
128180 2004-06-17 2010-06-21
128648 2003-05-07 2009-07-23
133337 2004-05-26 2012-07-26
149181 2002-10-19 2008-07-27
214079 2003-09-26 2007-05-20
215060 2006-04-17 2011-08-17
229816 2007-04-25 2011-09-24
238123 2007-11-26 2012-01-31
253776 2006-03-02 2012-04-19
258660 2010-03-25 2012-04-09
265356 2002-04-22 2002-04-22
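I created a fourth column holding the difference between the final and initial dates, then converted it to a plain number, with the following code: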
df$Duration<-(difftime(df$Finaldate, df$Initialdate, units = "days"))
df$Duration<-as.numeric(df$Duration, units = "days")
That gives me the following output, which I'm happy with:
ID Initialdate Finaldate Duration
1405 2003-12-03 2010-12-07 2561.00000
7044 2004-12-08 2011-10-13 2499.95833
7219 2008-05-16 2009-06-04 384.00000
18618 2004-06-17 2012-02-13 2797.04167
19900 2005-06-01 2008-06-11 1106.00000
20138 2010-01-20 2010-01-20 0.00000
29067 2003-04-30 2004-09-10 499.00000
33546 2003-11-25 2008-10-10 1780.95833
37321 2003-06-07 2006-03-20 1017.04167
43028 2004-09-23 2008-07-25 1401.00000
43591 2005-04-06 2005-11-15 223.04167
46749 2005-02-28 2005-05-16 76.95833
48846 2005-08-02 2005-08-02 0.00000
114353 2002-05-17 2006-10-26 1623.00000
128180 2004-06-17 2010-06-21 2195.00000
128648 2003-05-07 2009-07-23 2269.00000
133337 2004-05-26 2012-07-26 2983.00000
149181 2002-10-19 2008-07-27 2108.00000
214079 2003-09-26 2007-05-20 1332.00000
215060 2006-04-17 2011-08-17 1948.00000
229816 2007-04-25 2011-09-24 1613.00000
238123 2007-11-26 2012-01-31 1527.00000
253776 2006-03-02 2012-04-19 2239.95833
258660 2010-03-25 2012-04-09 746.00000
265356 2002-04-22 2002-04-22 0.00000
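My plan was to pull the rows with a Duration under 180 days into a new data frame, then use that data frame to remove those IDs from the original one, with code like this: df_final<-df[!(df$ID %in% unqualified$ID),]
However, when I do this: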
unqualified<-(df[df$Duration <= '179.000',])
I get this output, which is definitely not correct:
ID Initialdate Finaldate Duration
19900 2005-06-01 2008-06-11 1106.000
20138 2010-01-20 2010-01-20 0.000
33546 2003-11-25 2008-10-10 1780.958
37321 2003-06-07 2006-03-20 1017.042
43028 2004-09-23 2008-07-25 1401.000
48846 2005-08-02 2005-08-02 0.000
114353 2002-05-17 2006-10-26 1623.000
214079 2003-09-26 2007-05-20 1332.000
229816 2007-04-25 2011-09-24 1613.000
238123 2007-11-26 2012-01-31 1527.000
265356 2002-04-22 2002-04-22 0.000
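I thought there might be a problem with the numbers in Duration, but when I run sapply(unqualified, class)
and sapply(unqualified, mode)
they are listed as numeric. I should also mention that earlier in my code I used strptime to convert the dates, to make sure they were correct. I've searched around trying to find the problem but have come up empty... any help would be appreciated.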
Answer 0: (score: 1)
How about this:
unqualified<-(df[df$Duration < 180,])
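That is, Duration is numeric, so compare it with a number rather than a string. With the quoted '179.000', R converts the Duration values to character and compares them as text, character by character, which is why 1106 ("1106") counts as less than or equal to '179.000' while 384 does not.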
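To see the difference side by side, here is a minimal sketch. The small data frame is a made-up stand-in that reuses a few ID and Duration values from the question; it is not the full data, and the name wrong is only for illustration.

```r
# Hypothetical toy data reusing a few values from the question
df <- data.frame(
  ID = c(1405, 19900, 20138, 46749),
  Duration = c(2561, 1106, 0, 76.95833)
)

# Quoted threshold: Duration is coerced to character, so the values are
# compared as text and 1106 is wrongly kept
wrong <- df[df$Duration <= "179.000", ]

# Unquoted threshold: the comparison stays numeric
unqualified <- df[df$Duration < 180, ]

# Remove the unqualified IDs from the original data frame
df_final <- df[!(df$ID %in% unqualified$ID), ]

print(wrong$ID)
print(unqualified$ID)
print(df_final$ID)
```

With the numeric comparison, only the rows with a Duration of 0 and 76.95833 land in unqualified, so df_final keeps the two long-duration IDs.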