I have a data frame df, as shown below.
Id ProcessDate
10 2011-12-29 14:14:00
11 2011-12-29 14:16:00
12 2011-12-29 14:14:00
13 2011-12-29 14:20:00
14 2011-12-29 14:49:00
15 2011-12-29 14:51:00
16 2011-12-29 14:53:00
17 2011-12-29 15:11:00
18 2011-12-29 15:13:00
19 2011-12-29 15:10:00
20 2011-12-29 15:21:00
21 2011-12-29 14:34:00
22 2011-12-29 15:26:00
I am trying to create a third column, Status, that holds one of the three values {Before, Between, After}, based on this condition.
if (df$ProcessDate < 2011-12-29 14:48:00)
then df$Status = "Before"
else if (df$ProcessDate > 2011-12-29 14:48:00 & df$ProcessDate < 2011-12-29 15:16:00)
then df$Status = "Between"
else df$Status = "After"
The final data frame should look like this.
Id ProcessDate Status
10 2011-12-29 14:14:00 Before
11 2011-12-29 14:16:00 Before
12 2011-12-29 14:14:00 Before
13 2011-12-29 14:20:00 Before
14 2011-12-29 14:49:00 Between
15 2011-12-29 14:51:00 Between
16 2011-12-29 14:53:00 Between
17 2011-12-29 15:11:00 Between
18 2011-12-29 15:13:00 Between
19 2011-12-29 15:10:00 Between
20 2011-12-29 15:21:00 After
21 2011-12-29 14:34:00 After
22 2011-12-29 15:26:00 After
I have tried a few things and none of them worked. Any help with this problem is much appreciated.
Answer 0 (score: 6)
This is one possible solution:
Id ProcessDate Status
1 10 2011-12-29 14:14:00 Before
2 11 2011-12-29 14:16:00 Before
3 12 2011-12-29 14:14:00 Before
4 13 2011-12-29 14:20:00 Before
5 14 2011-12-29 14:49:00 Between
6 15 2011-12-29 14:51:00 Between
7 16 2011-12-29 14:53:00 Between
8 17 2011-12-29 15:11:00 Between
9 18 2011-12-29 15:13:00 Between
10 19 2011-12-29 15:10:00 Between
11 20 2011-12-29 15:21:00 After
12 21 2011-12-29 14:34:00 Before
13 22 2011-12-29 15:26:00 After
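The code behind this output is not shown here. As a minimal sketch only (not necessarily what this answer used), base R's findInterval can map each timestamp to a bin index. The sample data is rebuilt from the question; the tz = 'UTC' argument and the names cutoffs and bin are assumptions made to keep the example self-contained:

```r
# Rebuild the sample data from the question (tz = 'UTC' is an assumption).
df <- data.frame(
  Id = 10:22,
  ProcessDate = as.POSIXct(c(
    '2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00',
    '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00',
    '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00',
    '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00',
    '2011-12-29 15:26:00'), tz = 'UTC')
)

# The two cut-off times from the question.
cutoffs <- as.POSIXct(c('2011-12-29 14:48:00', '2011-12-29 15:16:00'), tz = 'UTC')

# findInterval() returns 0 below the first cut-off, 1 between the two,
# and 2 from the second cut-off onward.
bin <- findInterval(as.numeric(df$ProcessDate), as.numeric(cutoffs))

# Shift the 0-based bin index to 1-based and look up the label.
df$Status <- c('Before', 'Between', 'After')[bin + 1]
df
```

With this approach a timestamp exactly equal to a cut-off falls into the later bin, so 14:48:00 would be 'Between' and 15:16:00 would be 'After'.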
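Answer 1 (score: 4)
For this particular case, a very simple way to do this in base R is to set everything to 'Between', and then use subset assignment to change the rows that should be something else: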
df$ProcessDate <- as.POSIXct(df$ProcessDate) # skip if already parsed to datetime
df$Status <- 'Between'
df$Status[df$ProcessDate < as.POSIXct('2011-12-29 14:48:00')] <- 'Before'
df$Status[df$ProcessDate >= as.POSIXct('2011-12-29 15:16:00')] <- 'After'
df
## Id ProcessDate Status
## 1 10 2011-12-29 14:14:00 Before
## 2 11 2011-12-29 14:16:00 Before
## 3 12 2011-12-29 14:14:00 Before
## 4 13 2011-12-29 14:20:00 Before
## 5 14 2011-12-29 14:49:00 Between
## 6 15 2011-12-29 14:51:00 Between
## 7 16 2011-12-29 14:53:00 Between
## 8 17 2011-12-29 15:11:00 Between
## 9 18 2011-12-29 15:13:00 Between
## 10 19 2011-12-29 15:10:00 Between
## 11 20 2011-12-29 15:21:00 After
## 12 21 2011-12-29 14:34:00 Before
## 13 22 2011-12-29 15:26:00 After
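Another way to do this is to use cut, which has a cut.POSIXt method. In addition to the breakpoints you want, it needs breakpoints before and after the data, but it gives you a nice factor for categorical data.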
df$Status <- cut(df$ProcessDate,
breaks = c(min(df$ProcessDate),
as.POSIXct(c('2011-12-29 14:48:00', '2011-12-29 15:16:00')),
max(df$ProcessDate) + 1),
labels = c('Before', 'Between', 'After'))
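If computing the outer breakpoints from the data feels awkward, one variation (a sketch, not part of the original answer) is to cut the underlying numeric values and use -Inf and Inf as the outer breaks. The vectors x and inner and the tz = 'UTC' setting are illustrative assumptions:

```r
# Three sample timestamps, one for each expected label.
x <- as.POSIXct(c('2011-12-29 14:14:00', '2011-12-29 14:49:00',
                  '2011-12-29 15:21:00'), tz = 'UTC')

# The two cut-off times from the question.
inner <- as.POSIXct(c('2011-12-29 14:48:00', '2011-12-29 15:16:00'), tz = 'UTC')

# right = FALSE makes each interval closed on the left: [lower, upper)
status <- cut(as.numeric(x),
              breaks = c(-Inf, as.numeric(inner), Inf),
              labels = c('Before', 'Between', 'After'),
              right = FALSE)
status
```

The result is still a factor, and no value can fall outside the breaks.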
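The most common and most versatile base version is nested ifelse calls. They look ugly (particularly if there are a lot of them), but they are fast to evaluate because ifelse is vectorized, while if is not: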
df$Status <- ifelse(df$ProcessDate < as.POSIXct('2011-12-29 14:48:00'),
'Before',
ifelse(df$ProcessDate < as.POSIXct('2011-12-29 15:16:00'),
'Between',
'After'))
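To illustrate the difference (a small sketch with made-up values): ifelse tests every element of its condition, whereas if expects a single TRUE or FALSE. Depending on the R version, giving if a longer vector either raises an error or warns and uses only the first element, so the sketch catches both cases:

```r
x <- c(1, 5, 10)

# ifelse() is vectorized: one result per element of x.
labels <- ifelse(x < 3, 'low', ifelse(x < 8, 'mid', 'high'))
labels

# if() needs a length-one condition; record whatever a longer one triggers.
outcome <- tryCatch({
  if (x < 3) 'low' else 'high'
}, warning = function(w) 'warning', error = function(e) 'error')
outcome
```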
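dplyr::case_when is a nice replacement for nested ifelse calls. It evaluates each condition in turn and returns the value that corresponds to the first one that is true: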
library(dplyr)
df %>% mutate(
    ProcessDate = as.POSIXct(ProcessDate),    # skip this line if already datetime
    # bare column names see the converted ProcessDate; .$ProcessDate would
    # refer to the original, unconverted column
    # if this is true, then return "Before"
    Status = case_when(ProcessDate < as.POSIXct('2011-12-29 14:48:00') ~ 'Before',
                       # for the rest, if this is true, return "Between"
                       ProcessDate < as.POSIXct('2011-12-29 15:16:00') ~ 'Between',
                       # always true, so make the rest "After"
                       TRUE ~ 'After'))
All versions return the same thing except cut, which returns a factor instead of a character vector.
Answer 2 (score: 4)
Try this:
left <- as.POSIXct("12/29/2011 14:48", format = "%m/%d/%Y %H:%M")
right <- as.POSIXct("12/29/2011 15:16", format = "%m/%d/%Y %H:%M")
DT[, Status := ifelse(ProcessDate < left, "before",
ifelse(ProcessDate > right, "after", "between"))]
This gives:
Id ProcessDate Status
1: 10 2011-12-29 14:14:00 before
2: 11 2011-12-29 14:16:00 before
3: 12 2011-12-29 14:14:00 before
4: 13 2011-12-29 14:20:00 before
5: 14 2011-12-29 14:49:00 between
6: 15 2011-12-29 14:51:00 between
7: 16 2011-12-29 14:53:00 between
8: 17 2011-12-29 15:11:00 between
9: 18 2011-12-29 15:13:00 between
10: 19 2011-12-29 15:10:00 between
11: 20 2011-12-29 15:21:00 after
12: 21 2011-12-29 15:34:00 after
13: 22 2011-12-29 15:26:00 after
This gives the same result as above, using the vectorized ifelse() together with data.table.
Answer 3 (score: 0)
One possible solution is to convert your times to epoch values and then compare them. This can be done with as.integer(as.POSIXct("Time")), as shown below:
df = NULL
df$ids = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
df$date = c('2011-12-29 14:14:00', '2011-12-29 14:16:00', '2011-12-29 14:14:00', '2011-12-29 14:20:00', '2011-12-29 14:49:00', '2011-12-29 14:51:00', '2011-12-29 14:53:00', '2011-12-29 15:11:00', '2011-12-29 15:13:00', '2011-12-29 15:10:00', '2011-12-29 15:21:00', '2011-12-29 14:34:00', '2011-12-29 15:26:00')
df = as.data.frame(df)
df$date = as.integer(as.POSIXct(df$date))
upper = as.integer(as.POSIXct('2011-12-29 15:16:00'))
lower = as.integer(as.POSIXct('2011-12-29 14:48:00'))
The date column is then converted as follows:
> df
ids date
1 10 1325148240
2 11 1325148360
3 12 1325148240
4 13 1325148600
5 14 1325150340
6 15 1325150460
7 16 1325150580
8 17 1325151660
9 18 1325151780
10 19 1325151600
11 20 1325152260
12 21 1325149440
13 22 1325152560
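Note that the epoch value depends on the time zone used to parse the string. The numbers above are consistent with a UTC+05:30 local zone. As a sketch (the explicit tz arguments and the names ist and utc are additions, not part of the original answer), fixing the zone makes the conversion reproducible:

```r
# The same wall-clock time parsed in two zones gives two different epoch values.
ist <- as.integer(as.POSIXct('2011-12-29 14:14:00', tz = 'Asia/Kolkata'))
utc <- as.integer(as.POSIXct('2011-12-29 14:14:00', tz = 'UTC'))

ist        # 1325148240, matching the first row above
utc        # 1325168040
utc - ist  # 19800 seconds, i.e. 5 hours 30 minutes
```

As long as the data and the two thresholds are parsed in the same zone, the comparison produces the same labels either way.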
Then you can simply perform numeric comparisons:
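df$Status = NA_character_   # create the column before filling it row by row
for(i in seq_len(nrow(df))){
    if(df$date[i] < lower)
        df$Status[i] = "Before"
    # everything below lower is already handled, so only upper needs checking;
    # a value exactly equal to lower now counts as "Between" rather than "After"
    else if(df$date[i] < upper)
        df$Status[i] = "Between"
    else
        df$Status[i] = "After"
}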
This results in the output:
> df
ids date Status
1 10 1325148240 Before
2 11 1325148360 Before
3 12 1325148240 Before
4 13 1325148600 Before
5 14 1325150340 Between
6 15 1325150460 Between
7 16 1325150580 Between
8 17 1325151660 Between
9 18 1325151780 Between
10 19 1325151600 Between
11 20 1325152260 After
12 21 1325149440 Before
13 22 1325152560 After