Question

I am trying to emulate the following Excel formula (in cell L2) in R:

=IF(OR(K2=K1,K2=K3),"D","R")

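I have a series of data rows that are either duplicates or repeats, and the distinguishing feature is the start date. If two or more records have the same start date, they are duplicates; if they have different start dates, they are repeats. The formula above marks duplicates with D and repeats with R. For example: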

Sample  Date-Time   Type
4753432 24/01/13 10:20  D
4753432 24/01/13 10:20  D
4753441 24/01/13 11:23  R
4753441 25/01/13 10:44  D
4753441 25/01/13 10:44  D
4753504 25/01/13 16:46  D
4753504 29/01/13 16:28  D
4766622 29/01/13 16:28  R
4766622 31/01/13 9:40   R

How can I do this in R?

Answer 1

Why not use duplicated?

within(df, {
  Type <- rep("R", nrow(df))
  Type[duplicated(Date.Time) | 
         duplicated(Date.Time, fromLast=TRUE)] <- "D"
})
#    Sample      Date.Time Type
# 1 4753432 24/01/13 10:20    D
# 2 4753432 24/01/13 10:20    D
# 3 4753441 24/01/13 11:23    R
# 4 4753441 25/01/13 10:44    D
# 5 4753441 25/01/13 10:44    D
# 6 4753504 25/01/13 16:46    R
# 7 4753504 29/01/13 16:28    D
# 8 4766622 29/01/13 16:28    D
# 9 4766622  31/01/13 9:40    R

Or am I missing something?
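
One thing that may be worth checking: duplicated looks at the whole column, while the Excel formula only compares a row with the rows directly above and below it. On the sample data the two agree, but they can differ when equal values are not next to each other. A minimal sketch of the difference, where the helper adjacent_dup and the vector x are made up for illustration and are not part of the original question:

```r
# Hypothetical helper: TRUE where a value equals its immediate neighbour,
# mirroring OR(K2=K1, K2=K3) from the Excel formula
adjacent_dup <- function(x) {
  n <- length(x)
  if (n < 2) return(rep(FALSE, n))
  same <- x[-1] == x[-n]            # compare each element with the next one
  c(FALSE, same) | c(same, FALSE)   # equal to previous OR equal to next
}

# Made-up vector with a repeated value that is NOT adjacent
x <- c("a", "b", "a")

# Whole-column check, as in the answer above
global_dup <- duplicated(x) | duplicated(x, fromLast = TRUE)

global_dup       # TRUE FALSE TRUE
adjacent_dup(x)  # FALSE FALSE FALSE
```

If the data are sorted by date, as in the example, both approaches should give the same labels.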

Answer 2

Suppose you have a data.frame datfrm that holds the data and has these column names:

colnames(datfrm) <- c("Sample","Date-Time","Type")

I further assume that you store the date and time in a single string, as in the example:

datfrm[1,"Date-Time"]
# "24/01/13 10:20"

Since, according to your example, you are only judging whether the date (not the time) is duplicated, extract the date:

datestr <- substring(datfrm[,"Date-Time"],1,8)
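
Note that substring(..., 1, 8) relies on every date occupying exactly the first eight characters. If that is not guaranteed, parsing the strings may be more robust. A sketch, assuming the day/month/two-digit-year layout seen in the example (the vector dt below is made-up sample input, and dateonly is a hypothetical name):

```r
# Made-up input in the same "dd/mm/yy hh:mm" layout as the example
dt <- c("24/01/13 10:20", "31/01/13 9:40")

# as.Date() reads only as much of each string as the format needs,
# so the trailing time part is ignored
dateonly <- as.Date(dt, format = "%d/%m/%y")

format(dateonly)  # "2013-01-24" "2013-01-31"
```

The resulting Date values can be compared with == in the same way as the character strings.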

Create a logical vector whose element at position i is TRUE if the element at position i+1 or i-1 has the same date.

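# TRUE where element i has the same date as element i+1
dupflag1 <- c(datestr[1:(length(datestr)-1)] == datestr[-1], FALSE)
# TRUE where element i has the same date as element i-1
dupflag2 <- c(FALSE, datestr[1:(length(datestr)-1)] == datestr[-1])
# TRUE where either neighbour matches
dupflag <- dupflag1 | dupflag2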

Now update the Type column accordingly:

datfrm[,"Type"] <- ifelse(dupflag,"D","R")
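
Putting the steps above together on the sample from the question gives something like the following. The data.frame is retyped by hand from the question's table, so treat it as an assumption. Note that comparing the date alone flags every row except the last as D here, which may or may not be what is wanted.

```r
# Sample data retyped from the question (assumed layout)
datfrm <- data.frame(
  Sample = c(4753432, 4753432, 4753441, 4753441, 4753441,
             4753504, 4753504, 4766622, 4766622),
  DateTime = c("24/01/13 10:20", "24/01/13 10:20", "24/01/13 11:23",
               "25/01/13 10:44", "25/01/13 10:44", "25/01/13 16:46",
               "29/01/13 16:28", "29/01/13 16:28", "31/01/13 9:40"),
  stringsAsFactors = FALSE
)
colnames(datfrm) <- c("Sample", "Date-Time")

# Keep only the date part (first eight characters)
datestr <- substring(datfrm[, "Date-Time"], 1, 8)

# Compare every element with its successor once, then shift the result
same_next <- datestr[-length(datestr)] == datestr[-1]
dupflag <- c(same_next, FALSE) | c(FALSE, same_next)

datfrm[, "Type"] <- ifelse(dupflag, "D", "R")
datfrm[, "Type"]
# "D" "D" "D" "D" "D" "D" "D" "D" "R"
```

Dropping the substring step and comparing the full Date-Time strings instead would take the time of day into account as well.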

That's it. Generally, if a variable can take only two possible values, you may prefer to use the logical type in R:

datfrm[,"Type"] <- dupflag
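
Part of the appeal of a logical column is that it can be counted and used for subsetting directly. A small sketch with made-up values (flags and ids are hypothetical names, not taken from the question):

```r
# Made-up flags and labels, for illustration only
flags <- c(TRUE, TRUE, FALSE, TRUE)
ids <- c("a", "b", "c", "d")

sum(flags)    # 3, the number of flagged elements
ids[flags]    # "a" "b" "d", the flagged elements
ids[!flags]   # "c", the remaining elements
```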

Differentiating between duplicates and repeats
