Question

I have a dataset in which the timestamps are given in seconds since the epoch:

   id      event       time       
2 722     opened 1356931342
1 723     opened 1356963741
4 721 referenced 1356988186
5 721     closed 1356988186
3 721 referenced 1356988206

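However, handling a large number of very long timestamps causes serious performance problems for the algorithm I am using (optimal matching distance), so I would like to reduce them to their order of occurrence (or same time). What I mean is that the earliest event (row) in the dataset should be 1, followed by 2, 3, 4, and so on. If two rows have exactly the same number (seconds since the epoch), they need to get the same number in the new, reduced format. So this should output the following: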

   id      event       time       
2 722     opened       1
1 723     opened       2
4 721 referenced       3
5 721     closed       3
3 721 referenced       4

The "time" column is essentially a numeric vector (not a factor; that would not work, since I am trying to solve a performance problem).

I can order the data frame with:

df <- df[with(df, order(time)), ]

But how do I replace the numbers with ordered single numbers (same timestamp, same number)?

Answer 1

Use a factor:

df2 <- transform(df, time_f = as.numeric(factor(time)))
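To see what this produces, here is a minimal sketch that rebuilds the question's sample data and applies the same idea. The data frame literal is reconstructed from the printed table, so treat it as an assumption. Because factor() builds its levels from the sorted unique values, the codes follow time order even when the rows are not sorted; as.integer is used here instead of as.numeric only to keep the result an integer vector.

```r
# Sample data reconstructed from the table in the question (an assumption);
# rows are deliberately left in their original, unsorted order.
df <- data.frame(
  id    = c(723, 722, 721, 721, 721),
  event = c("opened", "opened", "referenced", "referenced", "closed"),
  time  = c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186)
)

# factor() builds its levels from the sorted unique values, so the integer
# codes are ranks of the timestamps, with tied timestamps sharing a code.
df2 <- transform(df, time_f = as.integer(factor(time)))

# Order by time only for display; the codes do not depend on this step.
df2 <- df2[order(df2$time), ]
print(df2)
```

The new column can then replace the original one, for example with df2$time <- df2$time_f, if the raw timestamps are no longer needed.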

Answer 2

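I would use match and unique in the following way to create an integer vector, unless you have a specific reason to want the time column as a factor variable...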

df$newtime <- match( df$time , unique( df$time ) )
#   id      event       time newtime
#2 722     opened 1356931342       1
#1 723     opened 1356963741       2
#4 721 referenced 1356988186       3
#5 721     closed 1356988186       3
#3 721 referenced 1356988206       4

The code for factor uses match and unique anyway.
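One caveat worth spelling out: match against unique(df$time) numbers the values in order of first appearance, so it agrees with the factor result only when the data frame is already sorted by time, as in the output above. A sketch of a variant that does not depend on row order, sorting the unique values first (which is what factor does when it builds its levels); the sample vector is made up for illustration:

```r
# Hypothetical unsorted timestamps, chosen for illustration.
time <- c(1356963741, 1356931342, 1356988206, 1356988186, 1356988186)

# Order of first appearance: chronological only if `time` is already sorted.
by_appearance <- match(time, unique(time))

# Sorting the unique values first gives chronological ranks in any row order.
by_time <- match(time, sort(unique(time)))

print(by_appearance)
print(by_time)
```

On sorted input the two results are identical, so the extra sort only matters when the ordering step from the question has been skipped.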

Replace seconds since epoch with ordered numbers in R
