Question

I have the following code that works well for removing trailing characters from the elements of a Python list:

x = ['01/01/2013 00:00:00','01/01/2013 00:00:00',
    '01/01/2013 00:00:00','01/01/2013 00:00:00',...]

Given that array, I want to remove the 00:00:00 part. So I wrote this:

i = 0
while i < len(x):
    x[i] = x[i][:x[i].find(' 00:00:00')]
    i += 1

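That does the trick. How can I implement a similar solution in R? I have tried substr and gsub, but they run very slowly (the actual list has more than 250,000 date/time combinations).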

Answer 1

Try

x <- rep('01/01/2013 00:00:00', 250000)
system.time(y <- sub(" 00:00:00", "", x, fixed=TRUE))
#    user  system elapsed
#    0.05    0.00    0.05

y contains the result. The timing shows that it should not take long. For help on the arguments, see ?sub.
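
As a rough point of comparison with the Python in the question: sub with fixed=TRUE treats the pattern as a literal string rather than a regular expression, and replaces only its first occurrence in each element. A minimal Python sketch of that behavior follows; the names x and y simply mirror the R snippet, and the five-element list is an illustrative assumption, not part of the original answer.

```python
# Sample data mirroring the R snippet (shortened to a few elements).
x = ["01/01/2013 00:00:00"] * 5

# str.replace matches literally (no regex); the count of 1 limits it to the
# first occurrence in each string, similar to sub(..., fixed=TRUE) in R.
y = [s.replace(" 00:00:00", "", 1) for s in x]

print(y[0])  # 01/01/2013
```

Like the R call, this removes the text wherever it first appears in the string, not only at the end.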

Answer 2

Consider some example data:

set.seed(144)
dat <- sample(c("01/01/2013 00:00:00", "01/01/2013 12:34:56"), 200000, replace=TRUE)
table(dat)
# dat
# 01/01/2013 00:00:00 01/01/2013 12:34:56 
#              100100               99900

Here we want to remove a trailing 00:00:00 but keep a trailing 12:34:56.

You can first find the strings that end in 00:00:00 with the following (runs in ~0.1 seconds on my computer):

to.clean <- grepl(" 00:00:00$", dat)

Now you can use substr to remove the relevant trailing characters (runs in about 0.04 seconds on my computer):

dat[to.clean] <- substr(dat[to.clean], 1, nchar(dat[to.clean])-9)
table(dat)
# dat
#          01/01/2013 01/01/2013 12:34:56 
#              100100               99900

Alternatively, the following more compact gsub command also runs in about 0.15 seconds for these 200,000 date/time pairs:

cleaned <- gsub(" 00:00:00$", "", dat)
table(cleaned)
# cleaned
#          01/01/2013 01/01/2013 12:34:56 
#              100100               99900

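You were probably looping through the data and calling substr or gsub separately on each individual element of the vector, which would certainly be much slower because it doesn't take advantage of vectorization.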
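
For readers coming from the Python loop in the question, the end-anchored removal used in this answer can be sketched in Python as well. This is only an illustration under stated assumptions: the helper name strip_midnight and the constant SUFFIX are hypothetical, and the two-element list stands in for the real data.

```python
SUFFIX = " 00:00:00"  # hypothetical constant: the trailing text to drop


def strip_midnight(values, suffix=SUFFIX):
    """Return a new list with `suffix` removed only where a string ends with it."""
    return [v[:-len(suffix)] if v.endswith(suffix) else v for v in values]


dat = ["01/01/2013 00:00:00", "01/01/2013 12:34:56"]
cleaned = strip_midnight(dat)
print(cleaned)  # ['01/01/2013', '01/01/2013 12:34:56']
```

Checking endswith first matters: in the question's loop, find returns -1 when the text is absent, so the slice would silently drop the last character of a value such as 01/01/2013 12:34:56.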

R equivalent of this string replacement code?