I have a data frame bp, on which I run the following:
bp <- bp %>% group_by(accountId) %>%
  mutate(diff = as.numeric(date - lag(date)))
It has 3.4 million rows.
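I am trying to use dplyr to compute the lagged time difference within each accountId, as shown above.
On my MacBook with 8 GB of RAM, R crashed. On a Linux server with 64 GB, the code takes forever. Any ideas on how to solve this?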
Answer 0 (score: 2)
I'm not sure what went wrong with your approach, but with date stored as a proper Date object, everything runs quickly here:
Recreate some data:
dat <- read.table(text=" date amount accountId type
1 2015-06-11 101.2 1 a
2 2015-06-18 101.2 1 a
3 2015-06-24 101.2 1 b
4 2015-06-11 294.0 2 a
5 2015-06-18 48.0 2 a
6 2015-06-26 10.0 2 b",header=TRUE)
dat$date <- as.Date(dat$date)
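The answer's point hinges on the as.Date() conversion above. A minimal base-R sketch of why it matters, using a small hypothetical character vector as a stand-in for your own date column: once the values are Date objects, subtracting a one-step-lagged copy gives gaps in days. The manual shift here only imitates dplyr::lag() for illustration; it is not how dplyr implements it.

```r
# Hypothetical stand-in for a date column that was read in as character
date_chr <- c("2015-06-11", "2015-06-18", "2015-06-24")
print(class(date_chr))                # "character": no date arithmetic yet

dates <- as.Date(date_chr)            # convert to a proper Date object
print(class(dates))                   # "Date"

# Imitate dplyr::lag(dates): shift everything down one position, NA first.
# Indexing with NA keeps the Date class, whereas c(NA, dates) would drop it.
lagged <- dates[c(NA, seq_len(length(dates) - 1))]

# Date minus Date is a difftime in days; as.numeric() strips the units
gap <- as.numeric(dates - lagged)
print(gap)                            # values: NA 7 6
```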
Then time it on 3.4M rows spread across 1,000 groups:
library(dplyr)  # provides %>%, group_by(), mutate() and lag()
set.seed(1)
dat2 <- dat[sample(rownames(dat),3.4e6,replace=TRUE),]
dat2$accountId <- sample(1:1000,3.4e6,replace=TRUE)
nrow(dat2)
#[1] 3400000
length(unique(dat2$accountId))
#[1] 1000
system.time({
dat2 <- dat2 %>% group_by(accountId) %>%
mutate(diff = as.numeric(date - lag(date)))
})
#    user  system elapsed
#    0.38    0.03    0.40
head(dat2[dat2$accountId==46,])
#Source: local data frame [6 x 6]
#Groups: accountId
#
#        date amount accountId type diff
#1 2015-06-24  101.2        46    b   NA
#2 2015-06-18   48.0        46    a   -6
#3 2015-06-11  294.0        46    a  -13
#4 2015-06-18  101.2        46    a    7
#5 2015-06-26   10.0        46    b    2
#6 2015-06-11  294.0        46    a    0
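Note that lag() takes the value from the preceding row in the current row order, not from the preceding date, so rows that are out of date order within an account give negative gaps such as the -6 and -13 above. If chronological gaps are wanted, sort first; in dplyr that would be arrange(accountId, date) before group_by(), or lag(date, order_by = date). The base-R sketch below swaps in ave() for group_by()/mutate() and uses hypothetical toy data, just to show the effect of sorting.

```r
# Hypothetical toy data: one account whose rows are not in date order
df <- data.frame(
  accountId = c(1, 1, 1, 1),
  date = as.Date(c("2015-06-24", "2015-06-11", "2015-06-26", "2015-06-18"))
)

# Gap to the previous element within a group; NA for the group's first row
lag_gap <- function(x) c(NA, diff(x))

# ave() applies lag_gap per accountId and returns results in the original
# row order. Dates are passed as plain day counts so the result stays numeric.
unsorted <- ave(as.numeric(df$date), df$accountId, FUN = lag_gap)
print(unsorted)                       # values: NA -13 15 -8

# Sort by account, then date, and recompute: gaps are now chronological
df <- df[order(df$accountId, df$date), ]
df$diff <- ave(as.numeric(df$date), df$accountId, FUN = lag_gap)
print(df)
```

Sorting once up front costs one pass over the data, after which each group's lag is a simple shift.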