我在R中遇到问题。数据集包含9个分类变量,因此我必须操纵数字变量才能将分类变量转换为数字。我想要构建的新变量看起来像这样,例如,过去7天内该记录中同一SSN的不同号码家庭电话。我写的部分代码如下。我花了2个小时来计算100k数据集中只有30%的记录。我如何优化此代码以提高效率?谢谢。
原始变量: 记录日期ssn firstname姓氏地址zip dob homephone
### Data sample[enter image description here][1]
|record | date | ssn |homephone ||date1 |dscnt|
|52 | 20150101 |737610282 |9105580920 |20150101 | 1 |
|91 | 20150101 |737610282 |3301371772 |20150101 | 2 |
|144 | 20150101 |737610282 |4139873682 |20150101 | 3 |
|236 | 20150101 |737610282 |4345558469 |20150101 | 4 |
|272 | 20150101 |737610282 |3979419673 |20150101 | 5 |
|282 | 20150102 |737610282 |8261221737 |20150102 | 6 |
|330 | 20150102 |737610282 |9105580920 |20150102 | 6 |
|412 | 20150102 |737610282 |7115050432 |20150102 | 7 |
|493 | 20150102 |737610282 |6872924382 |20150102 | 8 |
|571 | 20150103 |737610282 |2528003837 |20150103 | 9 |
|621 | 20150103 |737610282 |9105580920 |20150103 | 9 |
|763 | 20150103 |737610282 |581105006 |20150103 | 10 |
|854 | 20150104 |737610282 |9105580920 |20150104 | 10 |
|883 | 20150104 |737610282 |979778244 |20150104 | 11 |
|966 | 20150104 |737610282 |9952278232 |20150104 | 12 |
|974 | 20150104 |737610282 |3664924933 |20150104 | 13 |
|1057 | 20150104 |737610282 |9105580920 |20150104 | 13 |
|1091 | 20150104 |737610282 |32384275 |20150104 | 14 |
|1166 | 20150105 |737610282 |942052630 |20150105 | 15 |
|1304 | 20150105 |737610282 |9105580920 |20150105 | 15 |
|1416 | 20150105 |737610282 |1384461737 |20150105 | 16 |
i= 1
e1 = NULL
for (i in 1:365)
{
i = 360
a1 <- df0 %>% filter(date <= DT[365-i,])
b1 <-
a1 %>%
select(record,date,ssn,homephone) %>%
mutate(date1 = date + days(as.integer(today() -max(a$date)) )) %>%
filter(date1 >= today() - days(7)) %>%
group_by(ssn) %>%
mutate(dscnt = n_distinct(homephone))
f1 = b1 %>% filter(date == DT[365-i,])
e1 = rbind(e1,f1)
}