I have a dataset that can be generated with the code below:
df <- data.frame(Cust = c("Name1","Name1","Name1","Name1","Name1","Name2","Name2","Name2","Name2","Name2","Name2","Name2"),
Loc=c("Code1", "Code1", "Code2", "Code2", "Code3","Code1","Code1","Code1","Code1","Code1","Code2","Code2"),
Date = c("Date1","Date1","Date2","Date2","Date2","Date1","Date1","Date2","Date2","Date3","Date2","Date2"),
Var1 = c("a","b","c","d","e","f","g","p", "q", "h", "i", "j"),
Var2 = c("r", "s", "u", "v", "w", "x","y", "a", "b", "z", "q", "p") )
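The data is sorted by Cust, Loc and Date. The columns Var1 and Var2 are included only to show that the dataset contains other columns as well. I need to extract one row for each unique combination of Cust, Loc and Date, and build a new dataset from the extracted rows. The result should look like this: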
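I can do this by first sorting by Cust, Loc and Date, then scanning the whole dataset row by row and extracting a row whenever Cust, Loc or Date changes. However, the actual dataset has more than 12 million rows, and this approach takes a very long time.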
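For reference, here is a minimal base R sketch of two vectorized ways to do this without an explicit row-by-row loop. It assumes that the row wanted for each combination is the first one encountered, which the question does not state explicitly. The names `keys`, `first_rows`, `changed` and `scan_rows` are made up for illustration. How either variant performs on 12 million rows has not been measured here.

```r
# Same toy data as in the question.
df <- data.frame(
  Cust = c("Name1","Name1","Name1","Name1","Name1","Name2","Name2","Name2","Name2","Name2","Name2","Name2"),
  Loc  = c("Code1","Code1","Code2","Code2","Code3","Code1","Code1","Code1","Code1","Code1","Code2","Code2"),
  Date = c("Date1","Date1","Date2","Date2","Date2","Date1","Date1","Date2","Date2","Date3","Date2","Date2"),
  Var1 = c("a","b","c","d","e","f","g","p","q","h","i","j"),
  Var2 = c("r","s","u","v","w","x","y","a","b","z","q","p")
)

# Hypothetical helper name: the columns that define a unique combination.
keys <- c("Cust", "Loc", "Date")

# Variant 1: duplicated() flags every row whose key combination has already
# appeared earlier, so negating it keeps the first row of each combination.
# This does not require the data to be sorted.
first_rows <- df[!duplicated(df[, keys]), ]

# Variant 2: a vectorized form of the "scan for a change" idea.
# Assumption: df is already sorted by the key columns, as in the question.
# Each key column is compared with itself shifted by one row; a row is kept
# when at least one key differs from the previous row. Row 1 is always kept.
n <- nrow(df)
changed <- c(TRUE, Reduce(`|`, lapply(df[keys], function(x) x[-1] != x[-n])))
scan_rows <- df[changed, ]
```

Both variants keep all other columns (here Var1 and Var2) of the selected rows. The first variant is the more general one, since it does not depend on sort order; the second mirrors the scanning logic described above and is only valid on sorted data.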