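I need an efficient data.table solution that filters a table down to only the first and last rows of each block of 300 in a column's cumulative sum. My real dataset has millions of rows, so I'm not looking for a loop-based solution.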
# Example data:
library(data.table)
dt <- data.table(idcolref = 1:1000, y = rep(10, 1000))
Below is an example loop that does what I want, but it is too slow to use on a large data.table.
###example of a loop that produces the result I want but is too slow
library(foreach)
dt[,grp:=1,]
dt[,cumsum:=0,]
grp <- 1
foreach(a=2:nrow(dt))%do%{
  dt[a,"cumsum"]<-dt[a,"y"]+dt[a-1,"cumsum"]
  if(dt[a,"cumsum"]>300){
    dt[a,"grp"] <- grp
    grp <- grp+1
    dt[a,"cumsum"]<-0
  }else{
    dt[a,"grp"]<-dt[a-1,"grp"]
  }
}
dt.desired <- foreach(a=2:nrow(dt),.combine=rbind)%do%{
  if(dt[a,"grp"]!=dt[a-1,"grp"]){
    dt[c(a-1,a),]
  }
}
dt.desired <- rbind(dt[1,],dt.desired)
dt.desired <- rbind(dt.desired,dt[nrow(dt),])
How can I get the same result using fast, vectorised data.table functions? Thanks!
Answer 0 (score: 3)
Assuming I've interpreted your requirements correctly, you can write your own fast "vectorised" function in Rcpp for this:
library(data.table)
dt <- data.table(x=rep(5,1e7),y=rep(10,1e7))
## adding a row index to keep track of which rows are returned
dt[, id := .I]
library(Rcpp)
cppFunction('Rcpp::NumericVector findGroupRows(Rcpp::NumericVector x) {
  double cumsum = 0;   // double, so non-integer values of x are not truncated
  int grpCounter = 0;
  size_t n = x.length();
  Rcpp::NumericVector groupedCumSum(n);
  for ( size_t i = 0; i < n; i++) {
    cumsum += x[i];
    if (cumsum > 300) {
      // reset the running total and start a new group at this row
      cumsum = 0;
      grpCounter++;
    }
    groupedCumSum[i] = grpCounter;
  }
  return groupedCumSum;
}')
dt[, grp := findGroupRows(y)]
## keep the first and last row of each group
dt[ dt[, .I[c(1, .N)], by = grp]$V1]
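To sanity-check findGroupRows on a small sample, the same reset rule can be written as a plain R loop. This is only a sketch for testing (it is slow), and the name findGroupRowsR and its limit argument are hypothetical, not part of the answer. Note that the row that pushes the running total over 300 starts a new group, but its own value is not carried forward because the total restarts at 0. With y = 10 the first group therefore has 30 rows and later groups have 31, so these groups are not the same as the 30-row groups that the %/% approach in the other answer gives for this data.

```r
# Hypothetical plain-R reference for the Rcpp findGroupRows() logic.
# Uses an explicit loop, so it is only meant for checking small inputs.
findGroupRowsR <- function(x, limit = 300) {
  grp <- integer(length(x))
  running <- 0
  counter <- 0L
  for (i in seq_along(x)) {
    running <- running + x[i]
    if (running > limit) {
      # the row that crosses the limit opens a new group;
      # the running total restarts at 0, dropping this row's value
      running <- 0
      counter <- counter + 1L
    }
    grp[i] <- counter
  }
  grp
}

grp <- findGroupRowsR(rep(10, 100))
table(grp)  # group sizes: 30, 31, 31, 8
```

With Rcpp available, comparing as.integer(findGroupRows(y)) against findGroupRowsR(y) for a short y should show whether the two agree.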
Answer 1 (score: 2)
A simple solution using only data.table and base R functions:
dt[, grp := (cumsum(y) - 1) %/% 300]
# straightforward solution:
dt[, .SD[c(1, .N)], by = "grp"]
# more efficient for large datasets, as suggested by SymbolixAU
dt[ dt[, .I[c(1, .N)], by = "grp"]$V1]
# check that your groups are the correct size
table(dt[, .N, by = "grp"]$N)
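%/% is the integer division operator.
.SD is the subset of the data.table for the current group.
.N is the number of rows in the current subset (the same as nrow(.SD)).
.I holds the row numbers of the current group within dt, so .I[c(1, .N)] gives the first and last row number of each group.
Subtracting 1 before the integer division ensures that the first group has the correct size.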
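The effect of subtracting 1 in (cumsum(y) - 1) %/% 300 can be checked in base R on a short vector. This sketch assumes y is all 10s, as in the question. It also uses !duplicated() from both ends as an alternative way to flag the first and last row of each group; that swap is a suggestion, not part of the answer, and unlike c(1, .N) it does not return a single-row group twice.

```r
y <- rep(10, 100)
cs <- cumsum(y)  # 10, 20, ..., 1000

# Without "- 1", a running total of exactly 300 already lands in group 1
which(cs %/% 300 == 1)[1]   # row 30
# With "- 1", totals 1..300 map to group 0 and 301..600 to group 1
grp <- (cs - 1) %/% 300
which(grp == 1)[1]          # row 31

# Flag the first and last row of each group (base-R alternative to .I[c(1, .N)])
keep <- !duplicated(grp) | !duplicated(grp, fromLast = TRUE)
which(keep)                 # 1 30 31 60 61 90 91 100
```

On the question's dt, the same filter would read dt[!duplicated(grp) | !duplicated(grp, fromLast = TRUE)] once a grp column exists.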