Grouping rows into bins, then computing the sum of each bin?

Time: 2018-10-23 15:53:11

Tags: python python-3.x pandas numpy

I have a matrix that looks like this:

M = [[1, 200],
 [1.8, 100],
 [2, 500],
 [2.5, 300],
 [3, 400],
 [3.5, 200],
 [5, 200],
 [8, 100]]

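I want to group the rows by a bin size applied to the left column. For example, with a bin size of 2, the first bin holds values from 0 up to (but not including) 2, the second bin holds values from 2 up to 4, the third from 4 up to 6, and so on: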

[[1, 200],
 [1.8, 100],
----
 [2, 500],
 [2.5, 300],
 [3, 400],
 [3.5, 200],
----
 [5, 200],
----
 [8, 100]]

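Then output a new matrix containing the sum of the right column for each group:

[200+100, 500+300+400+200, 200, 100]

What is an efficient way to sum the values based on the bin_size boundaries?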

2 answers:

Answer 0 (score: 4)

Make a pandas DataFrame, then use integer division to define the bins:

import pandas as pd

df = pd.DataFrame(M)
df.groupby(df[0]//2)[1].sum()
#0
#0.0     300
#1.0    1400
#2.0     200
#4.0     100
#Name: 1, dtype: int64

Use .tolist() to get the desired output:

df.groupby(df[0]//2)[1].sum().tolist()
#[300, 1400, 200, 100]

numpy.bincount
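
A minimal sketch of how numpy.bincount could be applied to this problem. It assumes M is converted to a NumPy array, that the left column is non-negative, and that bins start at 0. The names bin_size, idx, sums, and nonempty are illustrative and not taken from the answer.

```python
import numpy as np

M = np.array([[1, 200], [1.8, 100], [2, 500], [2.5, 300],
              [3, 400], [3.5, 200], [5, 200], [8, 100]])
bin_size = 2  # assumed bin width, as in the question

# Integer bin index of each row: floor(left column / bin_size)
idx = (M[:, 0] // bin_size).astype(int)

# Sum the right column per bin index; empty bins come out as 0
sums = np.bincount(idx, weights=M[:, 1])

# Keep only the bins that actually contain rows
nonempty = sums[np.unique(idx)]
```

Here sums has one entry for every bin index from 0 up to the largest one, so the empty bin for 6 to 8 shows up as 0, while nonempty drops it to match the output asked for in the question.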

Answer 1 (score: 2)

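You can use np.digitize and scipy.sparse.csr_matrix here. M must be a NumPy array for the column slicing below to work:

import numpy as np

M = np.array(M, dtype=float)  # convert the list of lists from the question
bins = [2, 4, 6, 8, 10]
b = np.digitize(M[:, 0], bins)  # bin index of each row (0 for values below 2)
v = M[:, 1]  # values to sum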

Now use a vectorized groupby with csr_matrix:

from scipy import sparse

sparse.csr_matrix(
    (v, b, np.arange(v.shape[0]+1)), (v.shape[0], b.max()+1)
).sum(0)

matrix([[ 300., 1400.,  200.,    0.,  100.]])
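
The csr_matrix call above places each value v[i] in row i and column b[i], so summing over rows yields one total per bin. As an illustration only, here is a minimal sketch of the same idea using a dense array instead of a sparse one. The names one_hot and totals are hypothetical, and the dense form uses far more memory on large inputs.

```python
import numpy as np

M = np.array([[1, 200], [1.8, 100], [2, 500], [2.5, 300],
              [3, 400], [3.5, 200], [5, 200], [8, 100]], dtype=float)
bins = [2, 4, 6, 8, 10]
b = np.digitize(M[:, 0], bins)  # bin index of each row
v = M[:, 1]                     # values to sum

# Dense equivalent of the sparse matrix: row i holds v[i] in column b[i]
one_hot = np.zeros((v.shape[0], b.max() + 1))
one_hot[np.arange(v.shape[0]), b] = v

# Column sums are the per-bin totals
totals = one_hot.sum(axis=0)
```

The fourth entry of totals is 0 because no row falls in the 6 to 8 bin, which matches the 0. in the sparse result.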