How do I aggregate a NumPy record array (sum, min, max, etc.)?

Time: 2015-10-09 01:06:12

Tags: python numpy aggregate-functions aggregate recarray

Consider a simple record array structure:

import numpy as np
ijv_dtype = [
    ('I', 'i'),
    ('J', 'i'),
    ('v', 'd'),
]
ijv = np.array([
    (0, 0, 3.3),
    (0, 1, 1.1),
    (0, 1, 4.4),
    (1, 1, 2.2),
    ], ijv_dtype)
print(ijv)  # [(0, 0, 3.3) (0, 1, 1.1) (0, 1, 4.4) (1, 1, 2.2)]

I would like to compute some statistics (sum, min, max, etc.) of v, grouped by the unique combinations of I and J. Thinking in SQL, the expected result is:

select i, j, sum(v) as v from ijv group by i, j;
 i | j |  v
---+---+-----
 0 | 0 | 3.3
 0 | 1 | 5.5
 1 | 1 | 2.2

(The order is not important.)

The best I can come up with in NumPy is ugly, and I am not confident that I have ordered the results correctly (although it seems to work here):

# Get unique groups, index and inverse
u_ij, idx_ij, inv_ij = np.unique(ijv[['I', 'J']], return_index=True, return_inverse=True)
# Assemble aggregate
a_ijv = np.zeros(len(u_ij), ijv_dtype)
a_ijv['I'] = u_ij['I']
a_ijv['J'] = u_ij['J']
a_ijv['v'] = [ijv['v'][inv_ij == i].sum() for i in range(len(u_ij))]
print(a_ijv)  # [(0, 0, 3.3) (0, 1, 5.5) (1, 1, 2.2)]

I imagine there is a better way to do this! I am using NumPy 1.4.1.

2 answers:

Answer 0: (score: 1)

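NumPy is a bit too low-level for tasks like this. I think your solution is fine if you have to use pure NumPy, but if you don't mind using something with a higher level of abstraction, try pandas: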

import pandas as pd

df = pd.DataFrame({
    'I': (0, 0, 0, 1),
    'J': (0, 1, 1, 1),
    'v': (3.3, 1.1, 4.4, 2.2)})

print(df)
print(df.groupby(['I', 'J']).sum())

Output:

   I  J    v
0  0  0  3.3
1  0  1  1.1
2  0  1  4.4
3  1  1  2.2
       v
I J     
0 0  3.3
  1  5.5
1 1  2.2
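
Since the question asks for several statistics and not only the sum, here is a minimal sketch of how the same groupby could return sum, min, and max together. It assumes a reasonably recent pandas; the names `stats` and `flat` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'I': (0, 0, 0, 1),
    'J': (0, 1, 1, 1),
    'v': (3.3, 1.1, 4.4, 2.2)})

# Compute several aggregates of v for each (I, J) group in one call.
stats = df.groupby(['I', 'J'])['v'].agg(['sum', 'min', 'max'])

# reset_index() turns the (I, J) index back into ordinary columns,
# which is closer to the layout of the SQL result.
flat = stats.reset_index()
print(flat)
```

If a record array is needed afterwards, `flat.to_records(index=False)` should convert the result back.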

Answer 1: (score: 1)

This is not a big improvement over what you already have, but it at least gets rid of the for loop.

# Starting with your original setup

# Get the unique ij values and the mapping from ungrouped to grouped.
u_ij, inv_ij = np.unique(ijv[['I', 'J']], return_inverse=True)

# Create a totals array with one float slot per unique (I, J) group.
# You could do the fancy ijv_dtype thing if you wanted.
totals = np.zeros(len(u_ij))

# Here's the magic bit. You can think of it as
#     totals[inv_ij] += ijv["v"]
# except that fancy-indexed += applies only one update per repeated index,
# so duplicates would not accumulate. np.add.at (NumPy 1.8+) is unbuffered.
np.add.at(totals, inv_ij, ijv["v"])

print(totals)
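
The same unbuffered pattern should extend to the other statistics mentioned in the question. The sketch below is self-contained and assumes NumPy 1.8 or later (for `ufunc.at` and `np.full`); the names `v_sum`, `v_min`, and `v_max` are chosen only for this illustration.

```python
import numpy as np

ijv = np.array(
    [(0, 0, 3.3), (0, 1, 1.1), (0, 1, 4.4), (1, 1, 2.2)],
    dtype=[('I', 'i'), ('J', 'i'), ('v', 'd')])

# Unique (I, J) groups and, for every row, the index of its group.
u_ij, inv_ij = np.unique(ijv[['I', 'J']], return_inverse=True)

n_groups = len(u_ij)
v_sum = np.zeros(n_groups)
# Start min/max from +inf/-inf so that any real value replaces them.
v_min = np.full(n_groups, np.inf)
v_max = np.full(n_groups, -np.inf)

# Each ufunc.at call applies its operation once per row, even when
# several rows map to the same group index.
np.add.at(v_sum, inv_ij, ijv['v'])
np.minimum.at(v_min, inv_ij, ijv['v'])
np.maximum.at(v_max, inv_ij, ijv['v'])

print(v_sum, v_min, v_max)
```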

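The fact that you are using NumPy's multi-dtype stuff is somewhat indicative that you should be using pandas. It generally makes for less awkward code when you are trying to keep your i's, j's, and v's together.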
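
For a NumPy version without `ufunc.at`, one possible alternative is to sort the rows by (I, J) and sum each run of equal keys with `np.add.reduceat`. This is a sketch of a different technique from the one above, not a drop-in replacement; the helper names `order`, `is_start`, and `starts` are made up for this example.

```python
import numpy as np

ijv_dtype = [('I', 'i'), ('J', 'i'), ('v', 'd')]
ijv = np.array(
    [(0, 0, 3.3), (0, 1, 1.1), (0, 1, 4.4), (1, 1, 2.2)], ijv_dtype)

# Sort rows by (I, J). lexsort treats the LAST key as the primary one.
order = np.lexsort((ijv['J'], ijv['I']))
s = ijv[order]

# A group starts at row 0 and wherever I or J differs from the previous row.
is_start = np.ones(len(s), dtype=bool)
is_start[1:] = (s['I'][1:] != s['I'][:-1]) | (s['J'][1:] != s['J'][:-1])
starts = np.flatnonzero(is_start)

# reduceat sums each slice s['v'][starts[k]:starts[k + 1]];
# the last slice runs to the end of the array.
a_ijv = np.zeros(len(starts), ijv_dtype)
a_ijv['I'] = s['I'][starts]
a_ijv['J'] = s['J'][starts]
a_ijv['v'] = np.add.reduceat(s['v'], starts)
print(a_ijv)
```

Because the rows are sorted explicitly, the group keys and the aggregated values are taken from the same ordering, which addresses the ordering concern raised in the question. `np.minimum.reduceat` and `np.maximum.reduceat` can be used the same way for min and max.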