Question

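I am trying to break a numpy array into chunks of a fixed size and pad the last one with 0. For example: breaking [1,2,3,4,5,6,7] into chunks of 3 would return [[1,2,3],[4,5,6],[7,0,0]].

The function I wrote is: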

import numpy as np

def makechunk(lst, chunk):
    result = []
    for i in np.arange(0, len(lst), chunk):
        temp = lst[i:i + chunk]
        if len(temp) < chunk:
            temp = np.pad(temp, (0, chunk - len(temp)), 'constant')
        result.append(temp)
    return result

It works, but it is very slow on large arrays. Is there a more numpy-ish, vectorized way to do this?

Answer 1

The in-place ndarray.resize() method should do what you need; the two-line example is at the end of this answer.

(Edit: mea culpa, Monday reallocation issue.)

@J: resize gives a speedup of about 5x for np.arange(0, 44100) split into chunks of 512.

l = np.array([1,2,3,4,5,6,7])
l.resize((3,3), refcheck=False)
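
For an arbitrary chunk size the target shape has to be computed from the array length. The following is a minimal sketch, assuming a 1-D input; the helper name `chunk_with_resize` is made up for illustration. It works on a copy because `ndarray.resize` modifies the array in place, and it relies on the in-place method filling new entries with zeros (the `np.resize` function repeats the data instead).

```python
import numpy as np

def chunk_with_resize(arr, chunk):
    """Return a zero-padded 2-D copy of 1-D `arr` with rows of length `chunk`."""
    out = np.array(arr)  # copy, so the caller's array is left untouched
    n_rows = -(-out.shape[0] // chunk)  # ceiling division
    # In-place resize; entries beyond the original data are filled with zeros.
    out.resize((n_rows, chunk), refcheck=False)
    return out

chunks = chunk_with_resize(np.array([1, 2, 3, 4, 5, 6, 7]), 3)
print(chunks.tolist())  # [[1, 2, 3], [4, 5, 6], [7, 0, 0]]
```

Copying first costs one extra allocation, but it avoids surprising the caller, who may still hold references to the original array.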

Answer 2

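Here is another numpy approach I first thought of (create a zero array and insert the data in place), timed against @Cedric Poulet's solution (all the kudos to him, see his answer), with array splitting added so that the result is returned in the requested form: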

import math
import time

import numpy as np


def time_measure(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        stop = time.time()
        print(f"Elapsed time: {stop-start}")
        return result

    return wrapper


@time_measure
def pad_and_chunk(array, chunk_size: int):
    # -len(array) % chunk_size is 0 when the length is already a multiple of chunk_size
    padded_array = np.zeros(len(array) + (-len(array) % chunk_size))
    padded_array[: len(array)] = array
    return np.split(padded_array, len(padded_array) // chunk_size)


@time_measure
def resize(array, chunk_size: int):
    # Resizes `array` in place; the new entries are filled with zeros
    array.resize(len(array) + (-len(array) % chunk_size), refcheck=False)
    return np.split(array, len(array) // chunk_size)


@time_measure
def makechunk4(l, chunk):
    # Resizes `l` in place to shape (n_chunks, chunk), so no further reshape is needed
    l.resize((math.ceil(l.shape[0] / chunk), chunk), refcheck=False)
    return l


if __name__ == "__main__":
    array = np.random.rand(1_000_000)

    # Note: resize() and makechunk4() modify `array` in place
    ret = pad_and_chunk(array, 3)
    ret = resize(array, 3)
    ret = makechunk4(array, 3)

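Edit-edit

With all the proposed answers gathered, it turns out that np.split is very slow compared with reshape (timings in the order pad_and_chunk, resize, makechunk4):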

Elapsed time: 0.3276541233062744
Elapsed time: 0.3169224262237549
Elapsed time: 1.8835067749023438e-05

How the data is padded is not what matters; the split operation takes most of the time.
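
The gap comes from what each call builds: `np.split` creates one array object per chunk and collects them in a Python list, while `reshape` on a contiguous array returns a single 2-D view without copying. A small sketch of the difference, assuming the input has already been padded to a multiple of the chunk size (the variable names are illustrative):

```python
import numpy as np

chunk = 3
padded = np.arange(12, dtype=float)  # length is already a multiple of chunk

as_list = np.split(padded, len(padded) // chunk)  # Python list of 4 array objects
as_rows = padded.reshape(-1, chunk)               # one array of shape (4, 3)

print(type(as_list).__name__, len(as_list))  # list 4
print(as_rows.shape)                         # (4, 3)
# reshape did not copy: the 2-D result shares memory with `padded`
print(np.shares_memory(as_rows, padded))     # True
# both hold the same values, chunk for chunk
print(np.array_equal(np.stack(as_list), as_rows))  # True
```

With a million elements and a chunk size of 3, the list version has to create several hundred thousand small array objects, which is where the time goes.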

Answer 3

The itertools recipes include a grouper recipe:

from itertools import zip_longest
import numpy as np

array = np.array([1,2,3,4,5,6,7])

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

res = list(grouper(array, 3, fillvalue=0))
# [(1, 2, 3), (4, 5, 6), (7, 0, 0)]

If you need the sub-lists to be lists rather than tuples:

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return (list(item) for item in zip_longest(*args, fillvalue=fillvalue))
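
Since `grouper` accepts any iterable, it also works when the data is not a numpy array to begin with, and the result can be turned back into a 2-D array if needed. A brief sketch (`grouper` is repeated only to keep the snippet self-contained; it iterates in Python, so it is not expected to be fast on large arrays):

```python
from itertools import zip_longest
import numpy as np

def grouper(iterable, n, fillvalue=None):
    # n references to the same iterator, so zip_longest pulls n items per group
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

rows = np.array(list(grouper(range(1, 8), 3, fillvalue=0)))
print(rows.shape)     # (3, 3)
print(rows.tolist())  # [[1, 2, 3], [4, 5, 6], [7, 0, 0]]
```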

Answer 4

Solution using numpy

I assume a chunk size of 3 and create a sample input array of length 10 in x.

import numpy as np

# Chunk size
chunk = 3
# Create array
x = np.arange(10)

First make sure to pad the array with zeros. Next, you can use reshape to create an array of arrays.

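# Pad array (-n % chunk is 0 when the length is already a multiple of chunk,
# so no extra all-zero row is added in that case)
x = np.pad(x, (0, -x.shape[0] % chunk), 'constant')
# Divide into chunks
x = x.reshape(-1, chunk)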

(Optional) You can retrieve the numpy array as a list

x = x.tolist()
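
One edge case worth checking is an input whose length is already a multiple of the chunk size, where the pad width should be 0 rather than a full chunk. A possible wrapper, with the hypothetical name `pad_and_reshape` and an assumed `fill` parameter, computes the pad width as `-n % chunk` for that reason:

```python
import numpy as np

def pad_and_reshape(x, chunk, fill=0):
    """Pad 1-D `x` with `fill` up to a multiple of `chunk` and return it as rows."""
    x = np.asarray(x)
    pad_width = -x.shape[0] % chunk  # 0 when the length is already a multiple
    padded = np.pad(x, (0, pad_width), mode='constant', constant_values=fill)
    return padded.reshape(-1, chunk)

print(pad_and_reshape(np.arange(10), 3).tolist())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 0]]
print(pad_and_reshape(np.arange(9), 3).shape)  # (3, 3)
```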
