I'm trying to split a numpy array into chunks of a fixed size and pad the last chunk with zeros. For example, splitting [1,2,3,4,5,6,7]
into chunks of 3
should return [[1,2,3],[4,5,6],[7,0,0]].
The function I wrote is:
import numpy as np
def makechunk(lst, chunk):
    result = []
    for i in np.arange(0, len(lst), chunk):
        temp = lst[i:i + chunk]
        # Only the last chunk can be short; pad it with zeros
        if len(temp) < chunk:
            temp = np.pad(temp, (0, chunk - len(temp)), 'constant')
        result.append(temp)
    return result
It works, but it is very slow on large arrays. Is there a more numpy-ish,
vectorized way to do this?
Answer 0 (score: 3)
Using resize() should do what you need:
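import numpy as np
l = np.array([1,2,3,4,5,6,7])
# ndarray.resize works in place and fills the newly added entries with zeros
l.resize((3,3), refcheck=False)
(Edit: mea culpa, a Monday mix-up with the reallocation.)
@J: resize gives roughly a 5x speedup on np.arange(0, 44100) split into chunks of 512.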
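A minimal sketch that generalizes the in-place resize to any 1-D length and chunk size instead of the hard-coded (3,3). The helper name `resize_chunks` is hypothetical, and it assumes working on a copy is acceptable, so the caller's array is never resized and `refcheck=False` only touches a fresh array nobody else references. It also contrasts the `ndarray.resize` method (zero-fills new entries) with the `np.resize` function (repeats the data), which is easy to mix up.

```python
import numpy as np

def resize_chunks(a, chunk):
    # Hypothetical helper: copy so the caller's array is left untouched
    out = np.array(a, copy=True)
    # Number of rows needed, rounded up (integer ceiling division)
    n_chunks = -(-out.size // chunk)
    # ndarray.resize zero-fills the newly added entries in place
    out.resize((n_chunks, chunk), refcheck=False)
    return out

print(resize_chunks(np.array([1, 2, 3, 4, 5, 6, 7]), 3).tolist())
# [[1, 2, 3], [4, 5, 6], [7, 0, 0]]

# The np.resize *function* repeats the data instead of zero-filling
print(np.resize(np.array([1, 2, 3, 4, 5, 6, 7]), (3, 3)).tolist())
# [[1, 2, 3], [4, 5, 6], [7, 1, 2]]
```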
Answer 1 (score: 3)
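Here is a timing comparison of @Cedric Poulet's approach (all kudos to him, see his answer), with array splitting added so it returns the result in the requested form, against another numpy
approach I first considered (allocate an array of zeros and insert the data in place):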
import math
import time
import numpy as np
def time_measure(func):
    # Decorator that prints the wall-clock duration of each call
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        stop = time.perf_counter()
        print(f"Elapsed time: {stop - start}")
        return result
    return wrapper
@time_measure
def pad_and_chunk(array, chunk_size: int):
    # Copy the data into a zero array whose length is a multiple of chunk_size
    pad = -len(array) % chunk_size
    padded_array = np.zeros(len(array) + pad, dtype=array.dtype)
    padded_array[: len(array)] = array
    return np.split(padded_array, len(padded_array) // chunk_size)
@time_measure
def resize(array, chunk_size: int):
    # Grow the array in place (new entries are zero-filled), then split it
    array.resize(len(array) + (-len(array) % chunk_size), refcheck=False)
    return np.split(array, len(array) // chunk_size)
@time_measure
def makechunk4(l, chunk):
    # Resize in place directly to (n_chunks, chunk); no split is needed
    l.resize((math.ceil(l.shape[0] / chunk), chunk), refcheck=False)
    return l
if __name__ == "__main__":
    array = np.random.rand(1_000_000)
    ret = pad_and_chunk(array, 3)
    # Note: resize() pads `array` in place, so makechunk4 below receives
    # the already padded array and only has to change its shape
    ret = resize(array, 3)
    ret = makechunk4(array, 3)
Putting all the candidate answers together, it is indeed np.split
that is very slow compared with reshaping:
Elapsed time: 0.3276541233062744
Elapsed time: 0.3169224262237549
Elapsed time: 1.8835067749023438e-05
How the data is padded barely matters; most of the time is spent in the split operation.
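A short sketch of why the split dominates, assuming the array length is already a multiple of the chunk size. It only illustrates the object counts and does not reproduce the timings above: reshape returns one view of the existing buffer, while np.split builds a Python list with a separate array object per chunk, so its cost grows with the number of chunks.

```python
import numpy as np

a = np.zeros(300_000)  # length already a multiple of the chunk size
chunk = 3

# reshape: a single new array object viewing the same buffer (no copy)
blocks = a.reshape(-1, chunk)
print(blocks.shape, np.shares_memory(blocks, a))  # (100000, 3) True

# np.split: a Python list holding one small array object per chunk
pieces = np.split(a, a.size // chunk)
print(len(pieces), type(pieces))  # 100000 <class 'list'>
```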
Answer 2 (score: 0)
There is a grouper recipe in the itertools
recipes:
from itertools import zip_longest
import numpy as np
array = np.array([1,2,3,4,5,6,7])
def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
res = list(grouper(array, 3, fillvalue=0))
# [(1, 2, 3), (4, 5, 6), (7, 0, 0)]
If you need the sub-lists to be lists rather than tuples:
def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return (list(item) for item in zip_longest(*args, fillvalue=fillvalue))
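If a 2-D numpy array is wanted at the end, the grouper output can be passed to np.array. A minimal sketch, reusing the recipe (redefined here so the block is self-contained); note that grouper iterates element by element in Python, so this is not a vectorized solution.

```python
from itertools import zip_longest

import numpy as np

def grouper(iterable, n, fillvalue=None):
    # n references to the same iterator, so zip_longest consumes n items per tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

array = np.array([1, 2, 3, 4, 5, 6, 7])
# Each tuple becomes one row of the resulting 2-D array
chunks = np.array(list(grouper(array, 3, fillvalue=0)))
print(chunks.tolist())  # [[1, 2, 3], [4, 5, 6], [7, 0, 0]]
```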
Answer 3 (score: -2)
Using numpy:
I assume a chunk size of 3 and create an input array x of length 10.
import numpy as np
# Chunk size
chunk = 3
# Create array
x = np.arange(10)
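First, pad the array with zeros up to a multiple of the chunk size. Then you can use reshape
to turn it into a 2-D array whose rows are the chunks.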
# Pad with zeros up to the next multiple of chunk (no padding if already a multiple)
x = np.pad(x, (0, -x.shape[0] % chunk), 'constant')
# Divide into chunks
x = x.reshape(-1, chunk)
Optionally, you can convert the numpy array to a list:
x = x.tolist()
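The pad-and-reshape steps can be wrapped in a small helper; this is a sketch, and the name `chunk_pad` and its `fill` parameter are hypothetical. Computing the padding as `-len % chunk` gives 0 when the length is already a multiple of the chunk size, so no all-zero chunk is appended in that case.

```python
import numpy as np

def chunk_pad(x, chunk, fill=0):
    # Hypothetical helper: fill values needed to reach the next multiple of `chunk`
    pad = -x.shape[0] % chunk
    # Pad only at the end, then view the padded copy as (n_chunks, chunk)
    return np.pad(x, (0, pad), mode="constant", constant_values=fill).reshape(-1, chunk)

print(chunk_pad(np.arange(10), 3).tolist())  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 0]]
print(chunk_pad(np.arange(9), 3).shape)      # (3, 3)
```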