Main question: stacking numpy arrays of the same type and the same dimensions side by side with `np.hstack`, `np.column_stack`, or `np.concatenate(axis=1)`.
Details:
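I don't understand which properties of a numpy array could change so that `numpy.hstack`, `numpy.column_stack`, and `numpy.concatenate(axis=1)` stop working properly. I can't get my real program to stack by columns: it only appends to the rows. Is there some property of numpy arrays that would cause this? It doesn't throw an error; it just doesn't do the "correct" or "normal" thing.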
I tried a simple case, and it works as I expect:
input:
```python
a = np.array([['1', '2'], ['3', '4']], dtype=object)
b = np.array([['5', '6'], ['7', '8']], dtype=object)
np.hstack((a, b))  # hstack takes a single sequence of arrays
```
output:
```
array([['1', '2', '5', '6'],
       ['3', '4', '7', '8']], dtype=object)
```
That is exactly what I want.
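For genuinely 2-D inputs like these, the three functions named in the question agree. They only diverge on 1-D input, where `np.hstack` joins the arrays end to end while `np.column_stack` turns each 1-D array into a column. A minimal sketch, assuming Python 3 and a current NumPy (the integer arrays are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# 2-D inputs: all three stack along axis 1 and give the same (2, 4) result
h = np.hstack((a, b))
c = np.column_stack((a, b))
k = np.concatenate((a, b), axis=1)
print(h)

# 1-D inputs: hstack concatenates end to end, column_stack builds columns
x = np.array([1, 2])
y = np.array([3, 4])
print(np.hstack((x, y)))        # [1 2 3 4]
print(np.column_stack((x, y)))  # 2 rows, 2 columns: [1 3] and [2 4]
```

So when `np.hstack` seems to "append rows", the first thing worth checking is whether the operands are really 2-D.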
However, this is what I get from my program:
```
First array:
[['29.8989', '0'] ['29.8659', '-8.54805e-005'] ['29.902', '-0.00015875']
 ..., ['908.791', '-0.015765'] ['908.073', '-0.0154842'] []]

Second array (to be added on in columns):
[['29.8989', '26.8556'] ['29.8659', '26.7969'] ['29.902', '29.0183'] ...,
 ['908.791', '943.621'] ['908.073', '940.529'] []]

What should be the two arrays side by side or in columns:
[['29.8989', '0'] ['29.8659', '-8.54805e-005'] ['29.902', '-0.00015875']
 ..., ['908.791', '943.621'] ['908.073', '940.529'] []]
```
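Obviously, this is not the right answer: the rows of the second array were appended after those of the first instead of being added as new columns.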
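The output above is what row-wise concatenation looks like: the first three rows shown come from the first array and the last three from the second. That is what happens when each operand is a 1-D object array whose elements are Python lists rather than a 2-D array. A small reproduction, assuming Python 3 and a recent NumPy; the rows are a shortened stand-in for the asker's data, and `dtype=object` is passed explicitly because NumPy 1.24 and later refuse ragged input without it:

```python
import numpy as np

# Rows as parsed from a text file; the trailing [] comes from a blank line
rows1 = [['29.8989', '0'], ['29.8659', '-8.54805e-005'], []]
rows2 = [['29.8989', '26.8556'], ['29.8659', '26.7969'], []]

# Ragged rows cannot form a 2-D array, so each becomes 1-D of list objects
first = np.array(rows1, dtype=object)
second = np.array(rows2, dtype=object)
print(first.shape, second.shape)  # (3,) (3,)

# hstack on 1-D arrays joins them end to end: 6 "rows", not 2 extra columns
joined = np.hstack((first, second))
print(joined.shape)  # (6,)

# Current NumPy rejects axis=1 for 1-D input with an AxisError
try:
    np.concatenate((first, second), axis=1)
except ValueError as exc:  # AxisError is a subclass of ValueError
    print("concatenate failed:", exc)
```

Older NumPy releases accepted `axis=1` here without an error, which matches the "no error, wrong result" symptom described in the question.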
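The module that produces this problem is fairly long (I'll give it at the bottom), but here is a simplified version of it that still works (it does the correct column stacking), like the first example: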
```python
import numpy as np

def contiguous_regions(condition):
    d = np.diff(condition)
    idx, = d.nonzero()
    idx += 1
    if condition[0]:
        idx = np.r_[0, idx]
    if condition[-1]:
        idx = np.r_[idx, condition.size]
    idx.shape = (-1,2)
    return idx

def is_number(s):
    try:
        np.float64(s)
        return True
    except ValueError:
        return False

total_array = np.array([['1', '2'], ['3', '4'], ['strings','here'], ['5', '6'], ['7', '8']], dtype=object)
where_number = np.array(map(is_number, total_array))
contig_ixs = contiguous_regions(where_number)
print contig_ixs
t = tuple(total_array[s[0]:s[1]] for s in contig_ixs)
print t
print np.hstack(t)
```
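It basically looks through a list of lists and finds the longest contiguous runs of numeric rows. If several runs have the same length, I want to stack those data sets as columns.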
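The run-finding step can be checked in isolation. Below is a sketch assuming Python 3 and a current NumPy; `contiguous_regions` is adapted from the question (it returns `idx.reshape(-1, 2)` instead of assigning to `idx.shape`), the mask is illustrative, and the boolean-mask selection of the longest runs does the same job as the question's `np.where(data_lengths == maxs)`:

```python
import numpy as np


def contiguous_regions(condition):
    """Return an (n, 2) array of [start, stop) indices of True runs."""
    d = np.diff(condition)          # True where the mask changes value
    idx, = d.nonzero()
    idx += 1                        # a run starts just after a change
    if condition[0]:
        idx = np.r_[0, idx]         # a run is already open at index 0
    if condition[-1]:
        idx = np.r_[idx, condition.size]  # a run is still open at the end
    return idx.reshape(-1, 2)


# True marks rows whose every field parses as a number
mask = np.array([True, True, False, True, True, False, True])
runs = contiguous_regions(mask)
lengths = runs[:, 1] - runs[:, 0]
longest = runs[lengths == lengths.max()]
print(runs.tolist())     # [[0, 2], [3, 5], [6, 7]]
print(longest.tolist())  # [[0, 2], [3, 5]]
```

Both runs of length 2 are kept, which is the "same length, stack as columns" case described above.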
Here is the real module that produces the problem:
```python
import numpy as np

def retrieve_XY(file_path):
    # XY data is read in from a file in text format
    file_data = open(file_path).readlines()
    # The list of strings (lines in the file) is made into a list of lists while splitting by whitespace and removing commas
    file_data = np.array(map(lambda line: line.rstrip('\n').replace(',',' ').split(), file_data))
    # Remove empty lists, make into numpy array
    # (as posted this read filter(None, column_stacked_data_chain), a name not yet defined here)
    xy_array = np.array(filter(None, file_data))
    # Each line is searched to make sure that all items in the line are a number
    where_num = np.array(map(is_number, file_data))
    # The data is searched for the longest contiguous chain of numbers
    contig = contiguous_regions(where_num)
    try:
        # Data lengths (number of rows) for each set of data in the file
        data_lengths = contig[:,1] - contig[:,0]
        # Get the maximum length of data (max number of contiguous rows) in the file
        maxs = np.amax(data_lengths)
        # Find the indices for where this long list of data is (index within the indices array of the file)
        # If there are two equally long lists of data, get both indices
        longest_contig_idx = np.where(data_lengths == maxs)
    except ValueError:
        print 'Problem finding contiguous data'
        return np.array([])
    ###############################################################################################
    ###############################################################################################
    # PROBLEM ORIGINATES HERE
    # Starting and stopping indices of the contiguous data are stored
    ss = contig[longest_contig_idx]
    # The file data with this longest contiguous chain of numbers
    # If there are multiple sets of data of the same length, they are added in columns
    longest_data_chains = tuple([file_data[i[0]:i[1]] for i in ss])
    print "First array:"
    print longest_data_chains[0]
    print
    print "Second array (to be added on in columns):"
    print longest_data_chains[1]
    column_stacked_data_chain = np.concatenate(longest_data_chains, axis=1)
    print
    print "What should be the two arrays side by side or in columns:"
    print column_stacked_data_chain
    ###############################################################################################
    ###############################################################################################
    xy = np.array(zip(*xy_array), dtype=float)
    return xy

#http://stackoverflow.com/questions/4494404/find-large-number-of-consecutive-values-fulfilling-condition-in-a-numpy-array
def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index."""
    # Find the indicies of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero()
    # We need to start things after the change in "condition". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1
    if condition[0]:
        # If the start of condition is True prepend a 0
        idx = np.r_[0, idx]
    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size] # Edit
    # Reshape the result into two columns
    idx.shape = (-1,2)
    return idx

def is_number(s):
    try:
        np.float64(s)
        return True
    except ValueError:
        return False
```
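Update
I got it working with the help of @hpaulj. Apparently it was not enough that the data in both cases was structured like `np.array([['1','2'],['3','4']])`: the real case had `dtype=object` and contained lists of strings, so numpy saw a 1D array (of list objects) instead of the 2D array that column stacking requires.
The fix was to apply `map(float, ...)` to each of the lists obtained from the lines that `readlines` returns.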
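The essence of the fix can be shown on a few lines of text: drop the empty rows and convert every field to float before calling `np.array`, so that each block arrives as a 2-D float array that `np.hstack` can widen. A minimal sketch assuming Python 3; `to_float_block` is a hypothetical helper and the sample lines are illustrative, not the asker's file:

```python
import numpy as np

lines = ["29.8989, 0\n", "29.8659, -8.54805e-005\n", "\n"]
other = ["29.8989 26.8556\n", "29.8659 26.7969\n", "\n"]


def to_float_block(raw_lines):
    """Hypothetical helper: split on commas/whitespace, skip blanks, cast."""
    rows = [line.replace(',', ' ').split() for line in raw_lines]
    # Skipping empty rows keeps the nested list rectangular, so the result is 2-D
    return np.array([[float(v) for v in row] for row in rows if row])


block1 = to_float_block(lines)
block2 = to_float_block(other)
print(block1.shape, block2.shape)         # (2, 2) (2, 2)
print(np.hstack((block1, block2)).shape)  # (2, 4)
```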
Here is what I ended up with:
```python
import numpy as np

def retrieve_XY(file_path):
    # XY data is read in from a file in text format
    file_data = open(file_path).readlines()
    # The list of strings (lines in the file) is made into a list of lists while splitting by whitespace and removing commas
    file_data = map(lambda line: line.rstrip('\n').replace(',',' ').split(), file_data)
    # Remove empty lists, make into numpy array
    xy_array = np.array(filter(None, file_data))
    # Each line is searched to make sure that all items in the line are a number
    where_num = np.array(map(is_number, xy_array))
    # The data is searched for the longest contiguous chain of numbers
    contig = contiguous_regions(where_num)
    try:
        # Data lengths
        data_lengths = contig[:,1] - contig[:,0]
        # All maximums in contiguous data
        maxs = np.amax(data_lengths)
        longest_contig_idx = np.where(data_lengths == maxs)
    except ValueError:
        print 'Problem finding contiguous data'
        return np.array([])
    # Starting and stopping indices of the contiguous data are stored
    ss = contig[longest_contig_idx]
    print ss
    # The file data with this longest contiguous chain of numbers
    # Float must be cast to each value in the lists of the contiguous data and cast to a numpy array
    longest_data_chains = np.array([[map(float, n) for n in xy_array[i[0]:i[1]]] for i in ss])
    # If there are multiple sets of data of the same length, they are added in columns
    column_stacked_data_chain = np.hstack(longest_data_chains)
    xy = np.array(zip(*column_stacked_data_chain), dtype=float)
    return xy

#http://stackoverflow.com/questions/4494404/find-large-number-of-consecutive-values-fulfilling-condition-in-a-numpy-array
def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index."""
    # Find the indicies of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero()
    # We need to start things after the change in "condition". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1
    if condition[0]:
        # If the start of condition is True prepend a 0
        idx = np.r_[0, idx]
    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size] # Edit
    # Reshape the result into two columns
    idx.shape = (-1,2)
    return idx

def is_number(s):
    try:
        np.float64(s)
        return True
    except ValueError:
        return False
```
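This function now takes a file and returns the longest contiguous run of numeric data it contains. If it finds several data sets of the same length, it stacks them as columns.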
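On Python 3, `map` and `filter` return iterators, so `np.array(map(...))` builds a 0-d object array instead of the list-based array Python 2 produced, and the function above will not run unchanged. The same logic can be sketched as follows. This is an adaptation, not the asker's code: it takes an iterable of lines rather than a path so it is easy to test, `retrieve_xy_from_lines` and `is_numeric_row` are hypothetical names, and it assumes every row inside a numeric run has the same number of fields.

```python
import numpy as np


def is_numeric_row(tokens):
    """Return True if every field in the row parses as a float."""
    try:
        for t in tokens:
            float(t)
    except ValueError:
        return False
    return True


def contiguous_regions(condition):
    """Return an (n, 2) array of [start, stop) indices of True runs."""
    idx, = np.diff(condition).nonzero()
    idx += 1
    if condition[0]:
        idx = np.r_[0, idx]
    if condition[-1]:
        idx = np.r_[idx, condition.size]
    return idx.reshape(-1, 2)


def retrieve_xy_from_lines(lines):
    """Hypothetical Python 3 adaptation of retrieve_XY taking lines, not a path."""
    # Split on commas and whitespace, then drop blank lines (the empty lists)
    rows = [line.replace(',', ' ').split() for line in lines]
    rows = [row for row in rows if row]
    where_num = np.array([is_numeric_row(row) for row in rows], dtype=bool)
    if not where_num.any():
        return np.array([])
    contig = contiguous_regions(where_num)
    lengths = contig[:, 1] - contig[:, 0]
    longest = contig[lengths == lengths.max()]
    # Cast to float per block so every block is a proper 2-D float array
    blocks = [np.array([[float(v) for v in row] for row in rows[start:stop]])
              for start, stop in longest]
    # Equal-length runs share a row count, so hstack widens them side by side
    stacked = np.hstack(blocks)
    # Transpose, as zip(*...) did in the original: one output row per column
    return stacked.T


sample = [
    "header text\n",
    "1, 2\n",
    "3, 4\n",
    "\n",
    "more text\n",
    "5 6\n",
    "7 8\n",
]
xy = retrieve_xy_from_lines(sample)
print(xy.tolist())  # [[1.0, 3.0], [2.0, 4.0], [5.0, 7.0], [6.0, 8.0]]
```

Filtering the blank lines before the run search (as the final version does with `filter(None, ...)`) is what keeps the trailing empty list out of the numeric blocks.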
Answer (score: 1)
It's the empty list at the end of your array that's causing the problem:
```
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[1, 2], [3, 4], []])
>>> a.shape
(2L, 2L)
>>> a.dtype
dtype('int32')
>>> b.shape
(3L,)
>>> b.dtype
dtype('O')
```
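Because of that empty list at the end, numpy doesn't create a 2D array; it creates a 1D object array whose items are the list objects themselves (`[1, 2]`, `[3, 4]` and `[]`).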
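A cheap way to catch this before stacking is to check `ndim` and `dtype` of every operand. The helper below is a hypothetical sketch (its name and error message are not part of NumPy), assuming Python 3. Note also that with NumPy 1.24 and later, `np.array` on ragged nested lists raises `ValueError` unless `dtype=object` is passed, which surfaces the same problem at array-creation time; the sketch passes `dtype=object` to reproduce the answer's `b`.

```python
import numpy as np


def require_2d_numeric(*arrays):
    """Hypothetical guard: hstack only if every operand is 2-D and numeric."""
    for i, arr in enumerate(arrays):
        if arr.ndim != 2 or not np.issubdtype(arr.dtype, np.number):
            raise TypeError(
                f"operand {i} has shape {arr.shape} and dtype {arr.dtype}; "
                "expected a 2-D numeric array"
            )
    return np.hstack(arrays)


a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2], [3, 4], []], dtype=object)  # 1-D, like the answer's b

print(require_2d_numeric(a, a).shape)  # (2, 4)
try:
    require_2d_numeric(a, b)
except TypeError as exc:
    print(exc)
```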