Question

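I'm trying to accomplish the following tasks and have gotten stuck:

Import a CSV file into a numpy array
Iterate over the columns of the numpy array, producing an array of values for each column
Pass that to a function

This is what I currently have: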

import numpy

def csv_to_array(file):
    # Load the comma-separated file; loadtxt accepts a path and closes the file itself
    data = numpy.loadtxt(file, delimiter=',')

    # Loop through the data in the array
    for index in range(len(data)):
        # Utilize a try catch to try and convert to float, if it can't convert to float, converts to 0
        try:
            data[index] = [float(x) for x in data[index]]
        except ValueError:
            data[index] = 0

    # Return the now type-formatted data
    randomize_data(data)
    return data

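def randomize_data(csv):
    # numpy.random.shuffle shuffles the rows in place and returns None,
    # so do not assign its result back to csv
    numpy.random.shuffle(csv)
    return csv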

def main():
    test = csv_to_array('ss.csv')
    # Drop the last column (the class label), keeping only the feature columns
    features = test[:, :-1]
    # for column in features.T:
    #     print("BREAK")
    #     print(column)
    #     currPerf = k_means(column,3)

main()

So, when I call test = csv_to_array('ss.csv') (for what it's worth, ss.csv is the iris dataset with each class replaced by 0, 1, or 2, which I remove later), I get the following:

[[5.1 3.5 1.4 0.2 0. ]
 [4.9 3.  1.4 0.2 0. ]
 [4.7 3.2 1.3 0.2 0. ]
 [4.6 3.1 1.5 0.2 0. ]
 [5.  3.6 1.4 0.2 0. ]
 [5.4 3.9 1.7 0.4 0. ]
 [4.6 3.4 1.4 0.3 0. ]
 [5.  3.4 1.5 0.2 0. ]
 [4.4 2.9 1.4 0.2 0. ]
 [4.9 3.1 1.5 0.1 0. ]
 [5.4 3.7 1.5 0.2 0. ]
 [4.8 3.4 1.6 0.2 0. ]
 [4.8 3.  1.4 0.1 0. ]
 [4.3 3.  1.1 0.1 0. ]
 [5.8 4.  1.2 0.2 0. ]
 [5.7 4.4 1.5 0.4 0. ]
...]

What I want to do is create a variable, say test_columns, iterate over the numpy array above, and append to test_columns one column at a time.

So, iteration 1:
`test_columns =
[[5.1]
 [4.9]
 [4.7]
 [4.6]
 [5. ]
 ...]
`

Iteration 2:
`test_columns =
[[5.1 3.5]
 [4.9 3. ]
 [4.7 3.2]
 [4.6 3.1]
 [5.  3.6]
 ...]
`

Iteration 3:
`test_columns =
[[5.1 3.5 1.4]
 [4.9 3.  1.4]
 [4.7 3.2 1.3]
 [4.6 3.1 1.5]
 [5.  3.6 1.4]
 ...]
`

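And so on. How can I iterate over a numpy array one column at a time, appending each column to a new numpy array? The new numpy array will then be evaluated in another function.

I tried for column in features.T to transpose the array, but didn't get the expected results.

Thanks for your help.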

Answer 1

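numpy can do this with numpy.hsplit(array, indices_or_sections). The full documentation is at https://docs.scipy.org/doc/numpy/reference/generated/numpy.hsplit.html, but essentially in iteration 1 you would use test_columns = numpy.hsplit(test, [1]), then in the next iteration test_columns = numpy.hsplit(test, [2]), and so on. Because hsplit returns a list of sub-arrays, you will probably need a further indexing step on what it returns, depending on your needs (I'm no expert by any means), but I'm fairly confident this is the method you're looking for!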
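
As a quick check of how the second argument of hsplit is interpreted, here is a minimal sketch. The small `test` array is only a stand-in for the data in the question, and the variable names are made up for illustration. An integer asks for that many equal-width pieces, while a list of indices splits at those column positions.

```python
import numpy as np

# Stand-in for the iris-style data in the question (4 rows, 5 columns).
test = np.array([
    [5.1, 3.5, 1.4, 0.2, 0.0],
    [4.9, 3.0, 1.4, 0.2, 0.0],
    [4.7, 3.2, 1.3, 0.2, 0.0],
    [4.6, 3.1, 1.5, 0.2, 0.0],
])

# An integer means "this many equal-width pieces": 1 gives back the whole array.
whole = np.hsplit(test, 1)

# 5 columns cannot be divided into 2 equal pieces, so this raises ValueError.
try:
    np.hsplit(test, 2)
    equal_split_failed = False
except ValueError:
    equal_split_failed = True

# A list of indices splits *at* those column positions instead.
first_two, rest = np.hsplit(test, [2])
print(first_two.shape, rest.shape)
```

So for "the first k columns", the list form (or an array of indices) is the one that applies; the integer form only works when the column count divides evenly.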

@EDIT

Here is some sample code in which I split the array into columns:

import numpy as np

my_array = np.array([
    [5.1, 3.5, 1.4, 0.2, 0.],
    [4.9, 3.,  1.4, 0.2, 0.],
    [4.7, 3.2, 1.3, 0.2, 0.],
    [4.6, 3.1, 1.5, 0.2, 0.],
    [5.,  3.6, 1.4, 0.2, 0.],
    [5.4, 3.9, 1.7, 0.4, 0.],
    [4.6, 3.4, 1.4, 0.3, 0.],
    [5.,  3.4, 1.5, 0.2, 0.],
    [4.4, 2.9, 1.4, 0.2, 0.]])

test1 = np.hsplit(my_array, np.array([1, 1]))[0]
print(test1)

This prints [[5.1] [4.9] [4.7] [4.6] [5. ] [5.4] [4.6] [5. ] [4.4]]

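You can change it to test1 = np.hsplit(my_array, np.array([2, 1]))[0] to correctly slice out the first two columns. It seems the second argument works better as a numpy array of split indices than as an int, and you do want to take element [0] of the returned list for this to work, because hsplit also returns some leftover sub-arrays (including an empty one) that you should ignore for your purposes.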
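
To make the "take element [0]" step concrete, here is a small sketch of what hsplit hands back for the indices [2, 1]. The three-row array is a shortened stand-in for my_array, and the variable names are illustrative. It also checks that plain slicing with my_array[:, :2] gives the same first piece, which may be the simpler route.

```python
import numpy as np

# Shortened stand-in for my_array (3 rows, 5 columns).
my_array = np.array([
    [5.1, 3.5, 1.4, 0.2, 0.0],
    [4.9, 3.0, 1.4, 0.2, 0.0],
    [4.7, 3.2, 1.3, 0.2, 0.0],
])

pieces = np.hsplit(my_array, np.array([2, 1]))

# Indices [2, 1] give three pieces: columns [:2], [2:1] (empty), and [1:].
shapes = [p.shape for p in pieces]
print(shapes)

# Only the first piece is of interest; basic slicing yields the same values.
first_two = pieces[0]
same_as_slice = np.array_equal(first_two, my_array[:, :2])
print(same_as_slice)
```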

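To do this automatically over the whole dataset, you can replace the final two lines of that example (the test1 assignment and the print call) with: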

columns = my_array.shape[1]

for column_index in range(1, columns + 1):
    test = np.hsplit(my_array, np.array([column_index, 1]))[0]
    print(test)
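
For feeding each growing block of columns to a function, a sketch along the following lines might help. It uses basic slicing instead of hsplit, which is a substitution on my part. Both `evaluate` and `growing_column_prefixes` are hypothetical names: `evaluate` is only a placeholder for whatever function should receive the columns (the question mentions k_means, which is not shown here).

```python
import numpy as np

def evaluate(columns):
    """Hypothetical stand-in for the function that consumes the columns."""
    return columns.mean()  # placeholder computation only

def growing_column_prefixes(array):
    """Yield array[:, :1], array[:, :2], ... up to all columns."""
    for stop in range(1, array.shape[1] + 1):
        yield array[:, :stop]

# Shortened stand-in for the feature columns (3 rows, 4 columns).
my_array = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
])

prefix_shapes = [p.shape for p in growing_column_prefixes(my_array)]
results = [evaluate(p) for p in growing_column_prefixes(my_array)]
print(prefix_shapes)
```

Each yielded slice is a view onto the original array, so nothing is copied; call .copy() on a slice if the receiving function modifies its input.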
