Creating an array from lists (one of them a list of lists) without flattening the inner lists - python

Posted: 2018-04-02 14:49:44

Tags: python arrays list numpy

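I'm trying to create an array from two lists, one of which holds a list for each element. In the first case I manage to do what I want with np.column_stack, but in the second case, although my initial lists look similar in structure, my list of lists gets flattened into the array (which is not what I need).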

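I've attached two examples that reproduce this. In the first case I get an array where each row has a string as its first element and a list as its second, while in the second case I get 4 columns (the lists are flattened) for no apparent reason.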

Example 1

temp_list_column1=['St. Raphael',
 'Goppingen',
 'HSG Wetzlar',
 'Huttenberg',
 'Kiel',
 'Stuttgart',
 'Izvidac',
 'Viborg W',
 'Silkeborg-Voel W',
 'Bjerringbro W',
 'Lyngby W',
 'Most W',
 'Ostrava W',
 'Presov W',
 'Slavia Prague W',
 'Dicken',
 'Elbflorenz',
 'Lubeck-Schwartau',
 'HK Ogre/Miandum',
 'Stal Mielec',
 'MKS Perla Lublin W',
 'Koscierzyna W',
 'CS Madeira W',
 'CSM Focsani',
 'CSM Bucuresti',
 'Constanta',
 'Iasi',
 'Suceava',
 'Timisoara',
 'Saratov',
 'Alisa Ufa W',
 'Pozarevac',
 'Nove Zamky',
 'Aranas',
 'Ricoh',
 'H 65 Hoor W',
 'Lugi W',
 'Strands W']

temp_list_column2=[['32', '16', '16'],
 ['32', '16', '16'],
 ['27', '13', '14'],
 ['23', '9', '14'],
 ['29', '14', '15'],
 ['24', '17', '7'],
 ['30', '15', '15'],
 ['26', '12', '14'],
 ['27', '13', '14'],
 ['26'],
 ['18', '9', '9'],
 ['34', '15', '19'],
 ['30', '13', '17'],
 ['31', '13', '18'],
 ['27', '10', '17'],
 ['28', '14', '14'],
 ['24', '14', '10'],
 ['28', '12', '16'],
 ['28', '9', '19'],
 ['22', '13', '9'],
 ['30', '14', '16'],
 ['22', '14', '8'],
 ['17', '8', '9'],
 ['26'],
 ['41', '21', '20'],
 ['36', '18', '18'],
 ['10'],
 ['25', '12', '13'],
 ['27', '16', '11'],
 ['31', '15', '16'],
 ['25', '15', '10'],
 ['24', '8', '16'],
 ['28', '14', '14'],
 ['24', '13', '11'],
 ['26', '14', '12'],
 ['33', '17', '16'],
 ['26', '12', '14'],
 ['17', '12', '5']]

import numpy as np
temp_array = np.column_stack((temp_list_column1,temp_list_column2))

Output

array([['St. Raphael', ['32', '16', '16']],
       ['Goppingen', ['32', '16', '16']],
       ['HSG Wetzlar', ['27', '13', '14']],
       ['Huttenberg', ['23', '9', '14']],
       ['Kiel', ['29', '14', '15']],
       ['Stuttgart', ['24', '17', '7']],
       ['Izvidac', ['30', '15', '15']],
       ['Viborg W', ['26', '12', '14']],
       ['Silkeborg-Voel W', ['27', '13', '14']],
       ['Bjerringbro W', ['26']],
       ['Lyngby W', ['18', '9', '9']],
       ['Most W', ['34', '15', '19']],
       ['Ostrava W', ['30', '13', '17']],
       ['Presov W', ['31', '13', '18']],
       ['Slavia Prague W', ['27', '10', '17']],
       ['Dicken', ['28', '14', '14']],
       ['Elbflorenz', ['24', '14', '10']],
       ['Lubeck-Schwartau', ['28', '12', '16']],
       ['HK Ogre/Miandum', ['28', '9', '19']],
       ['Stal Mielec', ['22', '13', '9']],
       ['MKS Perla Lublin W', ['30', '14', '16']],
       ['Koscierzyna W', ['22', '14', '8']],
       ['CS Madeira W', ['17', '8', '9']],
       ['CSM Focsani', ['26']],
       ['CSM Bucuresti', ['41', '21', '20']],
       ['Constanta', ['36', '18', '18']],
       ['Iasi', ['10']],
       ['Suceava', ['25', '12', '13']],
       ['Timisoara', ['27', '16', '11']],
       ['Saratov', ['31', '15', '16']],
       ['Alisa Ufa W', ['25', '15', '10']],
       ['Pozarevac', ['24', '8', '16']],
       ['Nove Zamky', ['28', '14', '14']],
       ['Aranas', ['24', '13', '11']],
       ['Ricoh', ['26', '14', '12']],
       ['H 65 Hoor W', ['33', '17', '16']],
       ['Lugi W', ['26', '12', '14']],
       ['Strands W', ['17', '12', '5']]], dtype=object)

Example 2

temp_list_column1b=['Benidorm',
 'Alpla Hard',
 'Dubrava',
 'Frydek-Mistek',
 'Karvina',
 'Koprivnice',
 'Nove Veseli',
 'Vardar',
 'Meble Elblag Wojcik',
 'Zaglebie',
 'Benfica',
 'Barros W',
 'Juvelis W',
 'Assomada W',
 'UOR No.2 Moscow',
 'Izhevsk W',
 'Stavropol W',
 'Din. Volgograd W',
 'Zvenigorod W',
 'Adyif W',
 'Crvena zvezda',
 'Ribnica',
 'Slovan',
 'Jeruzalem Ormoz',
 'Karlskrona',
 'Torslanda W']

temp_list_column2b=[['28', '14', '14'],
 ['27', '12', '15'],
 ['24', '13', '11'],
 ['24', '14', '10'],
 ['28', '17', '11'],
 ['30', '16', '14'],
 ['26', '15', '11'],
 ['38', '18', '20'],
 ['24', '13', '11'],
 ['33', '15', '18'],
 ['24', '10', '14'],
 ['18', '11', '7'],
 ['22', '9', '13'],
 ['25', '12', '13'],
 ['19', '11', '8'],
 ['24', '10', '14'],
 ['21', '9', '12'],
 ['18', '10', '8'],
 ['31', '17', '14'],
 ['29', '15', '14'],
 ['26', '14', '12'],
 ['29', '12', '17'],
 ['25', '11', '14'],
 ['33', '19', '14'],
 ['32', '14', '18'],
 ['19', '12', '7']]



import numpy as np
temp_arrayb = np.column_stack((temp_list_column1b,temp_list_column2b))

Output

array([['Benidorm', '28', '14', '14'],
       ['Alpla Hard', '27', '12', '15'],
       ['Dubrava', '24', '13', '11'],
       ['Frydek-Mistek', '24', '14', '10'],
       ['Karvina', '28', '17', '11'],
       ['Koprivnice', '30', '16', '14'],
       ['Nove Veseli', '26', '15', '11'],
       ['Vardar', '38', '18', '20'],
       ['Meble Elblag Wojcik', '24', '13', '11'],
       ['Zaglebie', '33', '15', '18'],
       ['Benfica', '24', '10', '14'],
       ['Barros W', '18', '11', '7'],
       ['Juvelis W', '22', '9', '13'],
       ['Assomada W', '25', '12', '13'],
       ['UOR No.2 Moscow', '19', '11', '8'],
       ['Izhevsk W', '24', '10', '14'],
       ['Stavropol W', '21', '9', '12'],
       ['Din. Volgograd W', '18', '10', '8'],
       ['Zvenigorod W', '31', '17', '14'],
       ['Adyif W', '29', '15', '14'],
       ['Crvena zvezda', '26', '14', '12'],
       ['Ribnica', '29', '12', '17'],
       ['Slovan', '25', '11', '14'],
       ['Jeruzalem Ormoz', '33', '19', '14'],
       ['Karlskrona', '32', '14', '18'],
       ['Torslanda W', '19', '12', '7']], 
      dtype='<U19')

In the first case the shape is (38, 2), while in the second it is (26, 4) (I'm only interested in the number of columns). Am I missing something obvious?

2 answers:

Answer 0 (score: 3)

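Your problem seems to be that your first list of lists (temp_list_column2) is ragged, while the second one (temp_list_column2b) is rectangular.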

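Look at the difference in how NumPy converts the following two lists to arrays (as @hpaulj pointed out, this is exactly what happens when you pass them to column_stack):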

In [1]: b1 = [
   ...: [1,2,3],
   ...: [2,3,4],
   ...: [3,4,5],
   ...: [4,5,6]]

In [2]: np.array(b1)
Out[2]:
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6]])

In [3]: b2 = [
   ...: [1,2,3],
   ...: [2,3],
   ...: [3]]

In [4]: np.array(b2)
Out[4]: array([list([1, 2, 3]), list([2, 3]), list([3])], dtype=object)
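A note for newer NumPy versions: since NumPy 1.24, np.array(b2) on a ragged list raises a ValueError instead of silently building an object array, and np.column_stack on the first example is expected to fail the same way. The minimal sketch below reuses b1 and b2 from the session above to show that passing dtype=object explicitly restores the 1-D array of lists for ragged input, but does not stop a rectangular list from being expanded into a 2-D array.

```python
import numpy as np

b1 = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]  # rectangular: every inner list has 3 items
b2 = [[1, 2, 3], [2, 3], [3]]                        # ragged: inner lists differ in length

# Ragged input + dtype=object: a 1-D array whose elements are the inner lists
ragged = np.array(b2, dtype=object)
print(ragged.shape)  # (3,)

# Rectangular input + dtype=object: still a 2-D array; dtype=object only changes
# the element type, not the shape NumPy discovers from the nesting
rect = np.array(b1, dtype=object)
print(rect.shape)  # (4, 3)
```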

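So when column-stacking your example lists, in the first case you have a 1-D array of lists, which becomes a single column, whereas in the second case you have a 2-D array of strings with 3 columns, which contributes 3 columns to the result.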
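To see the second case in isolation, here is a small sketch using the first two rows of Example 2 (the excerpt is just for illustration): the rectangular list of lists converts to a (2, 3) string array, and column_stack places it next to the (2, 1) name column, giving 4 columns.

```python
import numpy as np

names = ['Benidorm', 'Alpla Hard']
numbers = [['28', '14', '14'], ['27', '12', '15']]  # equal-length inner lists

as_array = np.asarray(numbers)               # NumPy sees a 2-D block of strings
stacked = np.column_stack((names, numbers))  # (2, 1) name column + (2, 3) block

print(as_array.shape)  # (2, 3)
print(stacked.shape)   # (2, 4)
print(stacked[0])      # ['Benidorm' '28' '14' '14']
```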

In this case you may not need NumPy's column_stack at all; you can simply zip the two lists together. If you want a NumPy array as the final result, use np.array(list(zip(list_a, list_b))).
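Note that np.array(list(zip(...))) still lets NumPy infer the shape, so on recent NumPy releases it may raise a ValueError for rows like ('Benidorm', ['28', '14', '14']) unless dtype=object is given. A minimal sketch that avoids inference altogether, assuming you want an (n, 2) object array: allocate it up front and fill it cell by cell. zip_to_object_array is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def zip_to_object_array(list_a, list_b):
    """Pair list_a[i] with list_b[i] in an (n, 2) object array without expanding list_b's items."""
    rows = list(zip(list_a, list_b))  # note: zip stops at the shorter list
    out = np.empty((len(rows), 2), dtype=object)
    for i, (a, b) in enumerate(rows):
        out[i, 0] = a
        out[i, 1] = b  # assigning to a single cell stores the inner list itself
    return out

names = ['Benidorm', 'Alpla Hard', 'Bjerringbro W']
numbers = [['28', '14', '14'], ['27', '12', '15'], ['26']]

arr = zip_to_object_array(names, numbers)
print(arr.shape)  # (3, 2)
print(arr[2, 1])  # ['26']
```

Because every cell is assigned individually, the result has 2 columns whether the inner lists are ragged (Example 1) or all the same length (Example 2).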

Edit: in hindsight, your data structure sounds more like what is usually called a DataFrame than the matrix NumPy is trying to give you.

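import pandas as pd

# Option 1: start from an empty DataFrame and add the columns one at a time
data = pd.DataFrame()
data['name'] = temp_list_column1
data['numbers'] = temp_list_column2  # each cell holds one inner list

# Or option 2: build it in one go from (name, numbers) row tuples
data = pd.DataFrame(list(zip(temp_list_column1, temp_list_column2)), columns=['name', 'numbers'])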

This gives you a data structure like the following:

    name    numbers
0   John  [1, 2, 3]
1  James  [2, 3, 4]
2  Peter  [3, 4, 5]
3   Paul  [4, 5, 6]

Answer 1 (score: 1)

Diagnosis

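The problem seems to be that in the second example all sublists have 3 elements, whereas in the first example there are sublists of length 1, e.g. ['Bjerringbro W', ['26']]; the list ['26'] has only one element.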
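The diagnosis above boils down to checking whether the inner lists all have the same length. A quick plain-Python sketch of that check follows; inner_lengths and is_rectangular are hypothetical helper names, and the excerpts are a few rows taken from each example. When the check passes (and the items are plain scalars such as strings), NumPy can turn the list of lists into a 2-D block, which is what leads to the extra columns.

```python
def inner_lengths(list_of_lists):
    """Return the sorted distinct lengths of the inner lists."""
    return sorted({len(inner) for inner in list_of_lists})

def is_rectangular(list_of_lists):
    """True when every inner list has the same length."""
    return len(inner_lengths(list_of_lists)) <= 1

example1_excerpt = [['32', '16', '16'], ['26'], ['18', '9', '9']]  # contains a length-1 list
example2_excerpt = [['28', '14', '14'], ['27', '12', '15']]        # all length 3

print(inner_lengths(example1_excerpt))  # [1, 3] -> ragged
print(inner_lengths(example2_excerpt))  # [3] -> rectangular
```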

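In the second case, np.column_stack evidently does not keep the lists as individual cell elements. (One could debate whether you should store lists as cell elements at all, but I won't go into that here.) Here is the solution: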

Special-case solution

I'm assuming you don't mind using pandas:

import pandas as pd

# Wrap each list in a Series; a Series keeps every inner list as a single element
series_1 = pd.Series(temp_list_column1b).to_frame(name='col1') # name it whatever you want
series_2 = pd.Series(temp_list_column2b).to_frame(name='col2') # name it whatever you want

df = pd.concat([series_1, series_2], axis=1)  # side by side: one row per team
# print(df)               # view it as a DataFrame
# print(df.values)        # see what it looks like as a NumPy array
# print(df.values.shape)  # see the shape in NumPy terms: (26, 2)

Generalized solution

Suppose you have a list of columns called list_of_cols. Then:

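import pandas as pd

# list_of_cols: all the lists you want to combine, e.g.
# list_of_cols = [temp_list_column1b, temp_list_column2b]

# Name each column explicitly; otherwise every column would be labelled 0
df = pd.concat([pd.Series(temp_col).to_frame(name=f'col{i}')
                for i, temp_col in enumerate(list_of_cols)], axis=1)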
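If pandas is not an option, roughly the same generalization can be done in plain NumPy. This is an alternative sketch, not part of the original answer, and stack_columns_as_objects is a hypothetical helper name. It allocates an (n_rows, n_cols) object array and fills it cell by cell, so each item (a string or a list) lands in its own cell whether the inner lists are ragged or rectangular.

```python
import numpy as np

def stack_columns_as_objects(*columns):
    """Combine equally long columns into an (n_rows, n_cols) object array, one item per cell."""
    n_rows = len(columns[0]) if columns else 0
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    out = np.empty((n_rows, len(columns)), dtype=object)
    for j, col in enumerate(columns):
        for i, item in enumerate(col):
            out[i, j] = item  # a single-cell assignment keeps lists intact
    return out

names = ['Benidorm', 'Alpla Hard']
numbers = [['28', '14', '14'], ['27', '12', '15']]  # rectangular
extra = [[1], [2, 3]]                               # ragged

arr = stack_columns_as_objects(names, numbers, extra)
print(arr.shape)  # (2, 3)
```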

I hope this helps!