In Python

Time: 2018-04-10 12:19:54

Tags: python-3.x pandas numpy duplicates transpose

I have a 2-dimensional list/matrix whose size is dynamic: N rows and M columns.

Sample Matrix

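The data type within a given column is uniform, e.g. col1 is text, col2 is integer, col3 is float, and so on. The order of the columns can vary, and values may be missing in some rows.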

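The expected result should be 2 lists/arrays/DataFrames, where:

  • list1 should repeat each column name N times (once per row), with an iterator or row number appended: col1_r1, col1_r2, ....., colM_row_n
  • list2 should hold the transposed values of the rows (including empty rows)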

What is the best way to achieve this in Python 3.6 using native lists and/or numpy arrays and/or pandas DataFrames?

output_list1 = [col1_1, col1_2, col1_3, col1_4, col1_5, col2_1, col2_2, 
                col2_3, col2_4, col2_5, col3_1, col3_2, col3_3, col3_4, col3_5]

output_list2 = ["value-row1,col1", "", "value-row3,col1",   "value-row4,col1",  
                ",value-row5,col1", "value-row1,col2", "value-row2,col2",   "value-row3,col3",  
                0,  "value-row5, col5", "value-row1, col3", 0.0, 0.0, 0.0, "value-row5,col4"]

Thanks in advance for your help.

1 answer:

Answer 0 (score: 3)

This should do the trick:

import numpy as np

# create the data in a nested list
data_list = [['Event', 'Waits', 'Total Wait Time (sec)', 'Wait Avg(ms)', '% DB time', 'Wait Class'], 
             ['latch free', '15,625', '311', '19.91', '29.6', 'Other'], 
             ['library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'], 
             ['DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']]

# transform into numpy object array
data_array = np.array(data_list, dtype=object)
# construct header from first row
header = data_array[0, :]
# only use the data part of the array
data = data_array[1:, :]

list1 = []
list2 = []
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        # row numbers are 1-based, hence i + 1
        # label each value with its column name and row number
        list1.append('{}_{}'.format(header[j], i+1))
        # populate flattened data list
        list2.append(data[i,j])

print(list1)
print(list2)

Output:

list1 = ['Event_1', 'Waits_1', 'Total Wait Time (sec)_1', 'Wait Avg(ms)_1', '% DB time_1', 'Wait Class_1', 'Event_2', 'Waits_2', 'Total Wait Time (sec)_2', 'Wait Avg(ms)_2', '% DB time_2', 'Wait Class_2', 'Event_3', 'Waits_3', 'Total Wait Time (sec)_3', 'Wait Avg(ms)_3', '% DB time_3', 'Wait Class_3']
list2 = ['latch free', '15,625', '311', '19.91', '29.6', 'Other', 'library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency', 'DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']
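
Note that the loop above walks the data row by row, so the labels come out as Event_1, Waits_1, and so on, whereas output_list1 in the question groups the labels by column (col1_1, col1_2, ...). If the column-grouped order is what is wanted, one possible sketch follows. It is only an illustration under a few assumptions: it uses plain lists rather than numpy, it works on a shortened copy of the sample data, and it assumes every row has the same length as the header (zip silently truncates ragged rows).

```python
# Shortened copy of the sample data; the first row is the header.
data_list = [
    ['Event', 'Waits', 'Total Wait Time (sec)'],
    ['latch free', '15,625', '311'],
    ['DB CPU', '\xa0', '87,3'],
]

# Split the header from the data rows.
header, *rows = data_list

list1 = []
list2 = []
# zip(*rows) transposes the rows, yielding one tuple per column.
for name, column in zip(header, zip(*rows)):
    # Row numbers are 1-based to match the labels used above.
    for row_number, value in enumerate(column, start=1):
        list1.append('{}_{}'.format(name, row_number))
        list2.append(value)

print(list1)
print(list2)
```

With this ordering, all values of the first column come first, then all values of the second column, and so on; empty cells such as '\xa0' are kept in place.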