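I have a 2-dimensional list/matrix with a dynamic number of rows (N) and columns (M).
The data type within a given column is uniform, e.g. col1 is text, col2 is integer, column3 is float, etc. The order of the columns can vary. Values may also be missing for some rows.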
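The expected result is 2 lists/arrays/dataframes, where:
list1 holds each column name repeated N times (once per row), with an iterator or row number appended: col1_r1, col1_r2, ....., colM_row_n
list2 holds the transposed row values (including empty ones)
What is the best way to achieve this in Python 3.6 using native lists and/or numpy arrays and/or pandas dataframes?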
output_list1 = [col1_1, col1_2, col1_3, col1_4, col1_5, col2_1, col2_2,
col2_3, col2_4, col2_5, col3_1, col3_2, col3_3, col3_4, col3_5]
output_list2 = ["value-row1,col1", "", "value-row3,col1", "value-row4,col1",
",value-row5,col1", "value-row1,col2", "value-row2,col2", "value-row3,col3",
0, "value-row5, col5", "value-row1, col3", 0.0, 0.0, 0.0, "value-row5,col4"]
Thanks in advance for your help.
Answer 0 (score: 3)
This should do the trick:
import numpy as np
# create the data in a nested list
data_list = [['Event', 'Waits', 'Total Wait Time (sec)', 'Wait Avg(ms)', '% DB time', 'Wait Class'],
['latch free', '15,625', '311', '19.91', '29.6', 'Other'],
['library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'],
['DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']]
# transform into numpy object array
data_array = np.array(data_list, dtype=object)
# construct header from first row
header = data_array[0, :]
# only use the data part of the array
data = data_array[1:, :]
list1 = []
list2 = []
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        # label each value with its column header and row number
        # (i + 1 adjusts for the 1-based row numbering)
        list1.append('{}_{}'.format(header[j], i + 1))
        # populate the flattened data list in the same order
        list2.append(data[i, j])
print(list1)
print(list2)
Output:
list1 = ['Event_1', 'Waits_1', 'Total Wait Time (sec)_1', 'Wait Avg(ms)_1', '% DB time_1', 'Wait Class_1', 'Event_2', 'Waits_2', 'Total Wait Time (sec)_2', 'Wait Avg(ms)_2', '% DB time_2', 'Wait Class_2', 'Event_3', 'Waits_3', 'Total Wait Time (sec)_3', 'Wait Avg(ms)_3', '% DB time_3', 'Wait Class_3']
list2 = ['latch free', '15,625', '311', '19.91', '29.6', 'Other', 'library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency', 'DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']
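Note that the loop above walks the data row by row, whereas the sample output in the question groups the values by column (col1_1, col1_2, ..., then col2_1, ...). If column-major order is what is needed, one possible variant is sketched below using only built-in lists. The function name `flatten_by_column`, the `missing` parameter, and the shortened sample data are illustrative assumptions, not part of the original answer. The sketch also assumes every row has the same length as the header, and that whitespace-only cells such as '\xa0' should count as missing.

```python
# Hypothetical sample data: the first row is the header,
# and '\xa0' (non-breaking space) marks a missing cell.
rows = [
    ['Event', 'Waits', 'Wait Class'],
    ['latch free', '15,625', 'Other'],
    ['DB CPU', '\xa0', '\xa0'],
]


def flatten_by_column(rows, missing=''):
    """Return (labels, values) flattened in column-major order."""
    header, *data = rows
    labels, values = [], []
    # zip(*data) transposes the rows, yielding one tuple per column
    for name, column in zip(header, zip(*data)):
        for row_number, cell in enumerate(column, start=1):
            labels.append('{}_{}'.format(name, row_number))
            # treat whitespace-only strings (including '\xa0') as missing
            is_blank = isinstance(cell, str) and not cell.strip()
            values.append(missing if is_blank else cell)
    return labels, values


labels, values = flatten_by_column(rows)
print(labels)
print(values)
```

With the sample data, `labels` starts with 'Event_1', 'Event_2' and `values` starts with 'latch free', 'DB CPU', so both lists stay aligned index by index. Because `zip` stops at the shortest input, ragged rows would be silently truncated; pad them first if that can happen in your data.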