Improving an algorithm to distinguish between different types of tables

Posted: 2015-05-28 23:10:23

Tags: python algorithm data-structures

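I have two tables with the following structures. In Table 1 the ID sits next to the Name, while in Table 2 it sits next to Title 1. One thing the two tables have in common is that the first person always has the ID next to their name; for the people after that, the two tables differ.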

Table 1:

Name&Title    | ID #
--------------+------
Random_Name 1 | 2000
Title_1_1     | -
Title_1_2     | -
Random_Name 2 | 2000
Title_2_1     | -
Title_2_2     | -
...           | ...

Table 2:

Name&Title    | ID #
--------------+------
Random_Name 1 | 2000
Title_1_1     | -
Title_1_2     | -
Random_Name 2 | -
Title_2_1     | 2000
Title_2_2     | -
...           | ...

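I have code that recognizes Table 1, but I'm struggling to also handle structure 2. The table is stored as a nested list of rows (each row is a list). Usually a person has only one name row but several title rows. The pseudocode looks like this: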

set count = 0
find the ID next to the first name and use it as the recognizer
for i, row in enumerate(table):
   compare the ID of each following row until row[1] == recognizer
   set count = i
   slice the table from the previous count up to i to get the current person

The actual code is as follows:

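    # data: the table as a list of rows, header row first
    header_ind = 0  # index of the header row (set elsewhere in the full code)
    recognizer = data[header_ind+1][1]  # ID next to the first person's name
    count = header_ind+1  # index where the current person starts
    result = []
    result.append(data[0])  # keep the header row
    for i, row in enumerate(data[header_ind+2:]):
        # row is data[i+header_ind+2]; the look-ahead below reads data[i+header_ind+3]
        if i <= len(data[header_ind+4:]):
            # use ==, not `is`: `is` tests object identity, not string equality
            if row[1] and data[i+1+header_ind+2][1] == recognizer:
                print(data[i+header_ind+3])  # first row of the next person
                one_person = data[count:i+header_ind+3]
                result.append(one_person)
                count = i+header_ind+3
        else:
            # last row: everything from count to the end is the last person
            if i == len(data[header_ind+3:]):
                last_person = data[count:i+header_ind+3]
                result.append(last_person)
                count = i+header_ind+3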
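The index arithmetic above boils down to one rule: start a new group at every row whose ID equals the recognizer, and attach every other row to the current group. A minimal sketch of that rule for the Table 1 layout, assuming the header is row 0 and the first data row belongs to the first person (the function name `split_by_recognizer` and the sample `table1` are illustrative, not from the question):

```python
def split_by_recognizer(table):
    """Split a table (header row + data rows) into one list of rows per person.

    Assumes the Table 1 layout: every person's first row carries the same ID
    as the first person (the "recognizer"), and no other row does.
    Assumes at least one data row.
    """
    header, rows = table[0], table[1:]
    recognizer = rows[0][1]          # ID next to the first name
    people = []
    for row in rows:
        if row[1] == recognizer:     # a new person starts on this row
            people.append([row])
        else:                        # a title row: attach it to the current person
            people[-1].append(row)
    return [header] + people


table1 = [
    ['Name&Title', 'ID#'],
    ['Random_Name 1', '2000'],
    ['Title_1_1', '-'],
    ['Title_1_2', '-'],
    ['Random_Name 2', '2000'],
    ['Title_2_1', '-'],
    ['Title_2_2', '-'],
]

for item in split_by_recognizer(table1):
    print(item)
```

Because the first data row always carries the recognizer, `people[-1]` exists by the time the first title row is reached, so no special case is needed for the last person.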

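I've been stuck on this for a while, so I'd just like to know whether it's possible to write an algorithm that also handles Table 2, given that we cannot distinguish name rows from title rows.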
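From the ID column alone the two layouts really are ambiguous: Table 2's column (`2000, -, -, -, 2000, -`) could just as well be read as Table 1 with a first person who has three titles and a second who has one. If the caller knows which layout a table uses, though, the recognizer approach extends to Table 2 by starting each later person a fixed number of rows before the row that carries the recognizer. A minimal sketch, where the `offset` parameter (0 for Table 1, 1 for Table 2) is a hypothetical input that has to come from outside the table itself:

```python
def split_with_offset(table, offset=0):
    """Split a table (header + data rows) into one list of rows per person.

    offset (hypothetical, supplied by the caller) is how many rows *before*
    the recognizer row a later person starts: 0 for the Table 1 layout
    (ID next to the name), 1 for the Table 2 layout (ID next to the first
    title). The first person always starts at the first data row.
    """
    header, rows = table[0], table[1:]
    recognizer = rows[0][1]                  # ID next to the first name
    starts = [0]                             # indices into rows where people start
    for i in range(1, len(rows)):
        start = i - offset
        # only accept boundaries that move forward, so groups never overlap
        if rows[i][1] == recognizer and start > starts[-1]:
            starts.append(start)
    ends = starts[1:] + [len(rows)]
    return [header] + [rows[s:e] for s, e in zip(starts, ends)]


table2 = [
    ['Name&Title', 'ID#'],
    ['Random_Name 1', '2000'],
    ['Title_1_1', '-'],
    ['Title_1_2', '-'],
    ['Random_Name 2', '-'],
    ['Title_2_1', '2000'],
    ['Title_2_2', '-'],
]

for item in split_with_offset(table2, offset=1):
    print(item)
```

Calling `split_with_offset(table2, offset=0)` still returns a plausible-looking result (a four-row first person and a two-row second person), which illustrates why the layout has to be known rather than guessed from the IDs.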

1 answer:

Answer 0 (score: 0)

Sticking with this:

So these are your inputs, assuming this is what you are restricted to...:

# Table 1 
data1 = [['Name&Title','ID#'],
    ['Random_Name1','2000'],
    ['Title_1_1','-'],
    ['Title_1_2','-'],
    ['Random_Name2','2000'],
    ['Title_2_1','-'],
    ['Title_2_2','-']]

# TABLE 2
data2 = [['Name&Title','ID#'],
    ['Random_Name1','2000'],
    ['Title_1_1','-'],
    ['Title_1_2','-'],
    ['Random_Name2','-'],
    ['Title_2_1','2000'],
    ['Title_2_2','-']]

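And this is the output you want (running the code from the question with `data = data1`; the first line is printed inside the loop):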

for x in result:
    print(x)

['Random_Name2', '2000']
['Name&Title', 'ID#']
[['Random_Name1', '2000'], ['Title_1_1', '-'], ['Title_1_2', '-']]
[['Random_Name2', '2000'], ['Title_2_1', '-'], ['Title_2_2', '-']]
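If the text in the first column can be used instead of the ID column, `data1` and `data2` can both be grouped the same way, because the ID column is then ignored entirely. A minimal sketch, assuming (only because the sample data happens to look like this) that title rows can be recognized by their text; the rule "starts with `Title_`" in the hypothetical `is_title` helper would have to be replaced by whatever distinguishes titles in real data:

```python
def is_title(cell):
    # Hypothetical rule based on the sample data only: titles start with "Title_".
    return cell.startswith('Title_')


def group_by_name_rows(data, is_title=is_title):
    """Return [header, person1_rows, person2_rows, ...] for either layout.

    A new person starts at every row whose first column is not a title;
    the ID column is ignored, so Table 1 and Table 2 are handled alike.
    """
    header, rows = data[0], data[1:]
    people = []
    for row in rows:
        if not people or not is_title(row[0]):   # a name row starts a new person
            people.append([row])
        else:                                    # a title row joins the current person
            people[-1].append(row)
    return [header] + people


data1 = [['Name&Title', 'ID#'],
         ['Random_Name1', '2000'],
         ['Title_1_1', '-'],
         ['Title_1_2', '-'],
         ['Random_Name2', '2000'],
         ['Title_2_1', '-'],
         ['Title_2_2', '-']]

data2 = [['Name&Title', 'ID#'],
         ['Random_Name1', '2000'],
         ['Title_1_1', '-'],
         ['Title_1_2', '-'],
         ['Random_Name2', '-'],
         ['Title_2_1', '2000'],
         ['Title_2_2', '-']]

for x in group_by_name_rows(data2):
    print(x)
```

The trade-off is that this moves the problem from the ID column to the name column: it only works if some property of the text reliably separates names from titles.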