Question

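I have two tables with the following structures: in Table 1 the ID is next to the name, while in Table 2 the ID is next to Title 1. One thing the two tables have in common is that the first person always has the ID next to their name. For later people, the position of the ID differs.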

Table 1:

Name&Title   | ID # 
----------------------
Random_Name 1|2000 
Title_1_1    | - 
Title_1_2    | -
Random_Name 2| 2000
Title_2_1    | -
Title_2_2    | -
...          |...

Table 2:

Name&Title   | ID # 
----------------------
Random_Name 1| 2000 
Title_1_1    | -
Title_1_2    | -
Random_Name 2| -
Title_2_1    | 2000
Title_2_2    | -
...          |...

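I have code that recognizes Table 1, but I am struggling to incorporate structure 2. The table is stored as a nested list of rows (each row is a list). Usually a person has only one name row but several title rows. The pseudocode looks like this: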

set count = 0
find the ID next to the first name and use it as the recognizer
for row_i, row in enumerate(table):
   compare the ID of each following row until one is found with row[1] == recognizer
   set count = row_i
   slice the table from the previous count to row_i to get one person

The actual code is as follows:

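    header_ind = 0  # index of the header row (set by the rest of the code)
    recognizer = data[header_ind+1][1]  # ID next to the first name
    count = header_ind+1  # index where the current person starts
    result = []
    result.append(data[0])  # append the header row
    for i, row in enumerate(data[header_ind+2:]):
        if i <= len(data[header_ind+4:]):
            # compare by value with ==; `is` tests object identity and is unreliable for strings
            if row[1] and data[i+1+header_ind+2][1] == recognizer:
                print(data[i+header_ind+3])  # debug: the row that starts the next person
                one_person = data[count:i+header_ind+3]
                result.append(one_person)
                count = i+header_ind+3
        else:
            # last iteration: everything from count to the end is the last person
            if i == len(data[header_ind+3:]):
                last_person = data[count:i+header_ind+3]
                result.append(last_person)
                count = i+header_ind+3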
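
The same recognizer idea can be written more compactly. The following is only a sketch, assuming every person's first row carries the recognizer ID (the Table 1 layout); the function name `split_by_recognizer` and the variable `table1` are made up for illustration and are not part of the original code.

```python
def split_by_recognizer(table, header_ind=0):
    """Group the rows after the header, starting a new group at every
    row whose ID equals the ID of the first data row (the recognizer)."""
    header = table[header_ind]
    rows = table[header_ind + 1:]
    if not rows:
        return [header]
    recognizer = rows[0][1]  # ID next to the first name
    groups = []
    for row in rows:
        if row[1] == recognizer or not groups:
            groups.append([row])      # this row starts a new person
        else:
            groups[-1].append(row)    # this row belongs to the current person
    return [header] + groups


# Sample data in the Table 1 layout (hypothetical variable name).
table1 = [['Name&Title', 'ID#'],
          ['Random_Name1', '2000'], ['Title_1_1', '-'], ['Title_1_2', '-'],
          ['Random_Name2', '2000'], ['Title_2_1', '-'], ['Title_2_2', '-']]

for group in split_by_recognizer(table1):
    print(group)
```

On the Table 2 layout this sketch has the same limitation as the loop above: the cut lands on `Title_2_1`, the row that carries the ID, so `Random_Name 2` ends up in the first person's group.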

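I have been thinking about this for a while, so I just want to know whether it is possible to come up with an algorithm that also handles Table 2, given that we cannot distinguish name rows from title rows.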

Answer 1

Sticking with this setup, I assume these are the inputs you are restricted to:

# Table 1 
data1 = [['Name&Title','ID#'],
    ['Random_Name1','2000'],
    ['Title_1_1','-'],
    ['Title_1_2','-'],
    ['Random_Name2','2000'],
    ['Title_2_1','-'],
    ['Title_2_2','-']]

# Table 2
data2 = [['Name&Title','ID#'],
    ['Random_Name1','2000'],
    ['Title_1_1','-'],
    ['Title_1_2','-'],
    ['Random_Name2','-'],
    ['Title_2_1','2000'],
    ['Title_2_2','-']]

This is the output you want:

for x in result:  # result: the grouped list built by the code in the question
    print(x)

['Random_Name2', '2000']
['Name&Title', 'ID#']
[['Random_Name1', '2000'], ['Title_1_1', '-'], ['Title_1_2', '-']]
[['Random_Name2', '2000'], ['Title_2_1', '-'], ['Title_2_2', '-']]
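
One possible way to reach the grouped lines above for the Table 2 layout is sketched below. It rests on a strong assumption that the question does not grant: that name rows can be recognized by some rule. Here the rule is the hypothetical `looks_like_name`, which only fits the sample data (names start with `Random_Name`). The helpers `group_by_name_rows` and `move_id_to_name_row` are likewise invented for this illustration. Without such a rule, the ID column alone does not say whether a person starts on the ID row or on the row before it.

```python
def group_by_name_rows(table, is_name_row, header_ind=0):
    """Start a new group at every row that is_name_row() accepts."""
    header = table[header_ind]
    groups = []
    for row in table[header_ind + 1:]:
        if is_name_row(row) or not groups:
            groups.append([list(row)])    # copy so the input stays untouched
        else:
            groups[-1].append(list(row))
    return header, groups


def move_id_to_name_row(group, empty='-'):
    """If the name row has no ID, move the first ID found in the
    title rows up to the name row (Table 2 -> Table 1 layout)."""
    if group[0][1] != empty:
        return group
    for row in group[1:]:
        if row[1] != empty:
            group[0][1], row[1] = row[1], empty
            break
    return group


def looks_like_name(row):
    # Hypothetical rule: only valid for the sample data in this answer.
    return row[0].startswith('Random_Name')


data2 = [['Name&Title', 'ID#'],
         ['Random_Name1', '2000'], ['Title_1_1', '-'], ['Title_1_2', '-'],
         ['Random_Name2', '-'], ['Title_2_1', '2000'], ['Title_2_2', '-']]

header, groups = group_by_name_rows(data2, looks_like_name)
result = [header] + [move_id_to_name_row(g) for g in groups]
for x in result:
    print(x)
```

Because groups whose name row already has an ID are left as they are, the same call should also work for the Table 1 layout. In real data the predicate would have to be replaced by whatever actually separates names from titles, for example a lookup against a known list of names.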
