Insert parent IDs into CSV data

Time: 2012-10-04 20:30:42

Tags: python

I have a CSV file in this format:

Country State   City    County
X       A       
X       A       R   
X       A       R       X
X       A       R       Y
X       B       
X       B       S   
X       B       S       X

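It represents a tree (containment) relationship. I now need to insert an id and a parent id that reflect this relationship. For example, the parent of Y (id = 5) is R, which has id 3, so Y's parent field is 3.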

id  parent  Country State   City    County
1   0       X    
2   1       X       A       
3   2       X       A       R   
4   3       X       A       R       X
5   3       X       A       R       Y
6   1       X       B       
7   6       X       B       S   
8   7       X       B       S       X

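Since there are thousands of entries, doing this by hand is tedious. How can I do this in Python? That is, read the file (the first block) and output it with the id and parent inserted (the second block above).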

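2 answers:

Answer 0: (score: 1)

This isn't pretty (and it isn't Python, so apologies if that rules it out), but if you want to avoid writing a script, you could use this formula (assuming the setup in the screenshot):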

=INDEX(
       $A$1:$A$9,
      MATCH(
            INDIRECT(ADDRESS(ROW(),COUNTA(C2:F2)+1)),
            INDIRECT(
                  SUBSTITUTE(ADDRESS(1,COUNTA(C2:F2)+1,4) & ":" & ADDRESS(1,COUNTA(C2:F2)+1,4),"1","")),
             0),
        1)

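This assumes the data is ordered so that a parent's id is defined before it is referenced. To populate the ids, you can use Fill Series to create an incrementing list. Again, this isn't great (and may not suit what you need), but it is one way to avoid writing a script. If you need Python, JoranBeasley's suggestion of using the CSV module is the way to go.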
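For comparison, here is a minimal Python 3 sketch of the same lookup idea using the csv module: each row's parent is found by looking up the row's path with its last element dropped. The function name `add_ids_by_path` and the inline sample data are hypothetical, chosen only for illustration. Like the formula, it assumes every parent row appears before its children.

```python
import csv
import io

def add_ids_by_path(rows):
    """Return rows prefixed with [id, parent]; parents are looked up by path."""
    ids = {}   # maps a path tuple such as ('X', 'A') to the id of that row
    out = []
    for next_id, row in enumerate(rows, start=1):
        # drop the empty cells: ['X', 'A', '', ''] -> ('X', 'A')
        path = tuple(cell for cell in row if cell.strip() != '')
        # the parent is the same path without its last element;
        # top-level rows (and rows whose parent is missing) get 0
        parent = ids.get(path[:-1], 0)
        ids[path] = next_id
        out.append([next_id, parent] + list(row))
    return out

# hypothetical sample data matching the layout in the question
sample = "X,,,\nX,A,,\nX,A,R,\nX,A,R,X\nX,A,R,Y\nX,B,,\nX,B,S,\nX,B,S,X\n"
with_ids = add_ids_by_path(csv.reader(io.StringIO(sample)))
for r in with_ids:
    print(r)
```

Keying on the full path rather than on the name alone means that two nodes with the same name in different branches (such as the two counties named X) do not collide.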


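Answer 1: (score: 1)

Edit: This solution should be clearer. It is a modification of the earlier solutions (1, 2), not a new approach. A single loop without copying makes it easier to understand.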

import copy
import csv
import StringIO

csv_str = """X,,,
X,A,,
X,A,R,
X,A,R,X
X,A,R,Y
X,B,,
X,B,S,
X,B,S,X
"""

reader = csv.reader(StringIO.StringIO(csv_str))

idx = 0
data = []

for row in reader:
    # insert the row id
    row.insert(0, idx + 1)

    # insert a dummy parent id, it will be replaced with the real
    # value later
    row.insert(1, -1)

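    # how deep is the current row (the count includes the two inserted
    # id columns, so it is offset by 2; that is harmless because only
    # relative depths are compared)
    depth = len([r for r in row if r != ''])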
    # insert the depth as the last value in the row
    row.append(depth)

    if idx > 0:
        # if it's not the first row, calculate its parent

        # calculate the depth of the previous row
        prev_depth = data[idx - 1][-1]
        if depth > prev_depth:
            # if it's deeper than the previous row, then the previous
            # row is the parent row
            row[1] = data[idx - 1][0]
        elif depth == prev_depth:
            # if it's the same depth as the previous row then it has
            # the same parent as the previous row
            row[1] = data[idx - 1][1]
        else:
            # if it's shallower than the previous row, find the
            # nearest previous row with the same depth and use its
            # parent as this row's parent.
            ridx = idx - 1
            while ridx >= 0 and data[ridx][-1] != depth:
                ridx -= 1
            row[1] = data[ridx][1]
    else:
        # if it's the first row its parent is 0
        row[1] = 0

    # store the new row
    data.append(row)
    idx += 1


# write the CSV
output = StringIO.StringIO()
writer = csv.writer(output)
for row in data:
    # skip the depth value in each row
    writer.writerow(row[:-1])

print output.getvalue()

You can see the code here: http://codepad.org/DvGtOw8G
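The depth comparison above can also be written more compactly by remembering the most recent id seen at each depth. The following is a rough Python 3 sketch of that variant, not the code from the link. The name `add_ids_by_depth` and the sample string are hypothetical, and it assumes there are no blank rows and that no level is skipped (a row of depth n always follows some row of depth n - 1).

```python
import csv
import io

def add_ids_by_depth(rows):
    """Return rows prefixed with [id, parent], tracking the last id per depth."""
    last_id_at_depth = [0]   # index 0 is a virtual root with id 0
    out = []
    for next_id, row in enumerate(rows, start=1):
        # depth = number of non-empty cells in the row
        depth = sum(1 for cell in row if cell != '')
        # the parent is the most recent row one level up
        parent = last_id_at_depth[depth - 1]
        # forget entries at this depth or deeper, then record this row
        del last_id_at_depth[depth:]
        last_id_at_depth.append(next_id)
        out.append([next_id, parent] + list(row))
    return out

# hypothetical sample data, same rows as csv_str above
sample = "X,,,\nX,A,,\nX,A,R,\nX,A,R,X\nX,A,R,Y\nX,B,,\nX,B,S,\nX,B,S,X\n"
numbered = add_ids_by_depth(csv.reader(io.StringIO(sample)))

# write the result back out as CSV text
output = io.StringIO()
csv.writer(output).writerows(numbered)
print(output.getvalue())
```

Because the list only ever holds one id per depth, there is no backward scan through earlier rows, which may matter for files with many thousands of entries.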