I have a CSV file in this format:
Country State City County
X A
X A R
X A R X
X A R Y
X B
X B S
X B S X
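It represents a tree (containment) relationship. Now I need to insert id and parent id columns that reflect this relationship. For example, the parent of Y (id = 5) is R, which has id 3, so Y's parent field is 3.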
id parent Country State City County
1 0 X
2 1 X A
3 2 X A R
4 3 X A R X
5 3 X A R Y
6 1 X B
7 6 X B S
8 7 X B S X
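Since there are thousands of entries, doing this by hand is tedious. How can I do it in Python? That is, read the file (the first block) and output it with the id and parent columns inserted (the second block above).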
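Answer 0 (score: 1)
This isn't pretty (and it isn't Python, so apologies if that is not an option), but if you want to avoid writing a script, you could use this formula (assuming the setup shown in the screenshot):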
=INDEX(
$A$1:$A$9,
MATCH(
INDIRECT(ADDRESS(ROW(),COUNTA(C2:F2)+1)),
INDIRECT(
SUBSTITUTE(ADDRESS(1,COUNTA(C2:F2)+1,4) & ":" & ADDRESS(1,COUNTA(C2:F2)+1,4),"1","")),
0),
1)
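This assumes the data is ordered so that a parent's id is defined before it is referenced. To populate the ids, you can use Fill Series to create an incrementing list. Again, this isn't great (and may not suit what you need), but it is one way to avoid writing a script. If you need Python, JoranBeasley's suggestion of using the csv module is the way to go.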
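For comparison, the ordering assumption above (a parent appears before any row that refers to it) can be sketched in Python with the csv module. This is not a literal translation of the spreadsheet formula: instead of matching one cell value within a column, it keys a dictionary on the full path of non-empty cells, so repeated names such as the two X counties do not collide. The function name `assign_ids` is hypothetical, and using 0 as the parent of a top-level row is an assumption taken from the expected output in the question.

```python
import csv
import io


def assign_ids(lines):
    """Return rows as [id, parent_id, *cells]; assumes parents come first."""
    ids_by_path = {}  # maps a tuple of non-empty cells to that row's id
    result = []
    for row in csv.reader(lines):
        path = tuple(cell for cell in row if cell != "")
        if not path:
            continue  # skip blank lines
        row_id = len(result) + 1
        # The parent is the same path with its last element dropped;
        # a top-level row (or a missing parent) falls back to 0.
        parent_id = ids_by_path.get(path[:-1], 0)
        ids_by_path[path] = row_id
        result.append([row_id, parent_id] + row)
    return result


sample = io.StringIO("X,,,\nX,A,,\nX,A,R,\nX,A,R,X\nX,A,R,Y\nX,B,,\n")
for out_row in assign_ids(sample):
    print(out_row)
```

Unlike a first-match lookup on a single column, the path key keeps working when the same name appears under different parents, at the cost of holding one dictionary entry per row.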
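Answer 1 (score: 1)
Edit: this solution should be clearer. It is a modification of the previous solutions (1, 2) rather than a new approach. Using a single loop with no copying makes it easier to follow.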
import csv
import io

csv_str = """X,,,
X,A,,
X,A,R,
X,A,R,X
X,A,R,Y
X,B,,
X,B,S,
X,B,S,X
"""

reader = csv.reader(io.StringIO(csv_str))
idx = 0
data = []
for row in reader:
    # how deep is the current row (number of non-empty columns)
    depth = len([r for r in row if r != ''])
    # insert the row id
    row.insert(0, idx + 1)
    # insert a dummy parent id, it will be replaced with the real
    # value later
    row.insert(1, -1)
    # insert the depth as the last value in the row
    row.append(depth)
    if idx > 0:
        # if it's not the first row, calculate its parent
        # look up the depth of the previous row
        prev_depth = data[idx - 1][-1]
        if depth > prev_depth:
            # if it's deeper than the previous row, then the previous
            # row is the parent row
            row[1] = data[idx - 1][0]
        elif depth == prev_depth:
            # if it's the same depth as the previous row then it has
            # the same parent as the previous row
            row[1] = data[idx - 1][1]
        else:
            # if it's shallower than the previous row, find the
            # nearest previous row with the same depth and use its
            # parent as this row's parent
            ridx = idx - 1
            while ridx >= 0 and data[ridx][-1] != depth:
                ridx -= 1
            # fall back to 0 if no earlier row has this depth
            row[1] = data[ridx][1] if ridx >= 0 else 0
    else:
        # if it's the first row its parent is 0
        row[1] = 0
    # store the new row
    data.append(row)
    idx += 1

# write the CSV
output = io.StringIO()
writer = csv.writer(output)
for row in data:
    # skip the depth value in each row
    writer.writerow(row[:-1])
print(output.getvalue())
You can see the code here: http://codepad.org/DvGtOw8G
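A more compact variant of the same idea, offered as a sketch rather than a replacement for the code above: because the rows are in tree order, the parent of a row at depth d is the most recent row seen at depth d - 1, so it is enough to remember the last id per depth. The function name `add_ids` is hypothetical, and the sketch assumes that every row's ancestors appear earlier in the input and that there is no header row.

```python
import csv
import io


def add_ids(infile, outfile):
    """Copy CSV rows from infile to outfile, prepending id and parent id."""
    last_id_at_depth = {0: 0}  # depth 0 stands for a virtual root with id 0
    writer = csv.writer(outfile)
    row_id = 0
    for row in csv.reader(infile):
        depth = sum(1 for cell in row if cell != "")
        if depth == 0:
            continue  # skip blank lines
        row_id += 1
        # Assumption: the parent (depth - 1) was seen earlier; otherwise use 0.
        parent_id = last_id_at_depth.get(depth - 1, 0)
        last_id_at_depth[depth] = row_id
        writer.writerow([row_id, parent_id] + row)


source = io.StringIO(
    "X,,,\nX,A,,\nX,A,R,\nX,A,R,X\nX,A,R,Y\nX,B,,\nX,B,S,\nX,B,S,X\n"
)
target = io.StringIO()
add_ids(source, target)
print(target.getvalue())
```

To run it on real files, pass file objects opened with newline="" (as the csv module documentation recommends) in place of the in-memory buffers, for example an input opened for reading and an output opened with mode "w"; any file names are up to you.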