Question

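Situation: AHLTA is an electronic medical record system that can export its GUI templates as text. I'm building a template editor and need to import those text files. Each line represents one GUI element and starts with a number identifying its parent tab in the GUI. The order of the lines doesn't matter. I'm using Python 3.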

Example (theFile):

1,550,57,730,77,0,32770," |||||||0|0||0|0|||0|||0|0|1|0|0|0|||","F=TimesNewRoman|C=8421504|T=T","Last updated: 2017-05-18"
0,743,4,823,48,0,16384," |||||||0|0||0|0|||0|||0|0|0|0|0|0|||","F=Arial|O=5|B=T","TSWF Navigator:<formLinkInfo><version>1.1</version><templateName>TSWF-Navigator</templateName><templateId>2238487</templateId><templateOwnerName>Department of Defense</templateOwnerName><templateOwnerNcid>33962</templateOwnerNcid></formLinkInfo>"
0,828,4,907,24,0,16384," |||||||0|0||0|0|||0|||0|0|0|0|0|0|||","O=5","CORE:<formLinkInfo><version>1.1</version><templateName>TSWF-CORE</templateName><templateId>1995726</templateId><templateOwnerName>Department of Defense</templateOwnerName><templateOwnerNcid>33962</templateOwnerNcid></formLinkInfo>"
2,25,791,370,811,297285,8961," | || ||||19|80|YCN|0|0|Y|N|0|||0|0|5|0|0|0|||","F=Arial|T=T","Responds to affection~ (by 4 months)"
2,25,871,370,891,297287,8961," | || ||||19|80|YCN|0|0|Y|N|0|||0|0|5|0|0|0|||","F=Arial|T=T","Indicates pleasure and displeasure~ (by 4 months)"

My goal: I want a dictionary of lists, where each key is a GUI tab number and each list contains all the lines that start with that number.

Example:

0: 
0,743,4,823,48,0,16384," |||||||0|0||0|0|||0|||0|0|0|0|0|0|||","F=Arial|O=5|B=T","TSWF Navigator:<formLinkInfo><version>1.1</version><templateName>TSWF-Navigator</templateName><templateId>2238487</templateId><templateOwnerName>Department of Defense</templateOwnerName><templateOwnerNcid>33962</templateOwnerNcid></formLinkInfo>"
0,828,4,907,24,0,16384," |||||||0|0||0|0|||0|||0|0|0|0|0|0|||","O=5","CORE:<formLinkInfo><version>1.1</version><templateName>TSWF-CORE</templateName><templateId>1995726</templateId><templateOwnerName>Department of Defense</templateOwnerName><templateOwnerNcid>33962</templateOwnerNcid></formLinkInfo>"

1:
1,550,57,730,77,0,32770," |||||||0|0||0|0|||0|||0|0|1|0|0|0|||","F=TimesNewRoman|C=8421504|T=T","Last updated: 2017-05-18"

2:
2,25,791,370,811,297285,8961," | || ||||19|80|YCN|0|0|Y|N|0|||0|0|5|0|0|0|||","F=Arial|T=T","Responds to affection~ (by 4 months)"
2,25,871,370,891,297287,8961," | || ||||19|80|YCN|0|0|Y|N|0|||0|0|5|0|0|0|||","F=Arial|T=T","Indicates pleasure and displeasure~ (by 4 months)"

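Problem: I can't create the lists ahead of time, because I don't know how many tabs there are until I read the file. I tried looping over the file once per tab, collecting that tab's items into a temporary list, adding the list to the dictionary, and then moving on to the next tab. The example data is shortened for simplicity: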

theFile = ['1,550,57,730,77', '0,743,4,823,48', '0,828,4,907,24', '2,25,791,370,811', '2,25,871,370,891']
tabCount = 3  # for this example; normally pulled from file header

sortedLines = dict()
for i in range(tabCount):
    tempList = []
    for line in theFile:
        tempList.append(line)
    sortedLines.update({tabCount: tempList})
    tempList.clear()

print('Dict: ', sortedLines)
for k, v in sortedLines.items():
    print('Pair: ' + str(k) + ': ' + '[%s]' % ', '.join(map(str, v)))

This seems to loop as expected, but I end up with a single empty pair:

{3: []}
3: []

Summary: How do I create a dictionary of lists when the number of lists is only known at runtime?

Answer 1

def main():
    # I'm assuming you can get this far...
    lines = [
        '1,some stuff 1',
        '2,some stuff 2,more stuff',
        '2,some stuff 4,candy,bacon',
        '3,some stuff 3,this,is,horrible...'
    ]

    # Something to hold your parsed data
    data = {}

    # Iterate over each line of your file
    for line in lines:

        # Split the data apart on comma per your example data
        parts = line.split(',')

        # denote the key is the first part of the split data
        key = parts[0]
        if key not in data:
            # Since there could be multiple values per key we need to keep a
            # list of mapped values
            data[key] = []

        # put the "other data" into the list
        index_of_sep = line.find(',')
        data[key].append(line[index_of_sep+1:])

    # You probably want to return here. I'm printing so you can see the result
    print(data)


if __name__ == '__main__':
    main()

Result

C:\Python35\python.exe C:/Users/Frito/GitSource/sandbox/sample.py
{'3': ['some stuff 3,this,is,horrible...'], '1': ['some stuff 1'], '2': ['some stuff 2,more stuff', 'some stuff 4,candy,bacon']}

Process finished with exit code 0
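
As a side note, the loop in the question prints {3: []} for three reasons visible in the code itself: the key passed to update() is tabCount (always 3) rather than i, nothing filters the lines by tab, and tempList.clear() empties the very list object that was just stored in the dictionary. If the goal is to keep whole lines under integer keys, as in the question's desired output, something along these lines might work. It is only a sketch, assuming the tab number is always the integer before the first comma; group_by_tab is a name made up for this example.

```python
from collections import defaultdict


def group_by_tab(lines):
    """Group whole lines by the integer that precedes the first comma."""
    grouped = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Only the text before the first comma matters for the key, so the
        # quoted fields later in the line are never parsed.
        tab, _, _rest = line.partition(',')
        grouped[int(tab)].append(line)
    return dict(grouped)  # plain dict, so later lookups don't add keys


# Shortened sample data from the question
the_file = ['1,550,57,730,77', '0,743,4,823,48', '0,828,4,907,24',
            '2,25,791,370,811', '2,25,871,370,891']

sorted_lines = group_by_tab(the_file)
for tab in sorted(sorted_lines):
    print(tab, sorted_lines[tab])
```

No tab count is needed up front: each list is created the first time its tab number appears, so the header value is not required for the grouping.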

Sort strings into a dictionary, where the initial char is the key and the value is a list of all lines starting with that char
