Question

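I have a .txt file that contains data merged from several .txt files. Each section starts with a file name, followed by the headers (the number of headers varies) and their data. An example is shown below. Does anyone have any ideas?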

Important: I need to leave a "-" when a student does not have a given subject.

input.txt:

Student 1.txt
Maths,90
Science,50
English,62
Student 2.txt
Maths,75
Science,80
Chinese,88
Student 3.txt
Maths,83
Chinese,22
English,90
Physics,56

Right now I have the following code, which turns it into a dict:

# Read every line once; the with block closes the file automatically
with open("input.txt","r") as open_input_file:
    datalines= open_input_file.readlines()
line=[]
value=0
from collections import defaultdict
d1=defaultdict(list)
for line in datalines:
        if line.find(',')>-1:
            key=line.split(",")[0]
            value=line.split(",")[1].strip("\n")
            d1[key].append(value)
        else:
            key="filename"
            value=line
            d1[key].append(value)

d=dict((key,tuple(value)) for key, value in d1.iteritems())
print d

This is what I get:

{'Chinese': ('88', '22'), 'Science': ('50', '80'), 'filename': ('Student 1.txt\n', 'Student 2.txt\n', 'Student 3.txt\n'), 'English': ('62', '90'), 'Maths': ('90', '75', '83'), 'Physics': ('56',)}

But what I really want is something like this, so that each mark lines up with its student number:

filename,Student 1.txt,Student 2.txt,Student 3.txt
Maths,90,75,83
Science,50,80,-
English,62,-,90
Chinese,-,88,22
Physics,-,-,56

Answer 1

This could probably be improved, but it keeps most of the original script and completes it:

from collections import defaultdict
d1 = defaultdict(list)

# Read every line once; the with block closes the file automatically
with open("input.txt", "r") as open_input_file:
    datalines = open_input_file.readlines()

# This part will gather all possible subjects in a set
subjects = set()
for line in datalines:
    if "," in line:
        subjects.add(line.split(",")[0])

# Now let's browse the data
student_subjects = set()
for line in datalines:
    if "," in line:  # new subject
        subject = line.split(",")[0]
        value = line.split(",")[1].strip("\n")
        d1[subject].append(value)
        student_subjects.add(subject)
    else:  # new student
        d1["filename"].append(line.strip("\n"))
        # But before starting to handle the new student's subjects, let's
        # complete the missing ones from previous student.
        if student_subjects:  # true if at least one student has been processed
            for subject in subjects - student_subjects:  # missing subjects
                d1[subject].append('-')
        student_subjects = set()

# Same thing when we meet the end of data (if there were missing subjects
# for the last student, like Science in this example data)
if student_subjects:
    for s in subjects - student_subjects:
        d1[s].append('-')

d = dict((key, tuple(value)) for key, value in d1.iteritems())

print d

# to view all this better:
print 'filenames: {}'.format(d['filename'])
for subject in d:
    if subject != 'filename':
        print '{}: {}'.format(subject, d[subject])

Output:

{'Chinese': ('-', '88', '22'), 'Science': ('50', '80', '-'), 'filename': ('Student 1.txt', 'Student 2.txt', 'Student 3.txt'), 'English': ('62', '-', '90'), 'Maths': ('90', '75', '83'), 'Physics': ('-', '-', '56')}
filenames: ('Student 1.txt', 'Student 2.txt', 'Student 3.txt')
Chinese: ('-', '88', '22')
Science: ('50', '80', '-')
English: ('62', '-', '90')
Maths: ('90', '75', '83')
Physics: ('-', '-', '56')
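
The script above prints tuples rather than the comma-separated rows asked for in the question. As a rough sketch of one way to get that layout, the following Python 3 version (the code above is Python 2) records each mark against the index of the student it belongs to, then fills the gaps with "-" when building the rows. The function name pivot_marks and its missing parameter are made up for this illustration, and it assumes Python 3.7+ so that a plain dict keeps subjects in order of first appearance. The sample data is inlined instead of being read from input.txt.

```python
def pivot_marks(lines, missing="-"):
    """Turn 'file name / subject,mark' lines into one row per subject."""
    students = []  # file names, in order of appearance
    marks = {}     # subject -> {student index -> mark}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if "," in line:  # a "subject,mark" line for the current student
            if not students:
                raise ValueError("mark found before any file name: %r" % line)
            subject, mark = line.split(",", 1)
            marks.setdefault(subject.strip(), {})[len(students) - 1] = mark.strip()
        else:  # a file name line starts a new student
            students.append(line)
    rows = [["filename"] + students]
    for subject, by_student in marks.items():
        # Use the placeholder for every student who has no mark in this subject
        rows.append([subject] + [by_student.get(i, missing) for i in range(len(students))])
    return rows


sample = """Student 1.txt
Maths,90
Science,50
English,62
Student 2.txt
Maths,75
Science,80
Chinese,88
Student 3.txt
Maths,83
Chinese,22
English,90
Physics,56
""".splitlines()

for row in pivot_marks(sample):
    print(",".join(row))
```

Because marks are stored by student index, no second pass over the data is needed to collect the subjects first, and a student with no subjects at all still gets a "-" in every row.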
