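I have a .txt file that contains the data from several .txt files. Each block starts with the file name, followed by a varying number of subject lines, each with its data. Here is an example. Does anyone have any ideas?
Important: I need to put a "-" wherever a student doesn't have a given subject.
input.txt: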
Student 1.txt
Maths,90
Science,50
English,62
Student 2.txt
Maths,75
Science,80
Chinese,88
Student 3.txt
Maths,83
Chinese,22
English,90
Physics,56
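For reference, grouping the lines per student (rather than per subject) takes only a small loop. A minimal sketch, assuming Python 3.7+ and that every line containing a comma is a `subject,mark` pair while every other non-blank line is a file name that starts a new student; `parse_blocks` is a hypothetical helper name:

```python
def parse_blocks(lines):
    """Group lines into (filename, {subject: mark}) blocks, in file order.

    Assumption: a line with a comma is "subject,mark"; any other
    non-blank line is a filename that starts a new block.
    """
    blocks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if "," in line:
            subject, mark = line.split(",", 1)
            blocks[-1][1][subject] = mark  # add to the current student's dict
        else:
            blocks.append((line, {}))  # start a new student block
    return blocks


sample = """Student 1.txt
Maths,90
Science,50
English,62
Student 2.txt
Maths,75
Science,80
Chinese,88
Student 3.txt
Maths,83
Chinese,22
English,90
Physics,56
"""

for filename, marks in parse_blocks(sample.splitlines()):
    print(filename, marks)
```

With a real file, `with open("input.txt") as f: blocks = parse_blocks(f)` should also work, since iterating over a file object yields its lines. A `subject,mark` line appearing before any file name would raise `IndexError` in this sketch.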
Right now I have the following code to turn it into a dict:
from collections import defaultdict

with open("input.txt", "r") as open_input_file:
    datalines = open_input_file.readlines()

d1 = defaultdict(list)
for line in datalines:
    if ',' in line:  # "subject,mark" line
        key = line.split(",")[0]
        value = line.split(",")[1].strip("\n")
        d1[key].append(value)
    else:  # file name line
        key = "filename"
        value = line
        d1[key].append(value)
d = dict((key, tuple(value)) for key, value in d1.items())
print(d)
I got this:
{'Chinese': ('88', '22'), 'Science': ('50', '80'), 'filename': ('Student 1.1\n', 'Student 2.1\n', 'Student 3.1\n'), 'English': ('62', '90'), 'Maths': ('90', '75', '83'), 'Physics': ('56',)}
But what I really want is something like this, so that each mark lines up with its student:
filename,Student 1.txt,Student 2.txt,Student 3.txt
Maths,90,75,83
Science,50,80,-
English,62,-,90
Chinese,-,88,22
Physics,-,-,56
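One way to get exactly that layout is to keep each student's marks together and pivot only at the end, filling the gaps with "-". A minimal sketch, assuming Python 3.7+ (dicts keep insertion order, so subjects come out in order of first appearance) and that the data has already been grouped per student; the `blocks` list is typed in by hand to mirror input.txt, and `build_table` is a hypothetical helper name:

```python
def build_table(blocks, missing="-"):
    """Pivot (filename, {subject: mark}) blocks into rows of the wanted table.

    Subjects are listed in order of first appearance; a student without a
    given subject gets the `missing` placeholder.
    """
    subjects = []
    for _, marks in blocks:
        for subject in marks:
            if subject not in subjects:
                subjects.append(subject)
    rows = [["filename"] + [filename for filename, _ in blocks]]
    for subject in subjects:
        rows.append([subject] + [marks.get(subject, missing) for _, marks in blocks])
    return rows


blocks = [
    ("Student 1.txt", {"Maths": "90", "Science": "50", "English": "62"}),
    ("Student 2.txt", {"Maths": "75", "Science": "80", "Chinese": "88"}),
    ("Student 3.txt", {"Maths": "83", "Chinese": "22", "English": "90", "Physics": "56"}),
]

for row in build_table(blocks):
    print(",".join(row))
```

`dict.get(subject, missing)` is what supplies the "-" for a subject a student doesn't have, so no second pass over the data is needed.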
Answer (score: 1)
This could probably be improved, but it keeps most of your original script and completes it:
from collections import defaultdict

d1 = defaultdict(list)

with open("input.txt", "r") as open_input_file:
    datalines = open_input_file.readlines()

# This part gathers all possible subjects in a set
subjects = set()
for line in datalines:
    if "," in line:
        subjects.add(line.split(",")[0])

# Now let's browse the data
student_subjects = set()
for line in datalines:
    if "," in line:  # new subject
        subject = line.split(",")[0]
        value = line.split(",")[1].strip("\n")
        d1[subject].append(value)
        student_subjects.add(subject)
    else:  # new student
        d1["filename"].append(line.strip("\n"))
        # But before starting to handle the new student's subjects, let's
        # complete the missing ones from the previous student.
        if student_subjects:  # true once a previous student's subjects have been read
            for subject in subjects - student_subjects:  # missing subjects
                d1[subject].append('-')
        student_subjects = set()

# Same thing when we reach the end of the data (if there were missing subjects
# for the last student, like Science in this example data)
if student_subjects:
    for s in subjects - student_subjects:
        d1[s].append('-')

d = dict((key, tuple(value)) for key, value in d1.items())
print(d)

# to view all this better:
print('filenames: {}'.format(d['filename']))
for subject in d:
    if subject != 'filename':
        print('{}: {}'.format(subject, d[subject]))
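The script stops at a dict of tuples; to get the comma-separated layout asked for in the question, each tuple can be written out as one row. A sketch using the standard `csv` module, assuming Python 3; `write_table` is a hypothetical helper, the `d` below is copied by hand from the output shown below, and the subject order is passed in explicitly because the dict does not record which subject appeared first:

```python
import csv
import io


def write_table(d, subjects, out):
    """Write `d` as CSV: the filename row first, then one row per subject."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["filename"] + list(d["filename"]))
    for subject in subjects:
        writer.writerow([subject] + list(d[subject]))


# Example data mirroring the script's result (one tuple entry per student)
d = {
    "filename": ("Student 1.txt", "Student 2.txt", "Student 3.txt"),
    "Maths": ("90", "75", "83"),
    "Science": ("50", "80", "-"),
    "English": ("62", "-", "90"),
    "Chinese": ("-", "88", "22"),
    "Physics": ("-", "-", "56"),
}

buf = io.StringIO()
write_table(d, ["Maths", "Science", "English", "Chinese", "Physics"], buf)
print(buf.getvalue(), end="")
```

To write a file instead of printing, pass a file opened with `open("output.csv", "w", newline="")` as `out`, which is how the `csv` documentation recommends opening files for `csv.writer`.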
Output (dictionary key order may differ between Python versions):
{'Chinese': ('-', '88', '22'), 'Science': ('50', '80', '-'), 'filename': ('Student 1.txt', 'Student 2.txt', 'Student 3.txt'), 'English': ('62', '-', '90'), 'Maths': ('90', '75', '83'), 'Physics': ('-', '-', '56')}
filenames: ('Student 1.txt', 'Student 2.txt', 'Student 3.txt')
Chinese: ('-', '88', '22')
Science: ('50', '80', '-')
English: ('62', '-', '90')
Maths: ('90', '75', '83')
Physics: ('-', '-', '56')