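I recently did some web scraping and eventually extracted the data I wanted. However, it has no organization at all, because it is just a plain list in Python. It contains the meeting type (LE / DI / SE), days, times, the professor's name (edited for privacy), and a few other values.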
["LE", "A00", "MWF", "10:00a-10:50a", "MLK", "AUD", "Smith, John", "976539", "DI", "A01", "F", "5:00p-5:50p", "MLK", "AUD", "Smith, John", "FULL Waitlist(25)", "216", "FI", "03/17/2018", "S", "8:00a-10:59a", "TBA", "TBA", "LE", "B00", "MWF", "1:00p-1:50p", "WLH", "2005", "Smith, John", "927471", "DI", "B01", "F", "6:00p-6:50p", "MLK", "AUD", "Smith, John", "FULL Waitlist(32)", "200", "FI", "03/17/2018", "S", "8:00a-10:59a", "TBA", "TBA"]
As you can see, it's an ugly list. My goal is to turn it into something like this:
{
  "MATH101": {
    "LE": {
      sectionCode: 'A00',
      days: 'MWF',
      times: '10:00a-10:50a',
      building: 'MLK',
      room: 'AUD',
      instructor: 'Smith, John',
      "DI": {
        sectionCode: 'A01',
        days: 'F',
        times: '5:00p-5:50p',
        building: 'MLK',
        room: 'AUD',
        instructor: 'Smith, John',
        availableSeats: 'FULL Waitlist(25)',
        capacity: '216'
      }
    },
    "LE": {
      sectionCode: 'B00',
      days: 'MWF',
      times: '1:00p-1:50p',
      building: 'WLH',
      room: '2005',
      instructor: 'Smith, John',
      "DI": {
        sectionCode: 'B01',
        days: 'F',
        times: '6:00p-6:50p',
        building: 'MLK',
        room: 'AUD',
        instructor: 'Smith, John',
        availableSeats: 'FULL Waitlist(32)',
        capacity: '200'
      }
    }
  }
}
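Each lecture section has one discussion, so this course has two lecture times and two discussions. I don't know whether there is a better schema for storing this data, but this is the only one I could come up with.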
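One thing worth checking about the schema above: it uses the key "LE" twice at the same level, and a Python dict (like most JSON parsers) keeps only the last value for a repeated key. The sketch below shows that behavior and one possible alternative, a list of lecture entries. The names `type` and `discussions` are hypothetical and are not part of the scraped data.

```python
import json

# Repeating a key in a dict literal raises no error; the last value silently wins.
duplicate_keys = {"LE": {"sectionCode": "A00"}, "LE": {"sectionCode": "B00"}}
print(duplicate_keys)  # only the B00 entry survives

# Hypothetical alternative: each course maps to a list of lectures, and each
# lecture carries its own list of discussions, so nothing gets overwritten.
course = {
    "MATH101": [
        {"type": "LE", "sectionCode": "A00",
         "discussions": [{"type": "DI", "sectionCode": "A01"}]},
        {"type": "LE", "sectionCode": "B00",
         "discussions": [{"type": "DI", "sectionCode": "B01"}]},
    ]
}
print(json.dumps(course, indent=2))
```

A list also copes with a course that has one lecture, three lectures, or several discussions per lecture, without changing the shape of the data.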
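I was thinking of iterating through the whole list and starting to save values after each "LE" or "DI", in the order I laid out above, but I have no idea how to create the JSON file when the list does not contain the keys.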
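The iteration idea can be sketched roughly as follows, assuming the field order seen in the sample list is stable and that no ordinary value is ever exactly "LE", "DI", or "FI". Since the list carries no keys, the keys are supplied by a lookup table per meeting type. `FIELDS`, `split_records`, `build_course`, the `sectionId` name for the number after the lecture instructor, and the "FI" field names are all guesses made for illustration.

```python
import json

# Field names per meeting type, in the order the values appear in the list.
# "sectionId" and the "FI" names are assumptions about what those values mean.
FIELDS = {
    "LE": ["sectionCode", "days", "times", "building", "room", "instructor", "sectionId"],
    "DI": ["sectionCode", "days", "times", "building", "room", "instructor",
           "availableSeats", "capacity"],
    "FI": ["date", "days", "times", "building", "room"],
}


def split_records(flat):
    """Start a new record each time a meeting-type marker appears."""
    records = []
    for value in flat:
        if value in FIELDS:
            records.append([value])
        elif records:
            records[-1].append(value)
    return records


def build_course(flat):
    """Nest each DI and FI record under the most recent LE record."""
    lectures = []
    for kind, *values in split_records(flat):
        entry = dict(zip(FIELDS[kind], values))  # pair keys with values
        if kind == "LE":
            entry["discussions"] = []
            lectures.append(entry)
        elif not lectures:
            continue  # skip a DI or FI that appears before any lecture
        elif kind == "DI":
            lectures[-1]["discussions"].append(entry)
        else:
            lectures[-1]["final"] = entry
    return lectures


scraped = [
    "LE", "A00", "MWF", "10:00a-10:50a", "MLK", "AUD", "Smith, John", "976539",
    "DI", "A01", "F", "5:00p-5:50p", "MLK", "AUD", "Smith, John", "FULL Waitlist(25)", "216",
    "FI", "03/17/2018", "S", "8:00a-10:59a", "TBA", "TBA",
]

schedule = {"MATH101": build_course(scraped)}
print(json.dumps(schedule, indent=2))
```

Splitting on the markers instead of counting a fixed number of values means a record with a missing trailing field still parses; `zip` simply stops at the shorter of the two sequences. Writing the result to disk would be a matter of passing an open file to `json.dump`.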
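I'm new to all of this and haven't found a solution yet. I tried converting the list to a dictionary, but that didn't do what I need. A lot of the data is also repeated, such as the professor's name, but there won't always be a single professor, nor will there always be exactly two lectures and two discussions. I plan to do this for every available course, so it gets more and more complicated...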
Hopefully someone can help. Thanks!