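I need some help and hope you can help me.
I am using mechanize to fetch some data from a website. That data has been written to a file as output. I want to process this file further, but this is where I have run into problems.
The data looks like this: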
eek43"><a name="week43">Week 43</a></h2>
<div class="day"><h3 class="dayname">Monday</h3><div class="date">24/10/2016</div><div class="event" style="background-color: #58AA40"><a href="/course/view.php?id=16544">[E16] 1. sem / M1 - Psykiatri/psykologi</a><div class="teacher">Jane Doe</div><div class="time">Time: 08:15 - 12:00</div><div class="location">Location: KS5 lok. 47/49. GrpR:58,74,75,76,77,78,79,81,83</div><div class="note">Note: some notes</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Jannie Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: NJV 6A 1.50</div><div class="note">Note: Hold X2 some notes</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Jane Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: NJV 6A 1.50</div><div class="note">Note: Hold X2 - opsamling</div></div><div class="event" style="background-color: #58AA40"><a href="/course/view.php?id=16544">[E16] 1. sem / M1 - Psykiatri/psykologi</a><div class="teacher">Jannie Doe</div><div class="time">Time: 12:30 - 16:15</div><div class="location">Location: KS5 lok. 47/49.GrpR:58,74,75,76,77,78,79,81,83</div><div class="note">Note: some notes</div></div></div>
<div class="day"><h3 class="dayname">Tuesday</h3><div class="date">25/10/2016</div><div class="event" style="background-color: #5858FA"><a href="/course/view.php?id=16538">[E16] 1. sem / M1 - Socialt arbejde</a><div class="teacher">John Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 15. aud. B</div><div class="note">Note: Hold X&Y - Opsamling af profession og socialrådgiv</div></div><div class="event" style="background-color: #58AA40"><a href="/course/view.php?id=16544">[E16] 1. sem / M1 - Psykiatri/psykologi</a><div class="teacher">Jannie Doe</div><div class="time">Time: 10:15 - 14:15</div><div class="location">Location: NJV 8A, lok. 1.12 AUD</div><div class="note">Note: Hold X&Y - Perspektiver på psykiske lidelser...</div></div></div>
<div class="day"><h3 class="dayname">Wednesday</h3><div class="date">26/10/2016</div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">James Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: NJV 6A 1.50A</div><div class="note">Note: Hold Y1 - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">James Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: NJV 6A 1.50A</div><div class="note">Note: Hold Y2 - opsamling</div></div></div>
<div class="day"><h3 class="dayname">Thursday</h3><div class="date">27/10/2016</div></div>
<div class="day"><h3 class="dayname">Friday</h3><div class="date">28/10/2016</div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Johnny Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.053</div><div class="note">Note: Hold Y1a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Lisa Andersen</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.047</div><div class="note">Note: Hold X1a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">John Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.049</div><div class="note">Note: Hold X2a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Janine Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.055</div><div class="note">Note: Hold Y2a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Jamie Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.047</div><div class="note">Note: Hold X1b - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. 
sem / M1 - Jura</a><div class="teacher">James Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.055</div><div class="note">Note: Hold Y2b - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Johnny Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.053</div><div class="note">Note: Hold Y1b - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">John Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.049</div><div class="note">Note: Hold X2b - øvelser - opsamling</div></div></div>
<div class="day"><h3 class="dayname">Saturday</h3><div class="date">29/10/2016</div></div>
<div class="day"><h3 class="dayname">Sunday</h3><div class="date">30/10/2016</div></div>
<h2 class="week" id="
In the end I want to produce output like this (every appointment whose "Note" contains X2 or X2a, but not, for example, Y1):
Monday 24/10/2016
[E16] 1. sem / M1 - Psykiatri/psykologi Jane Doe Time: 08:15 - 12:00 Location: KS5 lok. 47/49. Note: Hold X2 some notes
[E16] 1. sem / M1 - Jura Jannie Doe Time: 08:15 - 10:00 Location: NJV 6A 1.50 Note: Hold X2a some notes
[E16] 1. sem / M1 - Jura Jane Do Time: 10:15 - 12:00 Location: NJV 6A 1.50 Note: Hold X2 - opsamling
...
Tuesday 25/10/2016
...
However, when I run my code I only get the first row:
[(u'Monday', u'24/10/2016', u'Jane Doe', u'Time: 08:15 - 12:00', u'Note: Hold X2 some notes'), (u'Monday', u'24/10/2016', u'Jane Doe', u'Time: 08:15 - 12:00', u'Note: Hold X2 some notes'), (u'Monday', u'24/10/2016', u'Jane Doe', u'Time: 08:15 - 12:00', u'Note: Hold X2 some notes'),...
Part of the code:
data = []
soup = BeautifulSoup(open('scrape_out.txt'))
for lines in soup :
    date = soup.find('div', attrs={'class': 'date'}).text.strip()
    day = soup.find('h3', attrs={'class': 'dayname'}).text.strip()
    teacher = soup.find('div', attrs={'class': 'teacher'}).text.strip()
    #lecture = soup.find('div', attrs={'a': })
    time = soup.find('div', attrs={'class': 'time'}).text.strip()
    location = soup.find('div', attrs={'class': 'location'}).text.strip()
    note = soup.find('div', attrs={'class': 'note'}).text.strip()
    data.append((day, date, teacher, time, note))
print data
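I have tried a lot of different loops/iterations, but I only ever get this output (the same row repeated over and over).
Can anyone point me in the right direction (where do I start :))?
Thanks in advance.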
Answer 0 (score: 1)
You need to iterate over the days:
h = """<div><h2 class="week43"><a name="week43">Week 43</a></h2>
<div class="day"><h3 class="dayname">Monday</h3><div class="date">24/10/2016</div><div class="event" style="background-color: #58AA40"><a href="/course/view.php?id=16544">[E16] 1. sem / M1 - Psykiatri/psykologi</a><div class="teacher">Jane Doe</div><div class="time">Time: 08:15 - 12:00</div><div class="location">Location: KS5 lok. 47/49. GrpR:58,74,75,76,77,78,79,81,83</div><div class="note">Note: some notes</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Jannie Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: NJV 6A 1.50</div><div class="note">Note: Hold X2 some notes</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Jane Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: NJV 6A 1.50</div><div class="note">Note: Hold X2 - opsamling</div></div><div class="event" style="background-color: #58AA40"><a href="/course/view.php?id=16544">[E16] 1. sem / M1 - Psykiatri/psykologi</a><div class="teacher">Jannie Doe</div><div class="time">Time: 12:30 - 16:15</div><div class="location">Location: KS5 lok. 47/49.GrpR:58,74,75,76,77,78,79,81,83</div><div class="note">Note: some notes</div></div></div>
<div class="day"><h3 class="dayname">Tuesday</h3><div class="date">25/10/2016</div><div class="event" style="background-color: #5858FA"><a href="/course/view.php?id=16538">[E16] 1. sem / M1 - Socialt arbejde</a><div class="teacher">John Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 15. aud. B</div><div class="note">Note: Hold X&Y - Opsamling af profession og socialrådgiv</div></div><div class="event" style="background-color: #58AA40"><a href="/course/view.php?id=16544">[E16] 1. sem / M1 - Psykiatri/psykologi</a><div class="teacher">Jannie Doe</div><div class="time">Time: 10:15 - 14:15</div><div class="location">Location: NJV 8A, lok. 1.12 AUD</div><div class="note">Note: Hold X&Y - Perspektiver på psykiske lidelser...</div></div></div>
<div class="day"><h3 class="dayname">Wednesday</h3><div class="date">26/10/2016</div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">James Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: NJV 6A 1.50A</div><div class="note">Note: Hold Y1 - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">James Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: NJV 6A 1.50A</div><div class="note">Note: Hold Y2 - opsamling</div></div></div>
<div class="day"><h3 class="dayname">Thursday</h3><div class="date">27/10/2016</div></div>
<div class="day"><h3 class="dayname">Friday</h3><div class="date">28/10/2016</div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Johnny Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.053</div><div class="note">Note: Hold Y1a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Lisa Andersen</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.047</div><div class="note">Note: Hold X1a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">John Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.049</div><div class="note">Note: Hold X2a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Janine Doe</div><div class="time">Time: 08:15 - 10:00</div><div class="location">Location: Fib 13.055</div><div class="note">Note: Hold Y2a - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Jamie Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.047</div><div class="note">Note: Hold X1b - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. 
sem / M1 - Jura</a><div class="teacher">James Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.055</div><div class="note">Note: Hold Y2b - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">Johnny Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.053</div><div class="note">Note: Hold Y1b - øvelser - opsamling</div></div><div class="event" style="background-color: #ACFA58"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a><div class="teacher">John Doe</div><div class="time">Time: 10:15 - 12:00</div><div class="location">Location: Fib 13.049</div><div class="note">Note: Hold X2b - øvelser - opsamling</div></div></div>
<div class="day"><h3 class="dayname">Saturday</h3><div class="date">29/10/2016</div></div>
<div class="day"><h3 class="dayname">Sunday</h3><div class="date">30/10/2016</div></div>
</div>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(h, "lxml")

# Each div.day holds one day's heading, date and events.
for d in soup.find_all("div", class_="day"):
    notes = d.find_all("div", class_="note")
    teachers = d.find_all("div", class_="teacher")
    date = d.find("div", class_="date")
    times = d.find_all("div", class_="time")
    day = d.find("h3", class_="dayname")
    # Pair up the note, time and teacher of each event in this day.
    for note, time, teacher in zip(notes, times, teachers):
        note_text = note.text
        # Substring test, so this also matches "X2a" and "X2b".
        if "X2" in note_text:
            print((day.text, date.text, teacher.text, time.text, note_text))
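A variation that may be closer to what the question asks for: soup.find always returns the first match in the whole document, which is why the original loop kept repeating Monday's first event. The sketch below instead walks each div.event inside each div.day, so the fields of one event stay together even if a field is missing elsewhere. It also uses a regular expression that accepts only X2 or X2a, not X2b. The SAMPLE string is a shortened copy of a few events from the question, the names WANTED and wanted_events are made up for this illustration, and html.parser is used only so the sketch does not depend on lxml.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample in the same shape as the schedule markup from the question.
SAMPLE = (
    '<div class="day"><h3 class="dayname">Monday</h3><div class="date">24/10/2016</div>'
    '<div class="event"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a>'
    '<div class="teacher">Jannie Doe</div><div class="time">Time: 08:15 - 10:00</div>'
    '<div class="location">Location: NJV 6A 1.50</div>'
    '<div class="note">Note: Hold X2 some notes</div></div></div>'
    '<div class="day"><h3 class="dayname">Friday</h3><div class="date">28/10/2016</div>'
    '<div class="event"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a>'
    '<div class="teacher">John Doe</div><div class="time">Time: 08:15 - 10:00</div>'
    '<div class="location">Location: Fib 13.049</div>'
    '<div class="note">Note: Hold X2a - øvelser - opsamling</div></div>'
    '<div class="event"><a href="/course/view.php?id=16533">[E16] 1. sem / M1 - Jura</a>'
    '<div class="teacher">John Doe</div><div class="time">Time: 10:15 - 12:00</div>'
    '<div class="location">Location: Fib 13.049</div>'
    '<div class="note">Note: Hold X2b - øvelser - opsamling</div></div></div>'
)

# Matches "X2" or "X2a" as a whole token, but not "X2b" or "Y2".
WANTED = re.compile(r"\bX2a?\b")


def wanted_events(html):
    """Return one tuple per event whose note mentions X2 or X2a."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for day in soup.find_all("div", class_="day"):
        day_name = day.find("h3", class_="dayname").get_text(strip=True)
        date = day.find("div", class_="date").get_text(strip=True)
        # Search inside each event, not inside the whole document.
        for event in day.find_all("div", class_="event"):
            note = event.find("div", class_="note")
            if note is None or not WANTED.search(note.get_text()):
                continue
            rows.append((
                day_name,
                date,
                event.find("a").get_text(strip=True),
                event.find("div", class_="teacher").get_text(strip=True),
                event.find("div", class_="time").get_text(strip=True),
                event.find("div", class_="location").get_text(strip=True),
                note.get_text(strip=True),
            ))
    return rows


for row in wanted_events(SAMPLE):
    print(" ".join(row))
```

If X2b should be included as well, the pattern can be loosened, or the plain substring test from the loop above can be kept.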
If you want to group by week, you will need to add a find_all call for the parent element that contains each week.
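A rough sketch of grouping by week, under an assumption about the markup: in the fragment posted in the question, the week heading looks like an h2 with class "week" followed by the day divs as siblings, not a wrapper around them. If that holds, one option is to walk the siblings after each heading until the next heading, which is a different technique from searching a parent element. The "Week 44" entry in SAMPLE and the function name days_by_week are invented for the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical two-week sample; only "Week 43" appears in the question.
SAMPLE = (
    '<div>'
    '<h2 class="week" id="week43"><a name="week43">Week 43</a></h2>'
    '<div class="day"><h3 class="dayname">Monday</h3><div class="date">24/10/2016</div></div>'
    '<div class="day"><h3 class="dayname">Tuesday</h3><div class="date">25/10/2016</div></div>'
    '<h2 class="week" id="week44"><a name="week44">Week 44</a></h2>'
    '<div class="day"><h3 class="dayname">Monday</h3><div class="date">31/10/2016</div></div>'
    '</div>'
)


def days_by_week(html):
    """Map each week heading to the dates of the day divs that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    weeks = {}
    for heading in soup.find_all("h2", class_="week"):
        dates = []
        for sibling in heading.find_next_siblings():
            if sibling.name == "h2":
                break  # the next week starts here
            if sibling.name == "div" and "day" in sibling.get("class", []):
                dates.append(sibling.find("div", class_="date").get_text(strip=True))
        weeks[heading.get_text(strip=True)] = dates
    return weeks


print(days_by_week(SAMPLE))
```

If the real page does wrap each week in its own container, iterating over those containers with find_all and then over their div.day children would be simpler.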
To write to a file, you can use the csv lib. With print, the loop over the days gives you:
('Monday', '24/10/2016', 'Jannie Doe', 'Time: 08:15 - 10:00', 'Note: Hold X2 some notes')
('Monday', '24/10/2016', 'Jane Doe', 'Time: 10:15 - 12:00', 'Note: Hold X2 - opsamling')
('Friday', '28/10/2016', 'John Doe', 'Time: 08:15 - 10:00', 'Note: Hold X2a - øvelser - opsamling')
('Friday', '28/10/2016', 'John Doe', 'Time: 10:15 - 12:00', 'Note: Hold X2b - øvelser - opsamling')
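Writing rows like these with the csv module could look roughly like the sketch below. The header names, the helper write_rows and the file name in the trailing comment are assumptions for illustration. The rows are copied from the output above, and an in-memory buffer stands in for a real file so the result can be inspected directly.

```python
import csv
import io

# Two of the rows printed above.
ROWS = [
    ("Monday", "24/10/2016", "Jannie Doe", "Time: 08:15 - 10:00", "Note: Hold X2 some notes"),
    ("Monday", "24/10/2016", "Jane Doe", "Time: 10:15 - 12:00", "Note: Hold X2 - opsamling"),
]


def write_rows(rows, out):
    """Write a header line followed by one CSV line per event tuple."""
    writer = csv.writer(out)
    writer.writerow(["day", "date", "teacher", "time", "note"])
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(ROWS, buffer)
print(buffer.getvalue())

# For a real file (the name is hypothetical), open it with newline="" as the
# csv module documentation recommends:
# with open("schedule_x2.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows(ROWS, f)
```

In the day loop, the print call would be replaced by appending each tuple to a list, which is then passed to write_rows once the loop finishes.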