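I'm using the Python calendar package to create an HTML calendar. Because calendar only renders the dates as plain text, I want each date to be a link. I'm using BeautifulSoup4 to find all the elements and replace them with links. However, when I do this, it changes my greater-than and less-than signs to &gt; and &lt;. I even tried forcing it with unescape from Python's html package, but it does the same thing.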
import calendar
from bs4 import BeautifulSoup

cal = calendar.HTMLCalendar(calendar.SUNDAY)
soup = BeautifulSoup(cal.formatmonth(2019, 11))
This creates:
<html>
<body>
<table border="0" cellpadding="0" cellspacing="0" class="month">
<tr>
<th class="month" colspan="7">
November 2019
</th>
</tr>
<tr>
<th class="sun">
Sun
</th>
<th class="mon">
Mon
</th>
<th class="tue">
Tue
</th>
<th class="wed">
Wed
</th>
<th class="thu">
Thu
</th>
<th class="fri">
Fri
</th>
<th class="sat">
Sat
</th>
</tr>
<tr>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="fri">
1
</td>
<td class="sat">
2
</td>
</tr>
<tr>
<td class="sun">
3
</td>
<td class="mon">
4
</td>
<td class="tue">
5
</td>
<td class="wed">
6
</td>
<td class="thu">
7
</td>
<td class="fri">
8
</td>
<td class="sat">
9
</td>
</tr>
<tr>
<td class="sun">
10
</td>
<td class="mon">
11
</td>
<td class="tue">
12
</td>
<td class="wed">
13
</td>
<td class="thu">
14
</td>
<td class="fri">
15
</td>
<td class="sat">
16
</td>
</tr>
<tr>
<td class="sun">
17
</td>
<td class="mon">
18
</td>
<td class="tue">
19
</td>
<td class="wed">
20
</td>
<td class="thu">
21
</td>
<td class="fri">
22
</td>
<td class="sat">
23
</td>
</tr>
<tr>
<td class="sun">
24
</td>
<td class="mon">
25
</td>
<td class="tue">
26
</td>
<td class="wed">
27
</td>
<td class="thu">
28
</td>
<td class="fri">
29
</td>
<td class="sat">
30
</td>
</tr>
</table>
</body>
</html>
So here I try to replace the text strings with links:
for elem in soup.find_all('td', class_=['sun', 'mon', 'tues', 'wed', 'thu', 'fri', 'sat']):
    elem.string = '<a href="{}.html">'.format(elem.string) + elem.string + '</a>'
Which creates:
<bound method Tag.prettify of <html><body><table border="0" cellpadding="0" cellspacing="0" class="month">
<tr><th class="month" colspan="7">November 2019</th></tr>
<tr><th class="sun">Sun</th><th class="mon">Mon</th><th class="tue">Tue</th><th class="wed">Wed</th><th class="thu">Thu</th><th class="fri">Fri</th><th class="sat">Sat</th></tr>
<tr><td class="noday"> </td><td class="noday"> </td><td class="noday"> </td><td class="noday"> </td><td class="noday"> </td><td class="fri">&lt;a href="1.html"&gt;1&lt;/a&gt;</td><td class="sat">&lt;a href="2.html"&gt;2&lt;/a&gt;</td></tr>
<tr><td class="sun">&lt;a href="3.html"&gt;3&lt;/a&gt;</td><td class="mon">&lt;a href="4.html"&gt;4&lt;/a&gt;</td><td class="tue">5</td><td class="wed">&lt;a href="6.html"&gt;6&lt;/a&gt;</td><td class="thu">&lt;a href="7.html"&gt;7&lt;/a&gt;</td><td class="fri">&lt;a href="8.html"&gt;8&lt;/a&gt;</td><td class="sat">&lt;a href="9.html"&gt;9&lt;/a&gt;</td></tr>
<tr><td class="sun">&lt;a href="10.html"&gt;10&lt;/a&gt;</td><td class="mon">&lt;a href="11.html"&gt;11&lt;/a&gt;</td><td class="tue">12</td><td class="wed">&lt;a href="13.html"&gt;13&lt;/a&gt;</td><td class="thu">&lt;a href="14.html"&gt;14&lt;/a&gt;</td><td class="fri">&lt;a href="15.html"&gt;15&lt;/a&gt;</td><td class="sat">&lt;a href="16.html"&gt;16&lt;/a&gt;</td></tr>
<tr><td class="sun">&lt;a href="17.html"&gt;17&lt;/a&gt;</td><td class="mon">&lt;a href="18.html"&gt;18&lt;/a&gt;</td><td class="tue">19</td><td class="wed">&lt;a href="20.html"&gt;20&lt;/a&gt;</td><td class="thu">&lt;a href="21.html"&gt;21&lt;/a&gt;</td><td class="fri">&lt;a href="22.html"&gt;22&lt;/a&gt;</td><td class="sat">&lt;a href="23.html"&gt;23&lt;/a&gt;</td></tr>
<tr><td class="sun">&lt;a href="24.html"&gt;24&lt;/a&gt;</td><td class="mon">&lt;a href="25.html"&gt;25&lt;/a&gt;</td><td class="tue">26</td><td class="wed">&lt;a href="27.html"&gt;27&lt;/a&gt;</td><td class="thu">&lt;a href="28.html"&gt;28&lt;/a&gt;</td><td class="fri">&lt;a href="29.html"&gt;29&lt;/a&gt;</td><td class="sat">&lt;a href="30.html"&gt;30&lt;/a&gt;</td></tr>
</table>
</body></html>>
How can I get BeautifulSoup4 to actually insert the links, so that the result looks like this?
<tr>
<td class="sun">
<a href="3.index.html">3</a>
</td>
<<<etc>>>
Answer
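Building on my comment: you need to insert a new tag rather than modify the text of the <td>. Assigning a string to .string stores it as plain text (a NavigableString), so BeautifulSoup escapes any < and > in it when it outputs the document. Build an <a> tag with new_tag() instead: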
for elem in soup.find_all('td', class_=['sun', 'mon', 'tues', 'wed', 'thu', 'fri', 'sat']):
    # Grab the current element text
    # I got weird behavior where it would always use the last element's string if I didn't do this
    text = elem.string
    elem.string = ''
    # Create new tag using "elem"'s text
    new = soup.new_tag('a', href="{}.html".format(text))
    new.string = text
    # Insert <a> tag
    elem.append(new)
This produces:
<html>
<body>
<table border="0" cellpadding="0" cellspacing="0" class="month">
<tr>
<th class="month" colspan="7">
November 2019
</th>
</tr>
<tr>
<th class="sun">
Sun
</th>
<th class="mon">
Mon
</th>
<th class="tue">
Tue
</th>
<th class="wed">
Wed
</th>
<th class="thu">
Thu
</th>
<th class="fri">
Fri
</th>
<th class="sat">
Sat
</th>
</tr>
<tr>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="noday">
</td>
<td class="fri">
<a href="1.html">
1
</a>
</td>
<td class="sat">
<a href="2.html">
2
</a>
</td>
</tr>
<tr>
<td class="sun">
<a href="3.html">
3
</a>
</td>
<td class="mon">
<a href="4.html">
4
</a>
</td>
<td class="tue">
5
</td>
<td class="wed">
<a href="6.html">
6
</a>
</td>
<td class="thu">
<a href="7.html">
7
</a>
</td>
<td class="fri">
<a href="8.html">
8
</a>
</td>
<td class="sat">
<a href="9.html">
9
</a>
</td>
</tr>
<tr>
<td class="sun">
<a href="10.html">
10
</a>
</td>
<td class="mon">
<a href="11.html">
11
</a>
</td>
<td class="tue">
12
</td>
<td class="wed">
<a href="13.html">
13
</a>
</td>
<td class="thu">
<a href="14.html">
14
</a>
</td>
<td class="fri">
<a href="15.html">
15
</a>
</td>
<td class="sat">
<a href="16.html">
16
</a>
</td>
</tr>
<tr>
<td class="sun">
<a href="17.html">
17
</a>
</td>
<td class="mon">
<a href="18.html">
18
</a>
</td>
<td class="tue">
19
</td>
<td class="wed">
<a href="20.html">
20
</a>
</td>
<td class="thu">
<a href="21.html">
21
</a>
</td>
<td class="fri">
<a href="22.html">
22
</a>
</td>
<td class="sat">
<a href="23.html">
23
</a>
</td>
</tr>
<tr>
<td class="sun">
<a href="24.html">
24
</a>
</td>
<td class="mon">
<a href="25.html">
25
</a>
</td>
<td class="tue">
26
</td>
<td class="wed">
<a href="27.html">
27
</a>
</td>
<td class="thu">
<a href="28.html">
28
</a>
</td>
<td class="fri">
<a href="29.html">
29
</a>
</td>
<td class="sat">
<a href="30.html">
30
</a>
</td>
</tr>
</table>
</body>
</html>
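A minimal standalone comparison of the two approaches on a single cell may make the difference clearer. It assumes BeautifulSoup4 with the built-in "html.parser", and the one-cell table is made up for illustration. It also hints at why html.unescape didn't help: the markup string contains no entities to begin with; BeautifulSoup adds them only when it serializes the text node.

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell table standing in for a day cell from HTMLCalendar.
soup = BeautifulSoup('<table><tr><td class="fri">1</td></tr></table>', "html.parser")
td = soup.find("td")
day = td.get_text()  # "1"

# Approach from the question: markup assigned to .string becomes a plain
# NavigableString, so "<" and ">" are escaped when the cell is serialized.
td.string = '<a href="{}.html">{}</a>'.format(day, day)
escaped = str(td)

# Approach from the answer: build a real <a> Tag and append it to the cell.
link = soup.new_tag("a", href="{}.html".format(day))
link.string = day
td.clear()        # remove the escaped text node added above
td.append(link)   # the <a> is inserted as markup, not as text
linked = str(td)

print(escaped)  # <td class="fri">&lt;a href="1.html"&gt;1&lt;/a&gt;</td>
print(linked)   # <td class="fri"><a href="1.html">1</a></td>
```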
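Your expected output has "index" in the href URL, but from your question and example HTML I'm not sure where you expect it to come from. If you need it, you can add it in the format call.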
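Putting the pieces together, here is a sketch of the full loop with two small changes. First, the class list in the code above contains 'tues', but HTMLCalendar marks Tuesday cells with class="tue", which is why the Tuesday dates (5, 12, 19, 26) stay plain text in the output. Reading the names from calendar.HTMLCalendar.cssclasses (documented as of Python 3.7) avoids that typo. Second, the "{}.index.html" pattern is only an assumption that reproduces the 3.index.html link from the expected output; adjust it to your real URLs. The sketch also passes "html.parser" explicitly, so unlike the output above there is no <html><body> wrapper.

```python
import calendar

from bs4 import BeautifulSoup

cal = calendar.HTMLCalendar(calendar.SUNDAY)
soup = BeautifulSoup(cal.formatmonth(2019, 11), "html.parser")

# HTMLCalendar.cssclasses lists the weekday classes used on the day cells:
# ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]. Padding cells use the
# class "noday", so they are not matched and stay empty.
for elem in soup.find_all("td", class_=calendar.HTMLCalendar.cssclasses):
    day = elem.get_text()
    # Hypothetical URL pattern, chosen to match "3.index.html" in the question.
    link = soup.new_tag("a", href="{}.index.html".format(day))
    link.string = day
    elem.clear()        # drop the plain-text day number
    elem.append(link)   # insert the <a> tag as real markup

print(soup.prettify())
```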