Python: Modifying <a> elements

Date: 2017-01-01 01:58:33

Tags: python html beautifulsoup screen-scraping

I'm scraping and parsing a web page with Beautiful Soup. The page contains several references to other sources, which look like this:

Shakespeare wrote good, such as in <a href="link_to_source">Romeo and Juliet, IV:ii</a>.

What I'd like to have is:

Shakespeare wrote good, such as in (Romeo and Juliet, IV:ii).

Bear in mind that this is a very long page with many such lines, and I need to convert all of them. Modifying a single <a> tag won't work for me; I need to modify every <a> tag on the page.

This is something I've tried already:

piska_ps = url_to_soup('https://he.wikisource.org'+a['href']).find_all('p')
p_box = []
for p in piska_ps:
    if p.a:
        for a_link in p.a:
            a_link.string = "("+a_link.string+")"

2 answers:

Answer 0 (score: 0)

You can use replace_with to replace each tag with its parenthesized text:

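piska_ps = url_to_soup('https://he.wikisource.org'+a['href']).find_all('p')
for p in piska_ps:
    # "link" avoids shadowing the outer "a" used to build the URL
    for link in p.find_all('a'):
        # get_text() also works when the link contains nested tags,
        # where .string would be None
        link.replace_with("(" + link.get_text() + ")")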
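
The same idea can be tried without fetching anything. The following is a minimal sketch, assuming the bs4 package is installed; the inline HTML, the extra links, and the function name parenthesize_links are made up for illustration and are not part of the original page or of the asker's url_to_soup helper.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the scraped page.
HTML = """
<p>Shakespeare wrote good, such as in <a href="link_to_source">Romeo and Juliet, IV:ii</a>.</p>
<p>See also <a href="other_source">Hamlet, III:i</a> and <a href="third_source"><i>Macbeth</i>, I:vii</a>.</p>
"""


def parenthesize_links(html):
    """Replace every <a> tag with its text wrapped in parentheses."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so replacing tags while looping is safe.
    for link in soup.find_all("a"):
        # get_text() collects text from nested tags as well.
        link.replace_with("(" + link.get_text() + ")")
    return soup


soup = parenthesize_links(HTML)
result = [p.get_text() for p in soup.find_all("p")]
for line in result:
    print(line)
```

Calling find_all on the whole soup rather than on each paragraph also catches links that sit outside <p> tags, which may or may not be what you want.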

Answer 1 (score: 0)

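First, p.a is equivalent to p.find('a'), which returns a single tag. Looping over it walks that tag's children, not the links in the paragraph, so you can't use it to visit every <a>.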

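piska_ps = url_to_soup('https://he.wikisource.org'+a['href']).find_all('p')
p_box = []
for p in piska_ps:
    # p.a is only the first link in the paragraph, so later links
    # in the same paragraph are left unchanged
    if p.a:
        p.a.string = "("+p.a.string+")"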
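
To make the difference between p.a and p.find_all('a') concrete, here is a small sketch, again assuming bs4 is installed and using a made-up one-paragraph document. It keeps the <a> tags and only rewrites their text, as this answer does, but loops over every link.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph with two links; the second contains nested markup.
soup = BeautifulSoup(
    '<p>See <a href="x">Hamlet</a> and <a href="y"><i>Macbeth</i>, I:vii</a>.</p>',
    "html.parser",
)
p = soup.p

# p.a is shorthand for p.find("a"): the first matching tag only.
same_tag = p.a is p.find("a")

# Iterating over a tag yields its children, here a single text node.
children_of_first = list(p.a)

# find_all returns every link in the paragraph.
all_links = p.find_all("a")

# .string is None when a tag has more than one child.
nested_string = all_links[1].string

# Wrap the text of every link, leaving the <a> tags in place.
for link in all_links:
    link.string = "(" + link.get_text() + ")"

print(p.get_text())
```

Because .string is None for the second link, concatenating it directly would raise a TypeError, which is why the sketch reads the text with get_text() instead.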