How can I get the img elements and the text from a span block?

Date: 2019-06-03 08:16:08

Tags: python html python-3.x beautifulsoup

I have a span block like this:

<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>

soup.find('span', {'class': 'selectable-text invisible-space copyable-text'}).get_text()

This code gives me only the text; the img elements are dropped.

All I have come up with so far is this:

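import re

span = soup.find_all('span', {'class': 'selectable-text invisible-space copyable-text'})
for item in span:
    # str(item) is the span's raw HTML, <img> tags included; re.search scans
    # across newlines, whereas re.match('.*emoji', ...) only checks the first line
    if re.search('emoji', str(item)):
        ...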

Now I have a string like this:

<span class="selectable-text invisible-space copyable-text" dir="ltr">some text <img alt="?" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="?" src="URL" style="background-position: -20px -20px;"/>more some text<img alt="?" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="?" src="URL" style="background-position: -40px -40px;"/> blah-blah-blah  <img alt="?" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="?" src="URL" style="background-position: 0px -20px;"/></span>

As I see it, the next step would be to extract the elements I need with a regular expression.

Is there another way to get a string like this?

some text <emoji> more some text <emoji> blah-blah-blah <emoji>
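One way to get that string without regular expressions, as a minimal sketch: walk the span's direct children in order, keep the non-blank text nodes, and swap each emoji <img> for a placeholder. It assumes the images sit directly inside the span, as in the sample above; the helper name span_to_text and its placeholder parameter are hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Markup copied from the question
html = """<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>"""


def span_to_text(span, placeholder="<emoji>"):
    """Hypothetical helper: join text nodes and emoji images in document order."""
    parts = []
    for node in span.children:
        if isinstance(node, NavigableString):
            text = node.strip()
            if text:  # skip the whitespace-only nodes between tags
                parts.append(text)
        elif isinstance(node, Tag) and node.name == "img" and "emoji" in node.get("class", []):
            parts.append(placeholder)
    return " ".join(parts)


soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="selectable-text")
print(span_to_text(span))
```

Because it only reads the tree, the soup is left unchanged, unlike approaches that replace the <img> tags in place.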

3 Answers:

Answer 0 (score: 0)

If you want to pair each piece of text with the img that follows it and wrap each pair in its own span, the code below should work.

from bs4 import BeautifulSoup as bs

stra = """
<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>
"""
soup = bs(stra, 'html.parser')

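# The span's direct children alternate: text, <img>, text, <img>, ..., trailing text
ch = list(soup.find('span', {'class': 'selectable-text invisible-space copyable-text'}).children)

# Pair each text node (even index) with the <img> right after it (odd index)
for i in zip(ch[::2], ch[1::2]):
    print('<span>{}{}</span>'.format(*i))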

Output:

<span>
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/></span>
<span>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/></span>
<span>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/></span>
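The same even/odd pairing can return structured data instead of printing markup. This sketch assumes that the first class on each image (b61, b62, b76) is what distinguishes one emoji from another; the sprite offsets in the style attribute hint at this, but the question does not confirm it.

```python
from bs4 import BeautifulSoup

# Markup copied from the question
html = """<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "selectable-text invisible-space copyable-text"})
ch = list(span.children)

# ch[::2] are the text nodes, ch[1::2] the <img> tags; zip() stops at the
# shorter slice, so the trailing "\n" text node is dropped.
# img["class"] is a list with html.parser, so [0] is the first class name.
pairs = [(text.strip(), img["class"][0]) for text, img in zip(ch[::2], ch[1::2])]
print(pairs)
```

Like the original answer, this relies on text and images strictly alternating; two emoji in a row would shift every later pair.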

Answer 1 (score: 0)

It looks like you need .replaceWith (also available as replace_with in BeautifulSoup 4).

For example:

from bs4 import BeautifulSoup

html = """<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>"""

soup = BeautifulSoup(html, "html.parser")
for span in soup.findAll('span', {'class': 'selectable-text invisible-space copyable-text'}):
    for img in span.findAll("img"):
        img.replaceWith(r"<emoji>")
print(soup.prettify(formatter=None))

Output:

<span class="selectable-text invisible-space copyable-text" dir="ltr">
 some text
 <emoji>
 more some text
 <emoji>
 blah-blah-blah
 <emoji>
</span>
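Since the question asks for a single line rather than prettified markup, one way to finish this approach is to collapse the modified span with get_text(). A sketch, using the snake_case names replace_with and find_all that BeautifulSoup 4 provides:

```python
from bs4 import BeautifulSoup

# Markup copied from the question
html = """<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>"""

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "selectable-text invisible-space copyable-text"})
for img in span.find_all("img"):
    # The plain string becomes a text node, not a new tag
    img.replace_with("<emoji>")

# strip=True trims each text node and drops whitespace-only ones;
# the " " separator joins what is left
result = span.get_text(" ", strip=True)
print(result)
```

get_text() returns the raw strings, so the placeholder comes out as literal "<emoji>" without needing formatter=None.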

Answer 2 (score: 0)

Find the children inside the span tag, then take each child's previous_element, which here is the text just before it.

from bs4 import BeautifulSoup
data='''<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>
     more some text
     <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>
     blah-blah-blah
     <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>
</span>'''

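soup = BeautifulSoup(data, 'html.parser')
itemtag = soup.find('span', class_='selectable-text invisible-space copyable-text')
# findChildren() is an alias of find_all(): here it returns the three <img> tags
children = itemtag.findChildren()
items = []
for child in children:
    # previous_element of each <img> is the text node parsed just before it
    items.append(child.previous_element.replace('\n', '').strip())
    items.append(child)

print(items)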

Output:

['some text', <img alt="" class="b61 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -20px -20px;"/>, 'more some text', <img alt="" class="b62 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: -40px -40px;"/>, 'blah-blah-blah', <img alt="" class="b76 emoji wa selectable-text invisible-space copyable-text" data-plain-text="" src="URL" style="background-position: 0px -20px;"/>]
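A caveat with previous_element: it is whatever was parsed immediately before the tag, so with two emoji back to back it is the first <img> rather than a string, and calling .replace on it fails because that is a str method. Text after the last image is also never collected. The sketch below uses previous_sibling with a type check and picks up the trailing text separately; the input with adjacent emoji is hypothetical, not from the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical input (not from the question): two emoji in a row,
# plus text after the last emoji
html = """<span class="selectable-text invisible-space copyable-text" dir="ltr">
     some text
     <img alt="" class="b61 emoji wa" src="URL"/><img alt="" class="b62 emoji wa" src="URL"/>
     blah-blah-blah
</span>"""

soup = BeautifulSoup(html, "html.parser")
itemtag = soup.find("span", class_="selectable-text")
imgs = itemtag.find_all("img")

items = []
for child in imgs:
    prev = child.previous_sibling
    # Keep the preceding node only when it is non-blank text; for the second of
    # two adjacent emoji it is the first <img>, which is already in items
    if isinstance(prev, NavigableString) and prev.strip():
        items.append(prev.strip())
    items.append(child)

# Text after the last <img> precedes no image, so add it separately
if imgs:
    tail = imgs[-1].next_sibling
    if isinstance(tail, NavigableString) and tail.strip():
        items.append(tail.strip())

# Show strings as-is and each <img> by its first class name
print([i if isinstance(i, str) else i["class"][0] for i in items])
```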