Iterating over a bs4 result set

Time: 2017-01-02 08:05:17

Tags: python web-crawler

I used bs4 to extract this result set.

Moi not cute not hot, the ugly bui bui type 1

I am trying to extract these two elements.

Actually, moi also dun know

from urllib.request import urlopen
import re

from bs4 import BeautifulSoup

r = urlopen(
    'http://forums.hardwarezone.com.sg/eat-drink-man-woman-16/%5Bofficial%5D-chit-chat-students-part-2-a-5526993-55.html').read()

soup = BeautifulSoup(r, "lxml")
letters = soup.find_all("div", attrs={"id": re.compile(r"post_message_\d+")})

letters.find_all('div')

This is my code. But how do I iterate over the result set so that it extracts only the content before the closing div?

{{1}} returns an empty set.

1 answer:

Answer 0: (score: 0)

All messages:

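from urllib.request import urlopen
import re

from bs4 import BeautifulSoup

# Download the forum page
r = urlopen(
    'http://forums.hardwarezone.com.sg/eat-drink-man-woman-16/%5Bofficial%5D-chit-chat-students-part-2-a-5526993-55.html').read()

soup = BeautifulSoup(r, "lxml")
# Each post body is a div whose id looks like post_message_<number>
letters = soup.find_all("div", attrs={"id": re.compile(r"post_message_\d+")})
# find_all returns a list-like ResultSet, so loop over it one post at a time
for a in letters:
    # Split the post text into lines and drop the blank ones
    print([b.strip() for b in a.text.strip().split('\n') if b.strip()])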
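
The same loop can be tried offline. What follows is only a rough sketch, written for Python 3: SAMPLE_HTML is a made-up stand-in for the forum page (the real markup may differ), extract_messages and direct_text are hypothetical helper names, and "html.parser" is swapped in for "lxml" so that nothing beyond bs4 needs to be installed. The point it illustrates is that find_all returns a ResultSet, which behaves like a list of tags, so you loop over it and work on each tag instead of calling find_all on the ResultSet itself.

```python
import re

from bs4 import BeautifulSoup, NavigableString

# Made-up stand-in for the forum page; the real markup may differ.
SAMPLE_HTML = """
<div id="post_message_1">
  Moi not cute not hot, the ugly bui bui type
  <div class="quote">quoted text</div>
</div>
<div id="post_message_2">Actually, moi also dun know</div>
<div id="sidebar">not a post</div>
"""


def extract_messages(html):
    """Return one list of non-empty text pieces per post div."""
    soup = BeautifulSoup(html, "html.parser")
    posts = soup.find_all("div", id=re.compile(r"^post_message_\d+$"))
    # posts is a ResultSet (list-like), so iterate over it tag by tag.
    return [list(post.stripped_strings) for post in posts]


def direct_text(post):
    """Return only the text that sits directly inside the post div,
    skipping anything that belongs to nested tags."""
    return [
        child.strip()
        for child in post.children
        if isinstance(child, NavigableString) and child.strip()
    ]


messages = extract_messages(SAMPLE_HTML)
for pieces in messages:
    print(pieces)

first_post = BeautifulSoup(SAMPLE_HTML, "html.parser").find("div", id="post_message_1")
first_direct = direct_text(first_post)
print(first_direct)
```

If the goal is only the text that appears before a nested div, something like direct_text may be closer to what was asked, since stripped_strings also walks into nested tags.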