I wanted to try some basic web scraping but ran into a problem. I am used to simple td tags, but in this case the webpage has the following pre tag with all the text inside it, which makes it a bit trickier to scrape.
<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111
11111112
11111113
11111114
11111115
</pre>
Any suggestions on how to scrape each row?
Thanks
Answer 0 (score: 4)
If that is exactly what you want to parse, you can use splitlines() to get a list of rows, or use split() like this:
from bs4 import BeautifulSoup
content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111
11111112
11111113
11111114
11111115
</pre>""" # This is your content
soup = BeautifulSoup(content, "html.parser")
stuff = soup.find('pre').text
# strip() drops the leading/trailing newline inside <pre> so no empty rows appear
lines = stuff.strip().split("\n")  # or stuff.strip().splitlines()
# print(lines) gives ["11111111", "11111112", "11111113", "11111114", "11111115"]
for line in lines:
    print(line)  # prints each row separately
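A slightly more defensive variant, as a rough sketch: depending on the parser, the text of the pre element may start or end with a newline, so blank rows are filtered out here. The helper name pre_rows is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def pre_rows(html):
    """Return the non-empty rows of the first <pre> element (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.find("pre")
    if pre is None:
        # No <pre> on the page: nothing to scrape
        return []
    # Split on line boundaries and drop rows that are empty after trimming
    return [row.strip() for row in pre.get_text().splitlines() if row.strip()]

html = """<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111
11111112
</pre>"""
rows = pre_rows(html)
print(rows)
```

Returning an empty list when no pre tag is found avoids the AttributeError that soup.find('pre').text would raise on such a page.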
Answer 1 (score: 0)
If each line is indeed on a line by itself, why not just split the content into a list?
data = soup.find('pre').text
lines = data.splitlines()
You can pass True into the splitlines routine to keep the line endings if that's what you desire.
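To illustrate the difference, here is a small sketch using a plain string in place of the scraped text (the sample data is assumed, shortened from the question):

```python
# Stand-in for soup.find('pre').text; sample data assumed for illustration
data = "11111111\n11111112\n11111113\n"

without_endings = data.splitlines()    # line endings removed
with_endings = data.splitlines(True)   # same as keepends=True

print(without_endings)  # ['11111111', '11111112', '11111113']
print(with_endings)     # ['11111111\n', '11111112\n', '11111113\n']
```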