I'm trying to scrape a very basic, short unordered list (<ul>) from Wikipedia. My end goal is to put it into a DataFrame.
My question is: where do I start?
from bs4 import BeautifulSoup
import requests
from pandas import Series, DataFrame

url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"
result = requests.get(url)
c = result.content
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(c, "html.parser")
I can't seem to find an answer on StackOverflow, so I'd appreciate any advice anyone can give me.
Here is the specific list I'm looking for:
Active teams
Baltimore Anthem (2015–present)
Boston Iron (2014–present)
DC Brawlers (2014–present)
Los Angeles Reign (2014–present)
Miami Surge (2014–present)
New York Rhinos (2014–present)
Phoenix Rise (2014–present)
San Francisco Fire (2014–present)
Answer 0 (score: 3)
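First, you need to find the right part of the page. You can do this by finding the heading with id="Active_teams" and then finding the next <ul> element from there. Each team is an <li> inside that list.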
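from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/National_Pro_Grid_League"
r = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.content, "html.parser")

# The section heading carries id="Active_teams"; the list is the next <ul>
heading = soup.find(id='Active_teams')
teams = heading.find_next('ul')

# Each team is an <li>; get_text() also works when the item has nested tags
for team in teams.find_all('li'):
    print(team.get_text())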
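Since the end goal is a DataFrame, here is a minimal sketch of that last step. It runs against a small inline HTML string rather than the live page, so `SAMPLE_HTML`, the helper name `teams_to_frame`, and the `team`/`years` column names are all assumptions made for illustration. The split into name and years also assumes every item looks like "Name (years)", as in the list above.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Stand-in HTML (an assumption) that mimics a heading followed by a list;
# the real page markup may differ in detail.
SAMPLE_HTML = """
<h3><span id="Active_teams">Active teams</span></h3>
<ul>
<li><a href="#">Baltimore Anthem</a> (2015–present)</li>
<li><a href="#">Boston Iron</a> (2014–present)</li>
<li><a href="#">DC Brawlers</a> (2014–present)</li>
</ul>
"""


def teams_to_frame(html):
    """Hypothetical helper: parse the list after id="Active_teams" into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(id="Active_teams")
    if heading is None:
        raise ValueError("no element with id='Active_teams' found")
    ul = heading.find_next("ul")
    if ul is None:
        raise ValueError("no <ul> found after the heading")

    rows = []
    for li in ul.find_all("li"):
        text = li.get_text(" ", strip=True)
        # Split "Name (years)" into two fields; keep the raw text if it does not match.
        match = re.match(r"(?P<team>.+?)\s*\((?P<years>[^)]*)\)\s*$", text)
        if match:
            rows.append({"team": match.group("team"), "years": match.group("years")})
        else:
            rows.append({"team": text, "years": None})
    return pd.DataFrame(rows, columns=["team", "years"])


df = teams_to_frame(SAMPLE_HTML)
print(df)
```

To try it on the real page, pass `requests.get(url).content` instead of `SAMPLE_HTML`. Whether the id and list layout still match depends on the current state of the Wikipedia article.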