
I'm pretty new to web scraping, so I wrote a small script to extract player scores from this site: http://fold.it/portal/players

Here's the code:

    import urllib2
    from bs4 import BeautifulSoup

    # Fetch and parse the players page
    soup = BeautifulSoup(urllib2.urlopen("http://www.fold.it/portal/players").read())

    for row in soup('tr', {'class': 'even'}):
        rank = row('td')[0].string
        td2 = row('td')[1]
        # The player name is the link text in the second cell
        for name in td2('a'):
            user = name.text
        score = row('td')[2].string
        print rank, user, score

This works pretty well, except that the user name also picks up two other scores. Looking at the HTML, there appear to be two span elements after the a href, and their text ends up in `name.text`.

My first thought was to split `user` on whitespace, but some names contain spaces, so that didn't work. I also thought about filtering out numeric tokens, but some users have numeric names as well.

I figure eliminating the span elements is my best option, but I'm not sure what the best way to parse them out would be. Any help would be appreciated!

Eliminating Span Elements in a nested TD using BeautifulSoup

1 Answer:
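
One possible approach, sketched here rather than taken from the site, is to remove the span elements from each link before reading its text. The sample HTML below is a guess at the row structure described in the question (the real fold.it markup may differ), and `SAMPLE_HTML` and `extract_rows` are hypothetical names used only for illustration. The sketch uses Python 3 syntax and parses an inline string so it can run without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows: an assumption about the markup, with the two
# extra scores in <span> elements inside the player link.
SAMPLE_HTML = """
<table>
  <tr class="even">
    <td>1</td>
    <td><a href="/portal/user/1">some player <span>12</span> <span>34</span></a></td>
    <td>9876</td>
  </tr>
  <tr class="even">
    <td>3</td>
    <td><a href="/portal/user/3">42 <span>5</span> <span>6</span></a></td>
    <td>8765</td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return (rank, user, score) tuples, ignoring text inside <span> tags."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("tr", {"class": "even"}):
        cells = row.find_all("td")
        rank = cells[0].get_text().strip()
        score = cells[2].get_text().strip()
        link = cells[1].find("a")
        if link is None:
            continue  # no player link in this row
        # Detach every <span> from the link so its text is no longer included.
        for span in link.find_all("span"):
            span.extract()
        user = link.get_text().strip()
        rows.append((rank, user, score))
    return rows


if __name__ == "__main__":
    for rank, user, score in extract_rows(SAMPLE_HTML):
        print(rank, user, score)
```

Because the spans are removed from the parsed tree rather than filtered out of the text, names containing spaces or digits are left untouched. If the spans turn out to be siblings of the link rather than children, the link text would already exclude them and this step would be unnecessary.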