Trying to parse names (and PhDs) from a university faculty website. Having trouble getting this to work

Posted: 2015-10-18 21:21:16

Tags: python parsing web

from bs4 import BeautifulSoup  # imports the BeautifulSoup package
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

url = 'https://www.marshall.usc.edu/faculty/phd'  # the page to scrape
page = urlopen(url)
soup = BeautifulSoup(page.read(), "lxml")  # parses the page contents into soup

#names = soup.find_all('tr', {'class': 'odd views-row-first'})

# finds every name cell; each result is a whole <td> Tag, not just its text
names = soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})

# earlier attempts to strip the tags; none can work, because names is a list of Tag objects
# and has no replaceAll/strip/sub methods:
#namesU = names.replaceAll("<[^>]*>","")
#names.strip('<td class="views-field views-field-field-faculty-name-last-value active">')
#names2 = names.sub('<td class="views-field views-field-field-faculty-name-last-value active">', '')

print(names)  # prints the full <td ...>...</td> markup of every cell

1 Answer:

Answer 0 (score: 0)

You can solve this by using the "text" attribute of each 'td' element that find_all returns.

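find_all gives you a list of tags, not strings. Iterate over that list, take the "text" of each cell (stripping the surrounding whitespace), and collect the results in your names list.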
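A minimal offline sketch of that loop. It parses a small, hypothetical HTML fragment modelled on the faculty table (the names in it are made up), so it runs without network access; "html.parser" is used instead of "lxml" only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the structure of the faculty table
SAMPLE_HTML = """
<table>
  <tr>
    <td class="views-field views-field-field-faculty-name-last-value active">
      Doe, Jane
    </td>
  </tr>
  <tr>
    <td class="views-field views-field-field-faculty-name-last-value active">
      Roe, Richard
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

names = []
for cell in soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'}):
    # cell is a Tag: printing it shows markup, while .text is only the text inside it
    names.append(cell.text.strip())  # strip() drops the newlines and indentation

print(names)  # ['Doe, Jane', 'Roe, Richard']
```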

Here is the same approach written as a list comprehension:

names = [i.text.strip() for i in soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})]

After running it, the output is:

['Amato, Andrea', 'Banerjee, Trambak', 'Basu, Pallavi', 'Chang, Wayne', 'Chung, Sung Hun', 'Comings, Alison', 'Cui, Hailong', 'DeGroot, Tyler', 'Dutton, Chaumanix', 'Fu, Luella', 'Golrezaei, Negin', 'Grandy, Jake', 'Han, Rong Qing', 'Han, Ju Rie (Alyssa)', 'Harmon, Derek', 'Hong, Jihoon', 'Jia, He', 'Joshi, Priyanka', 'Kays, Allison', 'Kfir, Alon', 'Kim, Jeunghyun', 'Kim, Pureum', 'Kim, Yookyoung', 'Kim , Jennifer', 'Krikorian, Mariam', 'Lang, Tina', 'Lee, Jennifer', 'Lee, Suk won', 'Lee, Yoonju', 'Li, Guang', 'Li, Yuan', 'Ling, Yun', 'Magkotsios, Georgios', 'Min, Bora', 'Newman, David', 'Oh, Seung Hwan', 'Ozkan, Erhun', 'Paulson, Courtney', 'Pei, Lei', 'Pyun, Sung June', 'Raj, Medha', 'Raveendhran, Roshni', 'Rich, Beverly', 'Ritter, Stacey', 'Sahoo, Satish', 'Skripnik, Roman', 'Smallets, Stephanie', 'Song, Shiwon', 'Stamenov, Ventsislav', 'Subler, Megan', 'Talijan, Vuk', 'Uhalde, Arianna', 'Valsesia, Francesca', 'Wan, Yuan', 'Wang, Jue', 'Wang, Weinan', 'Wang, Xuan', 'Wang, Yongzhi (Alex)', 'Wang, Yingfei (Fiona)', 'Wong, Vivian', 'Xia, Jingjing', 'Xing, Zhe (Adele)', 'Xu, Zibin', 'Yang, Louis', 'Yao, Yao', 'Yi, Irene', 'Yordanov, Kristian', 'Yu, Xiaoqian', 'Zhang, Heng', 'Zhang, Yanwei (Wayne)', 'Zhang, Yingguang', 'Zhang, Mengxia']
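One caveat: when the class value passed to find_all contains spaces, BeautifulSoup compares it against the whole class attribute string, so a cell with a different set of classes is skipped. If "active" is only a state class on that column (an assumption; it is not confirmed by the page shown here), matching the single distinctive class with class_ is more robust. A sketch on hypothetical markup where the second cell lacks "active":

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second cell lacks the "active" class
SAMPLE_HTML = """
<table>
  <tr><td class="views-field views-field-field-faculty-name-last-value active">Doe, Jane</td></tr>
  <tr><td class="views-field views-field-field-faculty-name-last-value">Roe, Richard</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact class-string match: only cells whose class attribute is exactly this string
exact = [td.get_text(strip=True)
         for td in soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})]

# Single-class match: every <td> that has this class, whatever other classes it carries
by_class = [td.get_text(strip=True)
            for td in soup.find_all('td', class_='views-field-field-faculty-name-last-value')]

print(exact)     # ['Doe, Jane']
print(by_class)  # ['Doe, Jane', 'Roe, Richard']
```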