How do I scrape text from span tags with Beautiful Soup? Scraping faculty members' information
from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.uoj.ac.ae/ContentBan.aspx?m=15&p=4&sm=4")
soup = BeautifulSoup(r.content, 'html5lib')
for tag in soup.find_all('table'):
    if tag.has_attr("class"):
        if tag['class'] == 'MsoTableGrid':
            for tag1 in soup.findAll('span'):
                print(tag1.text)
I want to print the text inside the span tags, but the only output I get is:
Process finished with exit code 0
Answer 0 (score: 1)
You can use a CSS selector to find the tr elements of the table with class MsoTableGrid, then pull the information you need, such as each faculty member's name and e-mail address, from the columns of each row. For example:

>>> rows = soup.select("table.MsoTableGrid tr")
>>> for row in rows:
...     faculty_info = row.find_all("td")[1:3]  # second and third columns: name, e-mail
...     if len(faculty_info) == 2:
...         print(faculty_info[0].text.strip(), faculty_info[1].text.strip())
...
Name E-mail
Dr. Hassan Ali Dabouq dr.hassandbouk@uoj.ac.ae
Prof.dr.Magdie Medhat Elnahry magdielnahry@uoj.ac.ae
Dr. Abd Elwahaab Mohamed Khalil abdelwahab@uoj.ac.ae
Dr. Ahmed Hassan Fouly Dr.ahmedfoly@uoj.ac.ae
Dr. Walid Mohamed Abbas walidabas@uoj.ac.ae
Dr. Wael Mahmoud Fakhry wfakhry@uoj.ac.ae
Dr. Kamel Abd Elaziz Ali kamelali@uoj.ac.ae
.
.
.
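Why the loop in the question prints nothing can be checked offline. Beautiful Soup treats class as a multi-valued attribute, so tag['class'] is a list such as ['MsoTableGrid'], and comparing it to the string 'MsoTableGrid' is always False. The sketch below shows that comparison next to the selector approach from this answer. The HTML fragment, the names, and the addresses in it are invented for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page (invented data).
SAMPLE_HTML = """
<table class="MsoTableGrid">
  <tr><td>No</td><td><span>Name</span></td><td><span>E-mail</span></td></tr>
  <tr><td>1</td><td><span>Dr. Example One</span></td><td><span>one@example.org</span></td></tr>
  <tr><td>2</td><td><span>Dr. Example Two</span></td><td><span>two@example.org</span></td></tr>
</table>
<table><tr><td><span>unrelated</span></td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")

# class is multi-valued, so this is a list, not a string.
class_value = table["class"]
matches_string = class_value == "MsoTableGrid"        # list vs. string comparison
matches_membership = "MsoTableGrid" in class_value    # membership test


def faculty_rows(soup):
    """Return (name, email) pairs from columns 2 and 3 of each MsoTableGrid row."""
    pairs = []
    for row in soup.select("table.MsoTableGrid tr"):
        cells = row.find_all("td")[1:3]
        if len(cells) == 2:
            pairs.append((cells[0].get_text(strip=True),
                          cells[1].get_text(strip=True)))
    return pairs


rows = faculty_rows(soup)
for name, email in rows:
    print(name, email)
```

The header row is returned along with the data rows, which matches the "Name E-mail" line in the output above; slice it off with rows[1:] if it is not wanted.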
Answer 1 (score: 0)
If you want to extract the text of every span on the page, regardless of class name, try the following:
from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.uoj.ac.ae/ContentBan.aspx?m=15&p=4&sm=4")
soup = BeautifulSoup(r.content, 'lxml')
span_text = soup.find_all('span')
for s in span_text:
    print(s.text)
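Collecting every span also picks up text from outside the faculty table. If that is a problem, one option is to scope the search with a descendant selector. This is a rough sketch on an invented HTML fragment, not the real page, so treat the markup as an assumption:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure is an assumption here.
SAMPLE_HTML = """
<p><span>Page banner</span></p>
<table class="MsoTableGrid">
  <tr><td><span>Dr. Example One</span></td><td><span> one@example.org </span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every span on the page, as in the answer above.
all_spans = [s.get_text(strip=True) for s in soup.find_all("span")]

# Only spans nested inside the table with class MsoTableGrid.
table_spans = [s.get_text(strip=True)
               for s in soup.select("table.MsoTableGrid span")]

print(all_spans)
print(table_spans)
```

Here all_spans includes the banner text while table_spans does not, and get_text(strip=True) trims the padding around the e-mail address.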