How to go to the next page using BeautifulSoup?

Time: 2016-10-20 17:55:49

Tags: python beautifulsoup mechanize

I have to extract information from 5 pages of a website. Each page has a "Next" button at the end. This is the HTML code of the next button:

<li class="pagination__next" data-reactid=".0.3.0.0.1.1.1.3.2">
    <span class="icon-arrowright-thin--pagination" data-reactid=".0.3.0.0.1.1.1.3.2.0">
        ::before
    </span>
</li>

I am using beautifulsoup4 to extract the information. How do I navigate to the next page? Can I use mechanize to navigate this kind of page?

3 answers:

Answer 0 (score: 3)

You can mimic the POST request to https://colleges.niche.com/entity-search/, but a much simpler way is to read the total number of pages from the first page and then loop over pages 2 through that number. The only thing added to the start URL is &page=page_number:

import requests
from bs4 import BeautifulSoup

start = "https://colleges.niche.com/?degree=4-year&sort=best"
url = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
soup = BeautifulSoup(requests.get(start).content, "html.parser")
# The last <option> of the page selector holds the last page number.
pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
print([a.text for a in soup.select("a.search__results__list__item__entity")])

# range() excludes its stop value, so use pages + 1 to include the last page.
for page in range(2, pages + 1):
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    print([a.text for a in soup.select("a.search__results__list__item__entity")])

Running the code for a few iterations shows that we get each page:

In [1]: import requests
   ...: from bs4 import BeautifulSoup
   ...: start = "https://colleges.niche.com/?degree=4-year&sort=best"
   ...: url = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
   ...: soup = BeautifulSoup(requests.get(start).content, "html.parser")
   ...: pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
   ...: print([a.text for a in soup.select("a.search__results__list__item__entity")])
   ...: for page in range(2, pages):
   ...:     soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
   ...:     print([a.text for a in soup.select("a.search__results__list__item__entity")])
   ...:     
[u'Stanford University', u'Massachusetts Institute of Technology', u'Yale University', u'Harvard University', u'Princeton University', u'Rice University', u'Bowdoin College', u'University of Pennsylvania', u'Washington University in St. Louis', u'Brown University', u'Duke University', u'Columbia University', u'Dartmouth College', u'Vanderbilt University', u'Pomona College', u'California Institute of Technology', u'University of Southern California', u'University of Notre Dame', u'University of Chicago', u'Washington & Lee University', u'Carleton College', u'Colgate University', u'University of Michigan - Ann Arbor', u'Northwestern University', u'Tufts University']
[u'Williams College', u'Georgetown University', u'Amherst College', u'Cornell University', u'Thomas Jefferson University', u'University of Texas - Health Science Center at Houston', u'Barnard College', u'Haverford College', u'Carnegie Mellon University', u'Emory University', u'University of California - Los Angeles', u'Harvey Mudd College', u'Medical University of South Carolina', u'Franklin W. Olin College of Engineering', u'Claremont McKenna College', u'Middlebury College', u'Swarthmore College', u'Bates College', u'University of Virginia', u'University of Texas - Austin', u'University of California - Berkeley', u'Virginia Tech', u'University of North Carolina at Chapel Hill', u'University of Texas - Medical Branch at Galveston', u'Davidson College']
[u'Colby College', u'Hamilton College', u'Samuel Merritt University', u'Georgia Institute of Technology', u'University of Richmond', u'Lehigh University', u'Grinnell College', u'Northeastern University', u'University of Illinois at Urbana-Champaign', u'New York University', u'University of Wisconsin', u'Wake Forest University', u'Reed College', u'Bucknell University', u'Oregon Health & Science University', u'Johns Hopkins University', u'Lafayette College', u'University of Texas - Health Science Center at San Antonio', u'Smith College', u'Wellesley College', u'University of Rochester', u'Scripps College', u'College of William & Mary', u'University of Florida', u'The Curtis Institute of Music']
[u'United States Coast Guard Academy', u'College of the Holy Cross', u'Penn State', u'Bryn Mawr College', u'Wesleyan University', u'Ohio State University', u'Colorado School of Mines', u'Texas A&M University', u'University of Maryland - Baltimore', u'Purdue University', u'University of California - Santa Barbara', u'University of Georgia', u'University of Miami', u'Tulane University', u'University of Tulsa', u'Boston College', u'The Juilliard School', u'Texas Tech University Health Sciences Center', u'Worcester Polytechnic Institute', u'Franklin & Marshall College', u'Brigham Young University', u'Southern Methodist University', u'Mount Holyoke College', u'Kenyon College', u'University of Washington']
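Instead of a hard-coded url template, the page parameter can also be set on the start URL itself, which keeps working if the start URL's other parameters change. A dependency-free sketch using urllib.parse; the helper names with_page and page_urls are hypothetical, and it assumes the site keeps accepting a page query parameter as described above. Pages run from 2 to the last page inclusive, since page 1 is the start URL:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page):
    """Return url with its 'page' query parameter set to page (replacing any existing one)."""
    parts = urlsplit(url)
    # Keep every existing parameter except 'page', then append the new page number.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(start, last_page):
    """URLs for pages 2..last_page inclusive; page 1 is the start URL itself."""
    return [with_page(start, page) for page in range(2, last_page + 1)]


start = "https://colleges.niche.com/?degree=4-year&sort=best"
for u in page_urls(start, 3):
    print(u)
# prints:
# https://colleges.niche.com/?degree=4-year&sort=best&page=2
# https://colleges.niche.com/?degree=4-year&sort=best&page=3
```

Each URL can then be fetched and parsed exactly as in the loop above.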

If you want to mimic the POST request instead, the following works. Depending on what data you want, this may actually be preferable, since you get JSON back:

import requests
from bs4 import BeautifulSoup

start = "https://colleges.niche.com/?degree=4-year&sort=best"
post = "https://colleges.niche.com/entity-search/"

# JSON payload for the search endpoint; only "page" changes between requests.
data = {"degreeType": ["4-year"], "sort": "best", "page": 1, "vertical": "colleges"}

# Read the total number of pages from the first HTML page, as before.
soup = BeautifulSoup(requests.get(start).content, "html.parser")
pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
for page in range(1, pages + 1):
    data["page"] = page
    r = requests.post(post, json=data)
    print(r.json())

That gives you data like:

{u'count': 2854, u'results': [{u'reviewCount': 258, u'netPrice': 20315, u'reviewAvg': 3.7713178294573644, u'totalStudents': 2034, u'grade': 4.33, u'tagline': u'4 Year &middot; Williamstown, MA', u'SATRange': u'1350-1560', u'label': u'Williams College', u'url': u'https://colleges.niche.com/williams-college/', u'ACTRange': u'31-34', u'location': {u'lat': 42.7117, u'lng': -73.2059}, u'guid': u'465D4A73-875C-498E-9C8F-E47568E156F2', u'type': u'College'}, {u'reviewCount': 1081, u'netPrice': 25786, u'reviewAvg': 3.698427382053654, u'totalStudents': 7226, u'grade': 4.33, u'tagline': u'4 Year &middot; Washington, DC', u'SATRange': u'1320-1520', u'label': u'Georgetown University', u'url': u'https://colleges.niche.com/georgetown-university/', u'ACTRange': u'30-33', u'location': {u'lat': 38.9088, u'lng': -77.0735}, u'guid': u'34AF6312-6F20-4D90-B512-AC5CD720AB25', u'type': u'College'}, {u'reviewCount': 247, u'netPrice': 14687, u'reviewAvg': 3.8259109311740893, u'totalStudents': 1792, u'grade': 4.33, u'tagline': u'4 Year &middot; Amherst, MA', u'SATRange': u'1350-1548', u'label': u'Amherst College', u'url': u'https://colleges.niche.com/amherst-college/', u'ACTRange': u'30-34', u'location': {u'lat': 42.3725, u'lng': -72.5185}, u'guid': u'127EC524-4BAC-4A5C-A7F5-1EAD9C309F44', u'type': u'College'}, {u'reviewCount': 1730, u'netPrice': 28537, u'reviewAvg': 3.654913294797688, u'totalStudents': 14269, u'grade': 4.33, u'tagline': u'4 Year &middot; Ithaca, NY', u'SATRange': u'1330-1510', u'label': u'Cornell University', u'url': u'https://colleges.niche.com/cornell-university/', u'ACTRange': u'30-34', u'location': {u'lat': 42.4453, u'lng': -76.4827}, u'guid': u'C35E497B-10BC-4482-92E5-F27941433B02', u'type': u'College'}, {u'reviewCount': 254, u'netPrice': None, u'reviewAvg': 3.8149606299212597, u'totalStudents': 649, u'grade': 4.33, u'tagline': u'4 Year &middot; Philadelphia, PA', u'SATRange': None, u'label': u'Thomas Jefferson University', u'url': 
u'https://colleges.niche.com/thomas-jefferson-university/', u'ACTRange': None, u'location': {u'lat': 39.9491, u'lng': -75.1581}, u'guid': u'E8C9EBC6-90C5-4CDF-A324-2CCE16060B61', u'type': u'College'}, {u'reviewCount': 131, u'netPrice': None, u'reviewAvg': 3.740458015267176, u'totalStudents': 539, u'grade': 4.33, u'tagline': u'4 Year &middot; Houston, TX', u'SATRange': None, u'label': u'University of Texas - Health Science Center at Houston', u'url': u'https://colleges.niche.com/university-of-texas----health-science-center-at-houston/', u'ACTRange': None, u'location': {u'lat': 29.7029, u'lng': -95.4032}, u'guid': u'43EEDD7D-8204-4014-961B-BEDDBD4C6417', u'type': u'College'}, {u'reviewCount': 390, u'netPrice': 21791, u'reviewAvg': 3.776923076923077, u'totalStudents': 2537, u'grade': 4.33, u'tagline': u'4 Year &middot; New York, NY', u'SATRange': u'1250-1440', u'label': u'Barnard College', u'url': u'https://colleges.niche.com/barnard-college/', u'ACTRange': u'28-32', u'location': {u'lat': 40.8091, u'lng': -73.964}, u'guid': u'DD4FCD82-8E4E-4F4C-A7DC-FADCEBB49681', u'type': u'College'}, {u'reviewCount': 190, u'netPrice': 22409, u'reviewAvg': 3.789473684210526, u'totalStudents': 1189, u'grade': 4.33, u'tagline': u'4 Year &middot; Haverford, PA', u'SATRange': u'1330-1490', u'label': u'Haverford College', u'url': u'https://colleges.niche.com/haverford-college/', u'ACTRange': u'31-34', u'location': {u'lat': 40.0134, u'lng': -75.3026}, u'guid': u'271075B3-07A0-450B-B4F3-78EB1FC7C03A', u'type': u'College'}, {u'reviewCount': 1310, u'netPrice': 33670, u'reviewAvg': 3.6068702290076335, u'totalStudents': 5699, u'grade': 4.33, u'tagline': u'4 Year &middot; Pittsburgh, PA', u'SATRange': u'1340-1540', u'label': u'Carnegie Mellon University', u'url': u'https://colleges.niche.com/carnegie-mellon-university/', u'ACTRange': u'30-34', u'location': {u'lat': 40.4446, u'lng': -79.9429}, u'guid': u'D8A17C0F-CC25-4D2A-B231-0303EA016427', u'type': u'College'}, {u'reviewCount': 1392, 
u'netPrice': 28203, u'reviewAvg': 3.757183908045977, u'totalStudents': 7732, u'grade': 4.33, u'tagline': u'4 Year &middot; Atlanta, GA', u'SATRange': u'1280-1460', u'label': u'Emory University', u'url': u'https://colleges.niche.com/emory-university/', u'ACTRange': u'29-32', u'location': {u'lat': 33.7988, u'lng': -84.3258}, u'guid': u'86AD5853-ED72-4EFD-855C-4746FF698941', u'type': u'College'}, {u'reviewCount': 4465, u'netPrice': 12510, u'reviewAvg': 3.838521836506159, u'totalStudents': 29033, u'grade': 4.33, u'tagline': u'4 Year &middot; Los Angeles, CA', u'SATRange': u'1190-1460', u'label': u'University of California - Los Angeles', u'url': u'https://colleges.niche.com/university-of-california----los-angeles/', u'ACTRange': u'27-33', u'location': {u'lat': 34.0689, u'lng': -118.444}, u'guid': u'1D1D82CF-C659-49F0-A526-7AFB85BD3A4F', u'type': u'College'}, {u'reviewCount': 122, u'netPrice': 33137, u'reviewAvg': 3.6639344262295084, u'totalStudents': 802, u'grade': 4.33, u'tagline': u'4 Year &middot; Claremont, CA', u'SATRange': u'1418-1570', u'label': u'Harvey Mudd College', u'url': u'https://colleges.niche.com/harvey-mudd-college/', u'ACTRange': u'33-35', u'location': {u'lat': 34.1061, u'lng': -117.711}, u'guid': u'20D662BE-8428-4DE2-BF0D-72D22F0A04B5', u'type': u'College'}, {u'reviewCount': 71, u'netPrice': None, u'reviewAvg': 4.014084507042253, u'totalStudents': 281, u'grade': 4.33, u'tagline': u'4 Year &middot; Charleston, SC', u'SATRange': None, u'label': u'Medical University of South Carolina', u'url': u'https://colleges.niche.com/medical-university-of-south-carolina/', u'ACTRange': None, u'location': {u'lat': 32.786, u'lng': -79.9469}, u'guid': u'7CD7C977-D16A-4399-8D7E-3B1FA0DFAB7D', u'type': u'College'}, {u'reviewCount': 115, u'netPrice': 29979, u'reviewAvg': 4.095652173913043, u'totalStudents': 350, u'grade': 4.33, u'tagline': u'4 Year &middot; Needham, MA', u'SATRange': u'1410-1550', u'label': u'Franklin W. 
Olin College of Engineering', u'url': u'https://colleges.niche.com/franklin-w-olin-college-of-engineering/', u'ACTRange': u'32-34', u'location': {u'lat': 42.2928, u'lng': -71.264}, u'guid': u'88A3438F-9304-481E-8022-0AE353991161', u'type': u'College'}, {u'reviewCount': 399, u'netPrice': 23982, u'reviewAvg': 3.87468671679198, u'totalStudents': 1298, u'grade': 4.33, u'tagline': u'4 Year &middot; Claremont, CA', u'SATRange': u'1350-1520', u'label': u'Claremont McKenna College', u'url': u'https://colleges.niche.com/claremont-mckenna-college/', u'ACTRange': u'30-33', u'location': {u'lat': 34.1023, u'lng': -117.707}, u'guid': u'DAE7241A-4D00-4C50-B1A5-F33BAF3A6C3B', u'type': u'College'}, {u'reviewCount': 458, u'netPrice': 20903, u'reviewAvg': 3.7139737991266375, u'totalStudents': 2492, u'grade': 4.33, u'tagline': u'4 Year &middot; Middlebury, VT', u'SATRange': u'1260-1470', u'label': u'Middlebury College', u'url': u'https://colleges.niche.com/middlebury-college/', u'ACTRange': u'30-33', u'location': {u'lat': 44.0091, u'lng': -73.1761}, u'guid': u'0E72BF23-A3CF-4995-9585-33B5BD0F9222', u'type': u'College'}, {u'reviewCount': 401, u'netPrice': 22557, u'reviewAvg': 3.56857855361596, u'totalStudents': 1534, u'grade': 4.33, u'tagline': u'4 Year &middot; Swarthmore, PA', u'SATRange': u'1360-1540', u'label': u'Swarthmore College', u'url': u'https://colleges.niche.com/swarthmore-college/', u'ACTRange': u'29-34', u'location': {u'lat': 39.9041, u'lng': -75.3561}, u'guid': u'891F20E2-4B6F-4626-83F3-15D502B2E7C1', u'type': u'College'}, {u'reviewCount': 320, u'netPrice': 22062, u'reviewAvg': 3.878125, u'totalStudents': 1773, u'grade': 4.33, u'tagline': u'4 Year &middot; Lewiston, ME', u'SATRange': None, u'label': u'Bates College', u'url': u'https://colleges.niche.com/bates-college/', u'ACTRange': None, u'location': {u'lat': 44.1053, u'lng': -70.2033}, u'guid': u'2C036559-5EBB-4C00-B3B8-6679A91FB040', u'type': u'College'}, {u'reviewCount': 1995, u'netPrice': 14069, u'reviewAvg': 
3.800501253132832, u'totalStudents': 15622, u'grade': 4.33, u'tagline': u'4 Year &middot; Charlottesville, VA', u'SATRange': u'1250-1460', u'label': u'University of Virginia', u'url': u'https://colleges.niche.com/university-of-virginia/', u'ACTRange': u'28-33', u'location': {u'lat': 38.0365, u'lng': -78.5026}, u'guid': u'9EA86CB5-E8A6-47E6-A219-FDCABC31AE51', u'type': u'College'}, {u'reviewCount': 5513, u'netPrice': 16832, u'reviewAvg': 3.8824596408489027, u'totalStudents': 36309, u'grade': 4.33, u'tagline': u'4 Year &middot; Austin, TX', u'SATRange': u'1170-1410', u'label': u'University of Texas - Austin', u'url': u'https://colleges.niche.com/university-of-texas----austin/', u'ACTRange': u'26-32', u'location': {u'lat': 30.2847, u'lng': -97.7373}, u'guid': u'BC90E2B6-E112-43ED-AC5C-3548829EA3DD', u'type': u'College'}, {u'reviewCount': 3718, u'netPrice': 16655, u'reviewAvg': 3.5922538999462077, u'totalStudents': 26320, u'grade': 4.33, u'tagline': u'4 Year &middot; Berkeley, CA', u'SATRange': u'1240-1500', u'label': u'University of California - Berkeley', u'url': u'https://colleges.niche.com/university-of-california----berkeley/', u'ACTRange': u'29-34', u'location': {u'lat': 37.8715, u'lng': -122.26}, u'guid': u'09E8CD9A-F401-4C8B-A79C-F02E10AC0201', u'type': u'College'}, {u'reviewCount': 3382, u'netPrice': 18398, u'reviewAvg': 3.8793613246599645, u'totalStudents': 23685, u'grade': 4.33, u'tagline': u'4 Year &middot; Blacksburg, VA', u'SATRange': u'1110-1320', u'label': u'Virginia Tech', u'url': u'https://colleges.niche.com/virginia-tech/', u'ACTRange': None, u'location': {u'lat': 37.2286, u'lng': -80.4233}, u'guid': u'EEB0E829-996A-45B1-9671-3EF4AF096423', u'type': u'College'}, {u'reviewCount': 2138, u'netPrice': 10936, u'reviewAvg': 3.7787652011225443, u'totalStudents': 17570, u'grade': 4.33, u'tagline': u'4 Year &middot; Chapel Hill, NC', u'SATRange': u'1220-1420', u'label': u'University of North Carolina at Chapel Hill', u'url': 
u'https://colleges.niche.com/university-of-north-carolina-at-chapel-hill/', u'ACTRange': u'28-32', u'location': {u'lat': 35.9122, u'lng': -79.051}, u'guid': u'5712B0C1-3A40-4EA1-A324-9C4F76FEFD10', u'type': u'College'}, {u'reviewCount': 110, u'netPrice': None, u'reviewAvg': 3.8545454545454545, u'totalStudents': 586, u'grade': 4.33, u'tagline': u'4 Year &middot; Galveston, TX', u'SATRange': None, u'label': u'University of Texas - Medical Branch at Galveston', u'url': u'https://colleges.niche.com/university-of-texas----medical-branch-at-galveston/', u'ACTRange': None, u'location': {u'lat': 29.3113, u'lng': -94.7764}, u'guid': u'5FEEDB69-A566-4671-B821-28304A74F474', u'type': u'College'}, {u'reviewCount': 264, u'netPrice': 22457, u'reviewAvg': 3.8333333333333335, u'totalStudents': 1770, u'grade': 4.33, u'tagline': u'4 Year &middot; Davidson, NC', u'SATRange': u'1230-1440', u'label': u'Davidson College', u'url': u'https://colleges.niche.com/davidson-college/', u'ACTRange': u'28-32', u'location': {u'lat': 35.5, u'lng': -80.8452}, u'guid': u'1AD50A05-6325-4392-B428-A08C944E61EF', u'type': u'College'}], u'page': 1, u'pageSize': 25, u'pageCount': 40}

This probably includes dynamically generated content that you would not get in the returned page source.
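The JSON response also reports its own pagination (page, pageSize, pageCount), so one option is to keep requesting pages until page reaches pageCount instead of scraping the select element for the total. A sketch under the assumption that every response has that shape; fetch_page is a hypothetical callable, replaced here by an in-memory fake so the paging logic runs offline:

```python
def iter_results(fetch_page):
    """Yield every item in 'results', following the 'pageCount' the API reports.

    fetch_page(page) must return a dict shaped like the sample response:
    {'results': [...], 'page': n, 'pageCount': total_pages, ...}
    """
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["results"]
        # Stop after the last page the server says exists.
        if page >= body["pageCount"]:
            break
        page += 1


# Offline stand-in for the real POST: 3 pages of results.
FAKE_PAGES = {
    1: {"results": [{"label": "A"}, {"label": "B"}], "page": 1, "pageCount": 3},
    2: {"results": [{"label": "C"}, {"label": "D"}], "page": 2, "pageCount": 3},
    3: {"results": [{"label": "E"}], "page": 3, "pageCount": 3},
}

labels = [item["label"] for item in iter_results(FAKE_PAGES.__getitem__)]
print(labels)  # ['A', 'B', 'C', 'D', 'E']
```

With the real endpoint, fetch_page could be `lambda p: requests.post(post, json=dict(data, page=p)).json()`, reusing the post URL and data payload from the code above.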

For the reviews URL https://colleges.niche.com/williams-college/reviews, you need to parse the entity GUID out of the page source and then pass it to the reviews API with a GET request:

import re

import requests
from bs4 import BeautifulSoup

# The entity GUID is embedded in the element with id="dataLayerTag" on the reviews page.
patt = re.compile('"entityGuid":"(.*?)"')
url = "https://colleges.niche.com/williams-college/reviews/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data_tag = patt.search(soup.select_one("#dataLayerTag").text).group(1)

# Ask the reviews API for page 2, 20 reviews per page.
params = {"e": data_tag, "page": 2, "limit": "20"}
api_url = "https://niche.com/api/entity-reviews/"
resp = requests.get(api_url, params=params)
print(resp.json())

Which gives you:

{u'reviews': [{u'body': u'I enjoy being in classes here, but the work gets overwhelming. People are great but very cliquy.', u'rating': 4, u'guid': u'35b6faeb-95b2-4385-b3ee-19e6c7984e1b', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u'The alumni network is great. Easy to use. But the career center sucks.', u'rating': 4, u'guid': u'beddcae1-d860-4a8a-a431-45bf7e7087e6', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u"It's hard for sophomores to get good housing. Even as a senior, the good housings are far away from campus. But almost everyone has singles, even freshman.", u'rating': 3, u'guid': u'fff99560-0b4f-499d-a95b-7b3b3f9826f0', u'created': u'2016-04-20T22:19:27Z', u'author': u'College Sophomore'}, {u'body': u"We don't have greek life.", u'rating': 1, u'guid': u'69e60cf0-ff3c-4b34-acf1-6315d878c205', u'created': u'2016-04-20T22:17:35Z', u'author': u'College Sophomore'}, {u'body': u"There's not a lot of team spirit here. Athletes are nice, but they tend to hang among themselves.", u'rating': 3, u'guid': u'b31ee366-1b68-4c0f-b262-ff628243887c', u'created': u'2016-04-20T22:17:02Z', u'author': u'College Sophomore'}, {u'body': u'Williams offer a lot of chances to study abroad, but the social scene is very very limited.', u'rating': 4, u'guid': u'11a3feb2-21fa-45d9-8ee0-e6e1e8cea0c0', u'created': u'2016-04-20T22:15:35Z', u'author': u'College Sophomore'}, {u'body': u"Most people will live on campus all four years. It's not a bad deal!", u'rating': 4, u'guid': u'4a845124-7cfd-4059-8d63-cb1d414ce0cc', u'created': u'2016-04-08T13:58:30Z', u'author': u'College Senior'}, {u'body': u'The facilities have everything you could need as a varsity or non-varsity athlete. With our new football/lacrosse field and track, we have it made! 
Still, with an active there is always competition for prime field time, and IM sports are relegated either to early/late hours or ungroomed fields.', u'rating': 4, u'guid': u'31c89c4d-91ee-4b92-a198-3e12c304d7e1', u'created': u'2016-04-08T13:55:12Z', u'author': u'College Senior'}, {u'body': u'I have loved my time at Williams! The best part of my experience has been the people here, and as a senior trying to figure out post graduate plans, I am comforted by the willingness to help and commitment to the College from alumni. Go Ephs!', u'rating': 4, u'guid': u'4458ed87-4183-4784-908a-6ae67582e82c', u'created': u'2016-04-08T13:51:51Z', u'author': u'College Senior'}, {u'body': u'Could be better but overall good.', u'rating': 4, u'guid': u'08327955-2698-4fe6-ac1f-13108327cc21', u'created': u'2016-01-01T22:51:16Z', u'author': u'College Junior'}, {u'body': u'Better this year than past years.', u'rating': 3, u'guid': u'1892de02-eb45-42b5-b728-34912499e5eb', u'created': u'2016-01-01T22:43:54Z', u'author': u'College Junior'}, {u'body': u'Could have better facilities. Otherwise, great.', u'rating': 4, u'guid': u'2dc48cb2-d21f-4fd6-a9c7-19a5e513e6d6', u'created': u'2016-01-01T22:40:45Z', u'author': u'College Junior'}, {u'body': u'Awesome experience. Very community-oriented school. I love this place. Great people. Everyone wants to help you, the professors are amazing.', u'rating': 5, u'guid': u'5fa28a31-9391-4db7-b70d-5e2aa58708b3', u'created': u'2016-01-01T22:39:06Z', u'author': u'College Junior'}, {u'body': u"Williams has been the perfect place for me. My professors have been incredible mentors--I've gone to three professors' houses for dinner. The location is beautiful, and perfect for focusing on academics. I've been able to get very involved in all my clubs and really find what makes me passionate. But best of all is the people. They're all smart and talented and wonderful. 
I am so lucky.", u'rating': 5, u'guid': u'81ff499b-4721-4625-bee1-acf1e9b21916', u'created': u'2015-08-25T13:08:28Z', u'author': u'College Junior'}, {u'body': u"I don't know much, only seniors can live off campus.", u'rating': 3, u'guid': u'd9dc2e2f-a08d-4a01-8fe2-410623f93d7a', u'created': u'2015-04-27T19:31:06Z', u'author': u'College Freshman'}, {u'body': u"Everything closes really early, but there's some good food. No chains really.", u'rating': 3, u'guid': u'5993a99e-a936-40c8-ae0d-4581c8d089ef', u'created': u'2015-04-27T19:30:01Z', u'author': u'College Freshman'}, {u'body': u"It's kind of sad. There's never more than a handful of things happening on fridays or satudays and there's nothing for the rest of the week", u'rating': 3, u'guid': u'65c83983-2f6f-4b08-b870-06c35fd2b0e9', u'created': u'2015-04-27T19:27:34Z', u'author': u'College Freshman'}, {u'body': u"Having visitors is pretty easy. One of the officers is the worst but otherwise they're generally lenient about weed and alcohol.", u'rating': 4, u'guid': u'bcd95788-22b7-4a23-b942-2493206d1734', u'created': u'2015-04-27T19:21:34Z', u'author': u'College Freshman'}, {u'body': u"They usually give you a good package, but a lot of it is work-study and students don't have the free time for that here.", u'rating': 3, u'guid': u'1a87483c-952c-479b-9a57-65fb09895e75', u'created': u'2015-04-27T19:19:35Z', u'author': u'College Freshman'}, {u'body': u"Food is kind of repetitive. Pretty much all the kitchens are very wasteful. We can't use meal plans anywhere off campus.", u'rating': 3, u'guid': u'361b725f-bedc-4452-843d-5dc284c18dcd', u'created': u'2015-04-27T19:17:22Z', u'author': u'College Freshman'}], u'total': 246, u'limit': 20, u'page': 2}

You should be able to figure out the rest yourself based on the other parts of the answer.
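For the remaining step, the reviews response reports total and limit (246 and 20 in the sample, so 13 pages), which is enough to work out how many pages to request. A sketch under the assumption that every response has that shape; fetch_page is a hypothetical callable (for the real site it would wrap the GET request with the e, page and limit params shown above), and an in-memory fake stands in for it here. extract_entity_guid reuses the answer's regex:

```python
import re

# Same pattern as the answer's code: pull the entity GUID out of the page source.
ENTITY_GUID = re.compile(r'"entityGuid":"(.*?)"')


def extract_entity_guid(text):
    """Return the first entityGuid value in text, or None if absent."""
    match = ENTITY_GUID.search(text)
    return match.group(1) if match else None


def review_page_count(total, limit):
    """Pages needed to show `total` reviews with `limit` per page (ceiling division)."""
    return (total + limit - 1) // limit


def iter_reviews(fetch_page, limit=20):
    """Yield all reviews; fetch_page(page, limit) returns a dict like the sample response."""
    first = fetch_page(1, limit)
    yield from first["reviews"]
    for page in range(2, review_page_count(first["total"], limit) + 1):
        yield from fetch_page(page, limit)["reviews"]


# Offline stand-in for the reviews API: 45 reviews in total.
ALL_REVIEWS = [{"body": "review %d" % i} for i in range(45)]


def fake_fetch(page, limit):
    start = (page - 1) * limit
    return {"reviews": ALL_REVIEWS[start:start + limit],
            "total": len(ALL_REVIEWS), "limit": limit, "page": page}


reviews = list(iter_reviews(fake_fetch))
print(len(reviews))                # 45
print(review_page_count(246, 20))  # 13
```

For the real endpoint, fetch_page could be `lambda page, limit: requests.get("https://niche.com/api/entity-reviews/", params={"e": data_tag, "page": page, "limit": limit}).json()`.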

Answer 1 (score: 2)

If the "next page" button relies on JavaScript, mechanize alone won't do it, because mechanize does not execute JavaScript. You can do it with selenium:
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Chrome(service=Service('./chromedriver'))

url = "https://www.example.com"  # selenium needs the full URL, including the scheme
browser.get(url)
###### Wait until you see some element that signals the page is completely loaded
###### ('Even' is a placeholder: use a class that actually exists on your page)
WebDriverWait(browser, timeout=10).until(lambda x: x.find_element(By.CLASS_NAME, 'Even'))

############## do your things with the first page
soup = BeautifulSoup(browser.page_source, "html.parser")


#### Now if you are sure there is a next page
next_button_class = 'icon-arrowright-thin--pagination'  ### here insert the class of the 'next' button
browser.find_element(By.CLASS_NAME, next_button_class).click()
time.sleep(3)  # give the old content time to be replaced before waiting again

###### Wait until you see some element that signals the page is completely loaded
WebDriverWait(browser, timeout=10).until(lambda x: x.find_element(By.CLASS_NAME, 'Even'))

soup = BeautifulSoup(browser.page_source, "html.parser")

Answer 2 (score: 1)

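BeautifulSoup is an HTML parser, not a web browser; it cannot navigate to or download pages. For that you usually use an HTTP library (such as urllib or requests) to fetch the HTML from a specific URL so you can feed it to BeautifulSoup. In your case, you can use mechanize to do this.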

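Unfortunately, the HTML you provided for the pagination button is not a link, so it has no href attribute. If it were, you could easily parse the URL from it and tell your HTTP library to fetch it.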

Instead, you need to use mechanize to simulate a click event on that button, wait a short time, assume the new page has loaded, and then pass the resulting HTML to BeautifulSoup.
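To illustrate the href case mentioned above: if the next button were a real link, you would only need to read its href and resolve it against the current page URL. A dependency-free sketch using the standard library's html.parser (with BeautifulSoup the lookup would be soup.select_one("li.pagination__next a")). The `<a href>` inside li.pagination__next is hypothetical markup; for the button in the question, which has no link, the function returns None:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> nested inside <li class="pagination__next">."""

    def __init__(self):
        super().__init__()
        self.in_next_li = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "pagination__next" in classes:
            self.in_next_li = True
        elif tag == "a" and self.in_next_li and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_next_li = False


def next_page_url(html, base_url):
    """Absolute URL of the next page, or None if the button has no link."""
    finder = NextLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # Resolve relative hrefs such as "?page=2" against the current page URL.
    return urljoin(base_url, finder.href)


link_html = '<ul><li class="pagination__next"><a href="?page=2">Next</a></li></ul>'
print(next_page_url(link_html, "https://example.com/list?page=1"))
# https://example.com/list?page=2
```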