我有这段代码:
import requests
from bs4 import BeautifulSoup
def posts_spider():
url = 'http://www.reddit.com/r/nosleep/new/'
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text)
for link in soup.findAll('a', {'class': 'title'}):
href = "http://www.reddit.com" + link.get('href')
title = link.string
print(title)
print(href)
print("\n")
def get_single_item_data():
item_url = 'http://www.reddit.com/r/nosleep/new/'
source_code = requests.get(item_url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text)
for rating in soup.findAll('div', {'class': 'score unvoted'}):
print(rating.string)
posts_spider()
get_single_item_data()
输出结果为:
My light.. I'm seeing and feeling things.. what's happening?
http://www.reddit.com/r/nosleep/comments/2kw0nu/my_light_im_seeing_and_feeling_things_whats/
Why being the first to move in a new Subdivision is not the most brilliant idea...
http://www.reddit.com/r/nosleep/comments/2kw010/why_being_the_first_to_move_in_a_new_subdivision/
I Am Falling.
http://www.reddit.com/r/nosleep/comments/2kvxvt/i_am_falling/
Heidi
http://www.reddit.com/r/nosleep/comments/2kvrnf/heidi/
I remember everything
http://www.reddit.com/r/nosleep/comments/2kvrjs/i_remember_everything/
To Lieutenant Griffin Stone
http://www.reddit.com/r/nosleep/comments/2kvm9p/to_lieutenant_griffin_stone/
The woman in my room
http://www.reddit.com/r/nosleep/comments/2kvir0/the_woman_in_my_room/
Dr. Margin's Guide to New Monsters: The Guest, or, An Update
http://www.reddit.com/r/nosleep/comments/2kvhe5/dr_margins_guide_to_new_monsters_the_guest_or_an/
The Evil Woman (part 5)
http://www.reddit.com/r/nosleep/comments/2kva73/the_evil_woman_part_5/
Blood for the blood god, The first of many.
http://www.reddit.com/r/nosleep/comments/2kv9gx/blood_for_the_blood_god_the_first_of_many/
An introduction to the beginning of my journey
http://www.reddit.com/r/nosleep/comments/2kv8s0/an_introduction_to_the_beginning_of_my_journey/
A hunter..of sorts.
http://www.reddit.com/r/nosleep/comments/2kv8oz/a_hunterof_sorts/
Void Trigger
http://www.reddit.com/r/nosleep/comments/2kv84s/void_trigger/
What really happened to Amelia Earhart
http://www.reddit.com/r/nosleep/comments/2kv80r/what_really_happened_to_amelia_earhart/
I Used To Be Fine Being Alone
http://www.reddit.com/r/nosleep/comments/2kv2ks/i_used_to_be_fine_being_alone/
The Green One
http://www.reddit.com/r/nosleep/comments/2kuzre/the_green_one/
Elevator
http://www.reddit.com/r/nosleep/comments/2kuwxu/elevator/
Scary story told by my 4 year old niece- The Guy With Really Big Scary Claws
http://www.reddit.com/r/nosleep/comments/2kuwjz/scary_story_told_by_my_4_year_old_niece_the_guy/
Cranial Nerve Zero
http://www.reddit.com/r/nosleep/comments/2kuw7c/cranial_nerve_zero/
Mom's Story About a Ghost Uncle
http://www.reddit.com/r/nosleep/comments/2kuvhs/moms_story_about_a_ghost_uncle/
It snowed.
http://www.reddit.com/r/nosleep/comments/2kutp6/it_snowed/
The pocket watch I found at a store
http://www.reddit.com/r/nosleep/comments/2kusru/the_pocket_watch_i_found_at_a_store/
You’re Going To Die When You Are 23
http://www.reddit.com/r/nosleep/comments/2kur3m/youre_going_to_die_when_you_are_23/
The Customer: Part Two
http://www.reddit.com/r/nosleep/comments/2kumac/the_customer_part_two/
Dimenhydrinate
http://www.reddit.com/r/nosleep/comments/2kul8e/dimenhydrinate/
•
•
•
•
•
12
12
76
4
2
4
6
4
18
2
6
13
5
16
2
2
14
48
1
13
我想要做的是,将每个帖子的匹配评级放在它旁边,这样我就可以立即告诉该帖子有多少评分,而不是在1个“块”中打印标题和链接在另一个“块”中评级数字。 在此先感谢您的帮助!
答案 0 :(得分:1)
您可以通过使用div
迭代class="thing"
元素来一次性完成(将其视为迭代帖子)。对于每个div
,请获取链接和评分:
from urlparse import urljoin
from bs4 import BeautifulSoup
import requests
def posts_spider():
url = 'http://www.reddit.com/r/nosleep/new/'
soup = BeautifulSoup(requests.get(url).content)
for thing in soup.select('div.thing'):
link = thing.find('a', {'class': 'title'})
rating = thing.find('div', {'class': 'score'})
href = urljoin("http://www.reddit.com", link.get('href'))
print(link.string, href, rating.string)
posts_spider()
仅供参考,div.thing
是一个CSS Selector
,可将所有div
与class="thing"
匹配。