Python: unable to retrieve complete text data for more than one page

Time: 2015-06-26 00:52:03

Tags: python html python-2.7 web-scraping beautifulsoup

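I am new to Python programming and am facing the following problem. Goal: I need to scrape the Freelancer website and store the list of users along with their attributes (score, ratings, reviews, details, rates, etc.) in a file. I have the following code, but I am not able to get all the users.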

Also, the output sometimes changes from one run of the program to the next.

import io

import requests
from bs4 import BeautifulSoup

pages = 1
url = 'https://www.freelancer.com/freelancers/skills/all/' + str(pages) + '/'
r = requests.get(url)

# Get the HTML content and parse it into a soup object
soup = BeautifulSoup(r.content, 'html.parser')
links = soup.find_all("a")

# Find the freelancer-details nodes and store them in c_data
c_data = soup.find_all("div", {"class": "freelancer-details"})

# Write the result into a text file (closed automatically by the with block)
with io.open('freelancers.txt', 'w', encoding='utf-8') as fileWriter:
    for item in c_data:
        print(item.text)
        fileWriter.write(u'Freelancers Details:' + item.text + u'\t')

I need each user's details grouped under that specific user, but so far the output looks scattered.

Sample output:    Freelancers Details:

thetechie13
507 Reviews




$20 USD/hr

Top Skills:

       Website Design, 


       HTML, 


       PHP, 


       eCommerce, 


       Volusion

Dear Customer - We are a team of 75 Most Creative People and proud to be
Preferred Freelancer on  Freelancer.com. We offer wide range of web
solutions and IT services that are bespoke in nature, can best fit our
clients' business needs and provide them cost benefits.

1 answer:

Answer 0: (score: 0)

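If you want each individual text component (each one is assigned a different name in the markup), I suggest parsing each piece of text out of the HTML separately. However, if you just want them grouped together, you can join the strings: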

print(' '.join(item.text.split()))

This collapses every run of whitespace, including the newlines, into a single space between words.
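Both suggestions, plus the multi-page part of the question, could be sketched roughly as follows. This is only an illustration under stated assumptions: `SAMPLE_HTML` is made-up markup (only the `freelancer-details` class comes from the question), `example_user` is a placeholder name, and the helpers `normalize`, `parse_freelancers` and `scrape` are hypothetical names introduced here. It also assumes that the number at the end of the URL selects the results page, which the URL in the question suggests but which has not been verified against the site.

```python
from bs4 import BeautifulSoup

# URL pattern taken from the question; the page number is the last path segment.
# Assumption: changing that number returns the corresponding results page.
BASE_URL = 'https://www.freelancer.com/freelancers/skills/all/{page}/'

# Hypothetical markup for illustration only. The real page's inner tags are
# not known here; only the "freelancer-details" class is from the question.
SAMPLE_HTML = """
<div class="freelancer-details">
  <a href="#">thetechie13</a>
  <span>507 Reviews</span>
  <span>$20 USD/hr</span>
  <p>Top Skills:
       Website Design,
       HTML,
       PHP</p>
</div>
<div class="freelancer-details">
  <a href="#">example_user</a>
  <span>12 Reviews</span>
  <span>$15 USD/hr</span>
</div>
"""


def normalize(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    return ' '.join(text.split())


def parse_freelancers(html):
    """Return one list of cleaned text pieces per freelancer-details block."""
    soup = BeautifulSoup(html, 'html.parser')
    records = []
    for item in soup.find_all('div', {'class': 'freelancer-details'}):
        # stripped_strings yields each non-empty text node separately.
        pieces = [normalize(s) for s in item.stripped_strings]
        records.append(pieces)
    return records


def scrape(last_page):
    """Fetch pages 1..last_page and return the records from all of them."""
    import requests  # imported here so the parsing helpers work offline

    records = []
    for page in range(1, last_page + 1):
        response = requests.get(BASE_URL.format(page=page))
        response.raise_for_status()
        records.extend(parse_freelancers(response.content))
    return records


# Offline demonstration on the sample markup: one line per freelancer.
records = parse_freelancers(SAMPLE_HTML)
for pieces in records:
    print(' | '.join(pieces))
```

Calling `scrape(5)` would request pages 1 through 5 over the network and return one list per freelancer, which can then be written to the file one line per user. Keeping the pieces in a list rather than one joined string leaves room to map them to named fields once the real markup has been inspected.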