for loop does not accept an if-else statement

Time: 2019-03-08 23:04:56

Tags: python for-loop if-statement web-scraping

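I'm struggling to get this function to print 10 rows of output. So far I only get two. Either the for loop is not reading the tags row by row, or there is a mistake in the if-else statement.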

import requests
from bs4 import BeautifulSoup

# copy and paste the url from indeed using your search term
URL = 'https://www.indeed.com/jobs?q=data+scientist+%2485%2C000&l=New+York'

#conducting a request of the stated URL above:
page = requests.get(URL)

#specifying a desired format of “page” using the html parser - this allows python to read the various components of the page, rather than treating it as one long string.

soup = BeautifulSoup(page.text, 'html.parser')
#printing soup in a more structured tree format that makes for easier reading
print(soup.prettify())


def extract_salary_from_result(soup): 
  salaries = []
  for td in soup.find_all(name='td', attrs={'class':'snip'}):
    for div in td.find_all(name='div', attrs={'class':'salarySnippet'}):
       salary = div.find_all(name='span', attrs={'class':'salary no-wrap'})
       #print('salary in 2nd for-loop', salary)
       #if len(salary) > 0:
       for c in salary:
          salaries.append(c.text.strip())
          print('salary in if statement',salaries)
       else:
          salaries.append('Nothing_found')
          print('salary in else statement',salaries)
  return(salaries)
salary = extract_salary_from_result(soup)
print('salary is: ', salary)

The current output is:

salary in if statement ['$115,000 a year']
salary in else statement ['$115,000 a year', 'Nothing_found']
salary is:  ['$115,000 a year', 'Nothing_found']

The desired output is:

['$115,000 a year', 'Nothing_found','Nothing_found','Nothing_found','Nothing_found','Nothing_found','Nothing_found','Nothing_found','Nothing_found','Nothing_found'] 

1 answer:

Answer 0 (score: 0)

You don't have an if-else there. You have a for-else. You probably want something like this:

   for c in salary:
       if len(c) > 0:
           salaries.append(c.text.strip())
           print('salary in if statement',salaries)
       else:
           salaries.append('Nothing_found')
           print('salary in else statement',salaries)
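One thing to watch for: find_all returns an empty list when nothing matches, so the body of `for c in salary` never runs for a listing without a salary, and an `else` inside that loop is never reached. A minimal sketch of testing the list itself instead is below. It uses plain lists as hypothetical stand-ins for the per-listing find_all results, and the name collect_salaries and the sample data are made up for illustration:

```python
# Hypothetical stand-in data: each inner list plays the role of the
# find_all(...) result for one job listing (an empty list means no match).
rows = [['$115,000 a year'], [], []]


def collect_salaries(rows):
    """Return one entry per listing, using a placeholder when no salary was found."""
    salaries = []
    for matches in rows:
        if matches:
            # At least one salary was found for this listing.
            salaries.extend(text.strip() for text in matches)
        else:
            # Empty result: record a placeholder instead.
            salaries.append('Nothing_found')
    return salaries


print(collect_salaries(rows))
# ['$115,000 a year', 'Nothing_found', 'Nothing_found']
```

With real BeautifulSoup results the same if/else would sit around the `salary` list, assuming the outer loop visits every listing and not only those that contain a salary snippet.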

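The else in a for-else is not a substitute for the else in an if-else. It behaves more like a "finally" clause: it runs once after the loop finishes, provided no break statement was hit.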
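To see that behavior in isolation, here is a small sketch. The function run_loop and its stop_at parameter are hypothetical names chosen for this demo. It records what happens when the loop completes normally, when it has nothing to iterate over, and when it exits through break:

```python
def run_loop(items, stop_at=None):
    """Record each step so the effect of the for-else clause is visible."""
    events = []
    for item in items:
        if item == stop_at:
            events.append('break')
            break
        events.append(item)
    else:
        # Reached only when the loop ends without hitting break,
        # including the case where items is empty.
        events.append('else')
    return events


print(run_loop(['a', 'b']))        # ['a', 'b', 'else']
print(run_loop([]))                # ['else']
print(run_loop(['a', 'b'], 'b'))   # ['a', 'break']
```

This is consistent with the output in the question: the loop over `salary` appended the one salary it found, then the else clause ran and appended 'Nothing_found' right after it.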