How do I save data to a CSV file using BeautifulSoup?

Time: 2017-11-10 06:22:24

Tags: python beautifulsoup

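I have the following data inside a parent div [col-md-8], where the odd-numbered div [row] elements contain the questions and the even-numbered div [row] elements contain the answers. There are 15 questions in total. I have a CSV file with the fields [Question, a, b, c, d]. What I want is to extract the data from the HTML and save it to that CSV file.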

<div class="col-md-8"> <!-- Parent Div Starts -->
    <div class="alert"></div>
    <div class="row">  <!-- Question 1 Starts -->
       <div class=" col-md-8">
       <strong>1</strong>
        Every Polynomial has
       </div>
   </div><!-- Question 1 Ends -->
   <div class="row"> <!-- Question 1 Option Starts -->
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(a) three 
           zeros
       </div>
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(b) three 
           zeros
       </div>
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(c) three 
           zeros
       </div>
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(d) three 
           zeros
       </div>
  </div><!-- Question 1 Option Ends -->
  <div class="row"><!-- Question 2 Starts -->
      <div class=" col-md-8">
        <strong>2</strong>
         Every Equation has
      </div>
  </div><!-- Question 2 Ends -->
    <div class="row"> <!-- Question 2 Option Starts -->
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(a) three 
           zeros
       </div>
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(b) three 
           zeros
       </div>
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(c) three 
           zeros
       </div>
       <div class=" col-md-6">
           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(d) three 
           zeros
       </div>
  </div><!-- Question 2 Option Ends -->
  <!-- Like this, I have 15 questions and options for each question -->
</div> <!-- Parent Div Ends -->

1 answer:

Answer 0 (score: 0)

You can do it like this:

from bs4 import BeautifulSoup
import csv

# `response` holds the HTML source shown in the question
soup = BeautifulSoup(response, 'html.parser')
all_div_row = soup.find_all('div', {'class': 'row'})  # every div whose class is 'row'
# Python 3: open in text mode with newline=''; on Python 2 use open('question_answer.csv', 'wb')
with open('question_answer.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Questions', 'a', 'b', 'c', 'd'])  # write the header row
    # the rows alternate: question, options, question, options, ...
    for question, answer in zip(all_div_row[::2], all_div_row[1::2]):
        # split() with no arguments also splits on newlines and on the \xa0 left by &nbsp;
        question_text = [" ".join(question.text.split())]
        answer_text = [" ".join(div.text.split()) for div in answer.find_all('div')]
        writer.writerow(question_text + answer_text)
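
The pairing and whitespace cleanup used in the loop can be tried without any HTML. The following is a minimal sketch using only the standard library: the `rows` list and the `squash` helper are made-up stand-ins (not part of the answer) for the text of each `div.row`, and the CSV is written to an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical stand-ins for the text of each div.row, alternating
# question / options. "\xa0" is what &nbsp; becomes after parsing.
rows = [
    "\n 1\n Every Polynomial has\n",
    ["\xa0\xa0(a) three \n zeros", "\xa0\xa0(b) three \n zeros"],
    "\n 2\n Every Equation has\n",
    ["\xa0\xa0(a) three \n zeros", "\xa0\xa0(b) three \n zeros"],
]


def squash(text):
    """Collapse every run of whitespace, including non-breaking spaces, into one space."""
    return " ".join(text.split())


buffer = io.StringIO()  # in-memory file, so nothing is written to disk
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Questions", "a", "b"])

# rows[::2] are the questions, rows[1::2] the matching option lists
for question, options in zip(rows[::2], rows[1::2]):
    writer.writerow([squash(question)] + [squash(o) for o in options])

csv_text = buffer.getvalue()
print(csv_text)
```

Note that `zip` stops at the shorter sequence, so a trailing question row without an options row would be dropped silently rather than raising an error.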

With the HTML from the question, the file question_answer.csv will contain:

Questions,a,b,c,d
1 Every Polynomial has,(a) three zeros,(b) three zeros,(c) three zeros,(d) three zeros
2 Every Equation has,(a) three zeros,(b) three zeros,(c) three zeros,(d) three zeros
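
Each cell in this output still carries the question number or the option label, although the header already names the columns. If that is not wanted, a possible post-processing step is sketched below. It assumes every question starts with its number and every option starts with a label from (a) to (d); `strip_question_number` and `strip_option_label` are hypothetical helper names, not part of the answer above.

```python
import re


def strip_question_number(text):
    """Drop a leading question number such as '1 ' (hypothetical helper)."""
    return re.sub(r"^\d+\s+", "", text)


def strip_option_label(text):
    """Drop a leading option label such as '(a) ' (hypothetical helper)."""
    return re.sub(r"^\([a-d]\)\s*", "", text)


# One row as produced by the loop in the answer (shortened to two options)
row = ["1 Every Polynomial has", "(a) three zeros", "(b) three zeros"]

cleaned = [strip_question_number(row[0])] + [strip_option_label(c) for c in row[1:]]
print(cleaned)
```

In the answer's loop, these helpers could be applied to `question_text` and `answer_text` just before `writer.writerow`.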