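I have the following data inside a parent div [col-md-8], where the odd div [row] elements contain the questions and the even div [row] elements contain the answers. There are 15 questions in total. I have a CSV file whose fields are [Question, a, b, c, d]. What I want is to extract the data from the HTML and save it to that CSV file.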
<div class="col-md-8"> <!-- Parent Div Starts -->
<div class="alert"></div>
<div class="row"> <!-- Question 1 Starts -->
<div class=" col-md-8">
<strong>1</strong>
Every Polynomial has
</div>
</div><!-- Question 1 Ends -->
<div class="row"> <!-- Question 1 Options Start -->
<div class=" col-md-6">
(a) three
zeros
</div>
<div class=" col-md-6">
(b) three
zeros
</div>
<div class=" col-md-6">
(c) three
zeros
</div>
<div class=" col-md-6">
(d) three
zeros
</div>
</div><!-- Question 1 Options End -->
<div class="row"><!-- Question 2 Starts -->
<div class=" col-md-8">
<strong>2</strong>
Every Equation has
</div>
</div><!-- Question 2 Ends -->
<div class="row"><!-- Question 2 Options Start -->
<div class=" col-md-6">
(a) three
zeros
</div>
<div class=" col-md-6">
(b) three
zeros
</div>
<div class=" col-md-6">
(c) three
zeros
</div>
<div class=" col-md-6">
(d) three
zeros
</div>
</div><!-- Question 2 Options End -->
<!-- Like this, I have 15 questions and options for each question -->
</div> <!-- Parent Div Ends -->
Answer 0 (score: 0)
You need something like this:
from bs4 import BeautifulSoup
import csv

# `response` is assumed to hold the page's HTML as a string
soup = BeautifulSoup(response, 'html.parser')
all_div_row = soup.find_all('div', {'class': 'row'})  # get all divs whose class is 'row'
# Python 3: text mode with newline=''; on Python 2 use open('question_answer.csv', 'wb')
with open('question_answer.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Questions', 'a', 'b', 'c', 'd'])  # write the header
    # questions are in the odd rows, answers in the even rows
    for question, answer in zip(all_div_row[::2], all_div_row[1::2]):
        # split() + join collapses newlines and repeated spaces into single spaces
        question_text = [" ".join(question.text.split())]
        answer_text = [" ".join(div.text.split()) for div in answer.find_all('div')]
        writer.writerow(question_text + answer_text)
The file question_answer.csv will then contain:
Questions,a,b,c,d
1 Every Polynomial has,(a) three zeros,(b) three zeros,(c) three zeros,(d) three zeros
2 Every Equation has,(a) three zeros,(b) three zeros,(c) three zeros,(d) three zeros
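Each cell in that output still carries the question number ("1 ") or the option label ("(a) "), although the header already names the columns. If you would rather store only the text, a small cleanup step could look like the sketch below. The helper names (`strip_question_number`, `strip_option_label`, `clean_row`) are hypothetical and not part of the answer above, and the patterns assume every question starts with its number and every option starts with a label from (a) to (d).

```python
import re

# Hypothetical helpers for illustration; they are not part of the original answer.
# Assumption: questions look like "1 Every Polynomial has",
# options look like "(a) three zeros".
_QUESTION_NUMBER = re.compile(r"^\d+\s+")
_OPTION_LABEL = re.compile(r"^\([a-d]\)\s*")


def strip_question_number(text):
    """Drop a leading question number such as '1 ' if present."""
    return _QUESTION_NUMBER.sub("", text, count=1)


def strip_option_label(text):
    """Drop a leading option label such as '(a) ' if present."""
    return _OPTION_LABEL.sub("", text, count=1)


def clean_row(row):
    """Clean one row laid out as [question, a, b, c, d]."""
    question, *options = row
    return [strip_question_number(question)] + [
        strip_option_label(option) for option in options
    ]


row = ["1 Every Polynomial has", "(a) three zeros", "(b) three zeros",
       "(c) three zeros", "(d) three zeros"]
print(clean_row(row))
# ['Every Polynomial has', 'three zeros', 'three zeros', 'three zeros', 'three zeros']
```

To use it, you would pass each row through `clean_row` just before handing it to `writer.writerow`. Text that does not match the patterns is left unchanged, so a question that happens to lack its number is not damaged, but a question whose own text begins with a digit would need a stricter pattern.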