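How to stop the inner loop and restart the whole loop, including the inner loop

Asked: 2018-12-15 15:15:10

Tags: python loops beautifulsoup

I want to scrape an FAQ page with BeautifulSoup, but I ran into a problem when printing the data.

For example:

Q: question1111

A: answer1111

Q: question2222

A: answer2222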

for q in question:
    print(q)
    for a in answer:
        print(a)

The output is:

question1111
answer1111
answer2222
question2222
answer1111
answer2222

What I want is this:

question1111
answer1111
question2222
answer2222

Then I tried using break:

for q in question:
    print(q)
    for a in answer:
        print(a)
        break

The output became:

question1111
answer1111
question2222
answer1111

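I also tried continue and pass, but it still doesn't work.

Is there a way to run the inner loop once and then go back to the outer loop and repeat?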
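One way to read the behaviour above: every time the outer loop advances, `for a in answer` starts again from the first element, so `break` always stops right after `answer1111`. A minimal sketch, assuming `question` and `answer` are plain lists of equal length in matching order (the list contents are placeholders taken from the example), walks both lists in step with the built-in `zip`:

```python
# Placeholder data mirroring the example; in the real script these
# would come from BeautifulSoup.
question = ["question1111", "question2222"]
answer = ["answer1111", "answer2222"]

# zip pairs the i-th question with the i-th answer, so no inner loop is needed.
lines = []
for q, a in zip(question, answer):
    lines.append(q)
    lines.append(a)

print("\n".join(lines))
```

This only lines up when every question has exactly one answer element, because `zip` pairs items purely by position and stops at the shorter list.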

Added below:

The HTML looks like this:

<div>
  <h4 class="mod-wysiwyg__small-heading">Question1</h4>
</div>
<div>
  <p class="mod-wysiwyg__text">Answer1... paragraph1</p>
</div>
<div>
  <p class="mod-wysiwyg__text">Answer1...paragraph2</p>
</div>
<div>
  <h4 class="mod-wysiwyg__small-heading">Question2</h4>
</div>
<div>
  <p class="mod-wysiwyg__text">Answer2</p>
</div>
<div>
  <h4 class="mod-wysiwyg__small-heading">Question3</h4>
</div>

The code that scrapes the HTML:

if r.status_code == requests.codes.ok:
    soup = BeautifulSoup(r.text, 'html.parser')
    question = soup.find_all('h4', class_='mod-wysiwyg__small-heading')
    answer = soup.find_all('p', class_='mod-wysiwyg__text')

    # Open the output file once instead of once per print call
    with open("output.txt", 'a') as out:
        for q, a in zip(question, answer):
            print("- - " + q.text[3:], file=out)
            print("  - " + a.text, file=out)

The output is:

Question1
Answer1... paragraph1
Question2
Answer1...paragraph2
Question3
Answer2

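2 answers:

Answer 0: (score: 0)

Iterate over each question, then walk through its next siblings, collecting the paragraphs of the answer until a new question is encountered (because we don't want to collect the next question's answer):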

result = []

for question in soup.select("h4.mod-wysiwyg__small-heading"):
    paragraphs = []
    for sibling in question.parent.find_next_siblings("div"):
        if sibling.h4:  # new question, exit
            break

        answer = sibling.find('p', class_='mod-wysiwyg__text')
        if answer:
            paragraphs.append(answer.text)

    result.append((question.text, " ".join(paragraphs)))

Output for the sample HTML:

[(u'Question1', u'Answer1... paragraph1 Answer1...paragraph2'),
 (u'Question2', u'Answer2'),
 (u'Question3', '')]
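The same idea as a self-contained Python 3 sketch, run against the sample HTML from the question. The helper name `collect_faq` and the `SAMPLE_HTML` constant are illustrative and not part of the original answer. It assumes each question and each answer paragraph sits in its own sibling `<div>`, as in the sample, and it keeps the paragraphs as a list so they can be printed in the "- - question" / "  - answer" layout the question uses:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (whitespace condensed).
SAMPLE_HTML = """
<div><h4 class="mod-wysiwyg__small-heading">Question1</h4></div>
<div><p class="mod-wysiwyg__text">Answer1... paragraph1</p></div>
<div><p class="mod-wysiwyg__text">Answer1...paragraph2</p></div>
<div><h4 class="mod-wysiwyg__small-heading">Question2</h4></div>
<div><p class="mod-wysiwyg__text">Answer2</p></div>
<div><h4 class="mod-wysiwyg__small-heading">Question3</h4></div>
"""


def collect_faq(html):
    """Return a list of (question, [answer paragraphs]) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for heading in soup.select("h4.mod-wysiwyg__small-heading"):
        paragraphs = []
        # Walk the <div> siblings that follow the question's own <div>.
        for sibling in heading.parent.find_next_siblings("div"):
            if sibling.h4:  # next question reached, stop collecting
                break
            p = sibling.find("p", class_="mod-wysiwyg__text")
            if p:
                paragraphs.append(p.text)
        result.append((heading.text, paragraphs))
    return result


faq = collect_faq(SAMPLE_HTML)
for q, paragraphs in faq:
    print("- - " + q)
    for p in paragraphs:
        print("  - " + p)
```

A question with no following answer, such as Question3 in the sample, simply gets an empty list.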

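Answer 1: (score: 0)

If each question and its answers are not wrapped together in a single block div, go up to the parent and use .parent.find_next_sibling(). Note that this only picks up the first paragraph of each answer: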

soup = BeautifulSoup(html, 'html.parser')
question = soup.find_all('h4', class_='mod-wysiwyg__small-heading')

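for q in question:
    # The answer sits in the <div> that follows the question's parent <div>
    next_div = q.parent.find_next_sibling('div')
    # or: next_div.find('p', class_="mod-wysiwyg__text")
    first_answer = next_div.find('p') if next_div else None
    print(q.text)
    # The last question in the sample HTML has no answer, so guard against None
    print(first_answer.text if first_answer else '')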