Removing whitespace and newlines - beautifulsoup python

Date: 2015-09-08 21:49:06

Tags: python python-2.7 beautifulsoup

Using BeautifulSoup, I am scraping the following web content:

<div>
<p class="introduction">    Manchester City&#039;s Fabian Delph limped off in     the first minute of England Euro 2016 qualifier against Switzerland with a suspected hamstring injury. </p>
<p>    The 25-year-old midfielder, who signed for City from Aston Villa in the summer, pulled up suddenly during Tuesday&#039;s game at Wembley. </p>
<p>    Delph was picked in Roy Hodgson&#039;s first XI having been left out of the starting line-up against San Marino on Saturday.</p>
<p>    Delph was making his eighth appearance for England.</p>
</div>

I am using the following code:

for item in soup.find_all('div'):
    print item.find('p').text.replace('\n','')

This works, but the result looks like this (more like four separate values):

Manchester City's Fabian Delph limped off in the first minute of England's Euro 2016 qualifier against Switzerland with a suspected hamstring injury.

The 25-year-old midfielder, who signed for City from Aston Villa in the summer, pulled up suddenly during Tuesday's game at Wembley.

Delph was picked in Roy Hodgson's first XI having been left out of the starting line-up against San Marino on Saturday.

Delph was making his eighth appearance for England.

How can I get the output in the following format (more like a single value):

Manchester City's Fabian Delph limped off in the first minute of England's Euro 2016 qualifier against Switzerland with a suspected hamstring injury. The 25-year-old midfielder, who signed for City from Aston Villa in the summer, pulled up suddenly during Tuesday's game at Wembley. Delph was picked in Roy Hodgson's first XI having been left out of the starting line-up against San Marino on Saturday. Delph was making his eighth appearance for England.

Ultimately, I want to save this data in a CSV file. The text above should be treated as a single value in the CSV file (not four values).

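2 answers:

Answer 0 (score: 0)

What you are doing is calling print on every pass through the loop. print just writes the string to the console and then writes a newline, so each piece ends up on its own line. You can build one large string instead, like this: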

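parts = []
for item in soup.find_all('div'):
    for p in item.find_all('p'):  # find('p') would return only the first <p>
        # split() with no arguments drops newlines and repeated spaces
        parts.append(' '.join(p.text.split()))
big_string = ' '.join(parts)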
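
Replacing '\n' alone leaves the leading spaces and the repeated spaces inside each paragraph in place. Below is a minimal sketch of one way to collapse them, using only the standard library. The helper name collapse_whitespace is hypothetical, and the two strings are assumed stand-ins for what p.text might return for two paragraphs.

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs, and newlines with one space."""
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so join() rebuilds clean text.
    return ' '.join(text.split())


# Assumed sample data standing in for the .text of two <p> elements.
paragraphs = [
    "    Delph was picked in     Roy Hodgson's first XI.\n",
    "    Delph was making his eighth appearance for England.",
]

# Join the cleaned paragraphs into one value separated by single spaces.
single_value = ' '.join(collapse_whitespace(p) for p in paragraphs)
print(single_value)
```

Joining with a single space keeps the sentences readable while still producing one string, which is what a single CSV field needs.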

Answer 1 (score: 0)

You are calling the print statement four times, which is why the output appears on four lines.

Try this modification:

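single_string_answer = ''
for item in soup.find_all('div'):
    # get_text() returns the text of every <p> in the div; split() and
    # join() then collapse newlines and repeated spaces to single spaces
    single_string_answer += ' '.join(item.get_text().split()) + ' '
print(single_string_answer.strip())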
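
Since the goal stated in the question is a single value in a CSV file, here is a rough sketch of that last step. It assumes Python 3 (the question is tagged python-2.7, where file handling for the csv module differs), uses an in-memory buffer in place of a real file, and the header name article_text is made up for illustration.

```python
import csv
import io

# Assumed sample value; in the scraper this would be the joined paragraph text.
single_value = ("The 25-year-old midfielder, who signed for City from Aston "
                "Villa in the summer, pulled up suddenly during Tuesday's "
                "game at Wembley.")

# In-memory stand-in for a file; a real file would be opened with
# open(path, 'w', newline='') in Python 3.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['article_text'])  # hypothetical header name
writer.writerow([single_value])    # one-element list -> exactly one field

# Read the CSV text back to confirm the commas did not split the value.
csv_text = buffer.getvalue()
rows = list(csv.reader(io.StringIO(csv_text)))
print(len(rows[1]))
```

Passing the text as a one-element list is what makes it a single field: csv.writer quotes the value because it contains commas, so a CSV reader returns it as one value rather than several.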