Python:如何在打印到html时保持读取txt文件的中断

时间:2012-02-20 22:08:39

标签: python text line-breaks

当我将内容打印成HTML文件时,我正试图保留从txt文件中读取的换行符。

我以这种方式从samppipe得到结果:

class BottomPipeResult :

    AGENT_ID   = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"

    #BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=ArticleExtractor&output=htmlFragment"

    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID) :
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
            pagefile = urllib2.urlopen(request)
            realurl = pagefile.geturl()
            f = pagefile
            self._myBPPAge = f.read()
        return(self._myBPPAge) 

但是当我将它们重新打印为html时,我放弃了所有的换行符。

这是我用来写入HTML的代码

f = open('./../../entries-new.html', 'a')
f.write(BottomPipeResult.scrapeResult(myLinkResult))
f.close()

这里是一个booilerpipe文本结果的例子:

http://boilerpipe-web.appspot.com/extract?url=http%3A%2F%2Fresult.com&extractor=ArticleExtractor&output=text

我尝试了这个,但它不起作用:

myLinkResult = re.sub('\n','<br />', myLinkResult)

有什么建议吗?

由于

2 个答案:

答案 0 :(得分:1)

您可以将文字换成&lt; pre&gt;标签。这告诉HTML文本是预先格式化的。

例如:

<pre>Your text
With line feeds
and other things
</pre>

答案 1 :(得分:0)

我修改了你的代码只是一个触摸,所以它是可以运行的,它似乎对我“正常”工作。结果输出具有预期的行结尾。我看到一些编码问题,但没有行结束问题。

代码

import urllib2

class BottomPipeResult :

    AGENT_ID   = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"
    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID) :
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
            pagefile = urllib2.urlopen(request)
            realurl = pagefile.geturl()
            f = pagefile
            self._myBPPAge = f.read()
        return(self._myBPPAge) 


bpr = BottomPipeResult()
myLinkResult = 'http://result.com'

f = open('out.html', 'a')
f.write(bpr.scrapeResult(myLinkResult))
f.close()

输出

Result-Expand.flv
We want to help your company grow. Our Result offices around the world can help you expand your business faster and more cost efficiently. And at the same time bring the experience of having expanded more than 150 companies during the past ten years.
Result can help you grow in your local market, regionally, or globally through our team of experienced business builders, our industry know-how and our know-who.
Our services range from well designed expansion strategies to assuming operational responsibility for turning these strategies into successful business.
We don’t see ourselves as mere consultants  who give you a strategy presentation and then leave you to your own devices. We prefer to be considered as an extended, entirely practical arm of your management team. We’re hands-on and heads-on. We’re business builders.
We’re co-entrepreneurs. This is also reflected in our compensation structure – a significant part of our compensation is result  based.

使结果更像“HTML”,如

就html输出而言,您可能希望将每一行包装在<p>段落标记中。

output = BottomPipeResult.scrapeResult(myLinkResult) 
f.write('\n'.join(['<p>' + x + '</p>' for x in output.split('\n')]))