Python: replace URLs with their page titles

Posted: 2014-04-28 08:06:56

Tags: python url python-2.7 beautifulsoup urllib

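I wrote this code to replace URLs with their page titles. It does replace each URL with its title as intended, but the title ends up on the next line instead of staying on the same line.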

twfile.txt contains the following lines:

link1 http://t.co/HvKkwR1c
no link line

Output in tw2file.txt:

link1
Instagram
no link line

But I want the output in this form:

link1 Instagram
no link line

How can I fix this?

My code:

from bs4 import BeautifulSoup
import urllib

output = open('tw2file.txt','w')

with open('twfile.txt','r') as inputf:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):

                if "http" in list1[i]:
                    ##print list1[i]
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    ##print list1[i]


                    list1[i] = ''.join(ch for ch in list1[i])
                else:
                    list1[i] = ''.join(ch for ch in list1[i])
            line = ' '.join(list1)
            print line
            output.write(line)
        except:
            pass


inputf.close()
output.close()

2 answers:

Answer 0 (score: 1)

On writing to a file: write() does not add a newline, so you must include '\n' yourself.

fileobject = open("bar", 'w' )
fileobject.write("Hello, World\n") # newline is inserted by '\n'
fileobject.close()
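One way to see that write() adds nothing on its own is to write into an in-memory buffer. This sketch is written for Python 3 (the thread itself targets Python 2.7) and uses io.StringIO as a stand-in for a real file, which is an assumption made only so it runs without touching the disk:

```python
import io

# io.StringIO behaves like a text file opened for writing, but keeps the data in memory.
buf = io.StringIO()
buf.write("link1 Instagram")   # no newline is added automatically
buf.write("no link line")
without_newline = buf.getvalue()
print(repr(without_newline))   # 'link1 Instagramno link line'

buf = io.StringIO()
buf.write("link1 Instagram\n")  # the '\n' has to be written explicitly
buf.write("no link line\n")
with_newline = buf.getvalue()
print(repr(with_newline))      # 'link1 Instagram\nno link line\n'
```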

On console output:

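Change "print line" to "print line," (note the trailing comma).

Python 2's print statement writes a '\n' character at the end unless the statement ends with a comma.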
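The trailing-comma form exists only in Python 2's print statement. Assuming Python 3 (or "from __future__ import print_function" on 2.7), the closest equivalent is the end parameter of print(). This sketch captures stdout with contextlib.redirect_stdout (Python 3.4+) to show the difference:

```python
import io
from contextlib import redirect_stdout

line = "link1 Instagram\n"  # a line read from a file keeps its trailing '\n'

# Default: print() appends its own '\n', producing an extra blank line.
buf = io.StringIO()
with redirect_stdout(buf):
    print(line)
default_output = buf.getvalue()

# end='' suppresses the extra newline, similar to Python 2's "print line,".
buf = io.StringIO()
with redirect_stdout(buf):
    print(line, end='')
no_extra_newline = buf.getvalue()
```

After this runs, default_output is 'link1 Instagram\n\n' while no_extra_newline is 'link1 Instagram\n'.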

Answer 1 (score: 1)

Try this code (see here, here, and here):

from bs4 import BeautifulSoup
import urllib

with open('twfile.txt','r') as inputf, open('tw2file.txt','w') as output:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
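                    list1[i] = ''.join(ch for ch in list1[i]).strip() # strip newlines/spaces around the title
                else:
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # strip the trailing '\n' left by split(' ')
            line = ' '.join(list1)
            print line
            output.write('{}\n'.format(line))  # write() adds no newline, so add exactly one per line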
        except:
            pass
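The fix above works because split(' ') leaves the '\n' attached to the last token, and the page title can itself carry surrounding newlines; stripping both and then writing one '\n' per line restores the intended layout. Below is a minimal offline sketch of the same idea, written for Python 3. The names replace_urls and process are hypothetical, and get_title is an assumed callable injected so the example runs without network access:

```python
def replace_urls(line, get_title):
    """Replace every whitespace-separated token containing "http" with get_title(token).

    get_title is a hypothetical callable (e.g. one that downloads the page and
    returns its <title> text); it is passed in so this sketch runs offline.
    """
    # split() with no argument splits on any whitespace and drops the trailing '\n',
    # unlike split(' '), which leaves '\n' attached to the last token.
    out = []
    for token in line.split():
        if "http" in token:
            # Page titles often carry surrounding newlines/spaces; strip them.
            out.append(get_title(token).strip())
        else:
            out.append(token)
    return " ".join(out)


def process(lines, get_title):
    # Emit exactly one output line per input line, adding the '\n' explicitly.
    return "".join(replace_urls(line, get_title) + "\n" for line in lines)


# Stand-in titles; a real page title may look like "\nInstagram\n".
fake_titles = {"http://t.co/HvKkwR1c": "\nInstagram\n"}
result = process(["link1 http://t.co/HvKkwR1c\n", "no link line\n"], fake_titles.get)
print(repr(result))  # 'link1 Instagram\nno link line\n'
```

Note that split() with no argument also collapses runs of spaces into one, which differs slightly from split(' '). In the real script, get_title could wrap the urllib.urlopen and BeautifulSoup steps from the answer, for example returning soup.title.string.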

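By the way, since you are using Python 2.7 or later, you can open both files in a single with statement. Calling close() on them is also unnecessary, because with closes them automatically.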
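A small sketch of the combined with statement, written for Python 3. It uses a temporary directory (an assumption so the example runs anywhere) and simply copies lines from twfile.txt to tw2file.txt, then checks that both file objects were closed when the block exited:

```python
import os
import tempfile

# Temporary directory standing in for the script's working directory.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "twfile.txt")
dst = os.path.join(tmpdir, "tw2file.txt")

with open(src, "w") as f:
    f.write("link1 http://t.co/HvKkwR1c\nno link line\n")

# Both files are opened in one with statement (Python 2.7+ / 3.1+) and both are
# closed automatically when the block exits, even if an exception is raised.
with open(src, "r") as inputf, open(dst, "w") as output:
    for line in inputf:
        output.write(line)  # plain copy; the URL-to-title replacement would go here

print(inputf.closed, output.closed)  # True True
```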