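I wrote this code to replace URLs with their page titles. It does replace each URL with its title as intended, but it prints the title on the next line.
twfile.txt contains the following lines: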
link1 http://t.co/HvKkwR1c
no link line
Output in tw2file.txt:
link1
Instagram
no link line
But I want the output in this form:
link1 Instagram
no link line
What should I do?
My code:
from bs4 import BeautifulSoup
import urllib
output = open('tw2file.txt','w')
with open('twfile.txt','r') as inputf:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    ##print list1[i]
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    ##print list1[i]
                    list1[i] = ''.join(ch for ch in list1[i])
                else:
                    list1[i] = ''.join(ch for ch in list1[i])
            line = ' '.join(list1)
            print line
            output.write(line)
        except:
            pass
inputf.close()
output.close()
Answer 0 (score: 1)
On what gets written to the file:
fileobject = open("bar", 'w' )
fileobject.write("Hello, World\n") # newline is inserted by '\n'
fileobject.close()
On the console output:
Change print line to print line,
Python writes a '\n' character at the end unless the print statement ends with a comma.
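To make the doubled newline concrete, here is a minimal sketch. It assumes Python 3, where print is a function and end='' plays the role of the trailing comma from Python 2; the io.StringIO buffers stand in for the console so that the result can be inspected.

```python
import io

# A line read from a file keeps its trailing '\n'.
line = "link1 Instagram\n"

# Default print appends its own '\n', so a blank line follows the text.
buf_default = io.StringIO()
print(line, file=buf_default)

# end='' suppresses the extra newline (Python 3 counterpart of "print line,").
buf_no_extra = io.StringIO()
print(line, end='', file=buf_no_extra)

default_output = buf_default.getvalue()
trimmed_output = buf_no_extra.getvalue()
```

Note that this only affects what appears on the console; the contents of the output file depend solely on the string passed to write.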
Answer 1 (score: 1)
Try this code (the changed lines are marked with # here):
from bs4 import BeautifulSoup
import urllib
with open('twfile.txt','r') as inputf, open('tw2file.txt','w') as output:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
                else:
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
            line = ' '.join(list1)
            print line
            output.write('{}\n'.format(line)) # here
        except:
            pass
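By the way, since you are using Python 2.7.x+, you can put both open calls in the same with statement. The explicit close calls are then unnecessary as well, because with closes both files on exit.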
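The effect of the strip() calls can be tried without any network access. The sketch below is an illustration, not the answer's exact code: replace_urls and fake_fetch_title are hypothetical names, and the padded title string is an assumption about what the real page lookup might return.

```python
def replace_urls(line, fetch_title):
    """Return `line` with every token containing "http" replaced by its title."""
    # split() with no argument also discards the trailing '\n' of the line.
    tokens = line.split()
    for i, token in enumerate(tokens):
        if "http" in token:
            # strip() removes the whitespace and newlines around the title text.
            tokens[i] = fetch_title(token).strip()
    return " ".join(tokens)


def fake_fetch_title(url):
    # Hypothetical stand-in for the urllib + BeautifulSoup lookup; the padding
    # mimics a <title> element whose text is surrounded by newlines.
    return "\n  Instagram\n"


lines = ["link1 http://t.co/HvKkwR1c\n", "no link line\n"]
result = [replace_urls(l, fake_fetch_title) for l in lines]
```

Each cleaned line can then be written with exactly one trailing '\n', which yields the output the question asks for.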