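I'm new to Python. I started writing a script that processes HTML files with Beautiful Soup. Everything is processed correctly, but now I want to save each article in a new folder named nowe instead of printing it. After processing, I need either to put all the articles into that one folder or to produce a CSV file.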
from bs4 import BeautifulSoup
import glob
import os, os.path

path = '/home/darek/Dokumenty/pliki/'
path_out = '/home/darek/Dokumenty/pliki/nowe'

for filename in glob.glob(os.path.join(path, '*.html',)):
    f = filename
    tresc = open(f)
    soup = BeautifulSoup(tresc, 'html.parser')
    article = soup.find('div',class_='post')
    tagi = soup.find('div', class_='ph_social_share_box ph_social_share_box_bottom')
    fout = open( +filename, "w")
    fout.close()
    print(article)
My error log:
File "/home/darek/Dokumenty/parser.py", line 21, in <module>
fout = open( +filename, "w")
TypeError: bad operand type for unary +: 'str'
This version works for printing:
from bs4 import BeautifulSoup
import glob
import os, os.path
path = '/home/darek/Dokumenty/pliki/'
path_out = '/home/darek/Dokumenty/pliki/nowe'

for filename in glob.glob(os.path.join(path, '*.html',)):
    f = filename
    content = open(f)
    soup = BeautifulSoup(content, 'html.parser')
    article = soup.find('div',class_='post')
    tags = soup.find('div', class_='ph_social_share_box ph_social_share_box_bottom')
    print(article)
That works, but I can't get it to write to a file. Any hints?
Answer 0 (score: 0)
Remove the "+" in this line:
fout = open( +filename, "w")
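"w" means "open the file for writing": the file is created if it does not exist and truncated if it does. A "+" inside the mode string (for example "w+") additionally allows reading from the same handle, which is not needed here. The "+" in front of filename is something else entirely: it is a unary plus applied to a string, and that is what raises the TypeError. So the line should be
fout = open(filename, "w")
Note that filename is the full path of the input file returned by glob, so this line opens (and truncates) the source HTML file itself. To save into nowe, build the output path from path_out and the base name of the file.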
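A minimal sketch of the write step, assuming the goal is one output file per input file inside nowe. The helper name write_article and its parameters are hypothetical, not part of the question's script; it only uses standard os and open calls.

```python
import os


def write_article(src_path, out_dir, article_html):
    """Write article_html to out_dir under the same base name as src_path.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    # Create the output folder if it is missing (no error if it already exists).
    os.makedirs(out_dir, exist_ok=True)
    # Keep only the file name of the input, then place it inside out_dir.
    out_path = os.path.join(out_dir, os.path.basename(src_path))
    # The with block closes the file even if write() raises.
    with open(out_path, "w", encoding="utf-8") as fout:
        fout.write(article_html)
    return out_path
```

Inside the loop from the question this would be called as write_article(filename, path_out, str(article)). The str() call matters because soup.find returns a Tag object (or None when nothing matches), and write() only accepts a string.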
Answer 1 (score: 0)
Change this block:
fout = open( +filename, "w")
fout.close()
to this:
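out_name = os.path.join(path_out, os.path.basename(filename))  # write into nowe, not over the input file
fout = open(out_name, "w")
fout.write(str(article))  # I assume article is what you want to write; it is a bs4 Tag, so convert it to str
fout.close()
tresc.close()  # You never closed this, so the file handle was leaked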
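The question also mentions a CSV file as an alternative. A rough sketch of that option using the standard csv module follows; the function name write_articles_csv and the two-column layout (file name, article text) are assumptions made for illustration.

```python
import csv


def write_articles_csv(csv_path, rows):
    """Write (file_name, article_text) pairs to csv_path with a header row.

    Hypothetical helper; the column names are an assumption.
    """
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        writer.writerow(["file", "article"])
        writer.writerows(rows)
```

In the loop you would collect rows with something like rows.append((os.path.basename(filename), article.get_text())) and call write_articles_csv once after the loop. The csv writer quotes fields that contain commas or line breaks, so article text does not need to be escaped by hand.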