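I scrape all the titles and image source links into a text file, then use the data in that text file to output an HTML file with 2 columns: one for the image and one for the title. How can I display clickable images, with the title and image laid out in a 2-column format? This is what I have: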
import urllib
from bs4 import BeautifulSoup

titles = []
images = []
href = []

r = urllib.urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r)

for i in soup.find_all('div', {'class': "courses_adblock_rollover"}):
    titles.append(i.h2.text)
for i in soup.find_all('img', {'class': "image-style-course-logo-subjects-block"}):
    images.append(i.get('src'))

# Write each title followed by its image URL, separated by a blank line
with open('test.txt', "w") as f:
    for i in zip(titles, images):
        f.write(i[0].encode('ascii', 'ignore') + '\n'
                + i[1].encode('ascii', 'ignore') +
                '\n\n')

header = '<!doctype html><html><head><title>My page</title></head><body>'
body = '<table><tr><td></td><td></td></tr>'
footer = '</table></body></html>'

with open('test.txt', 'r') as input, open('test.html', 'w') as output:
    output.write(header)
    output.write(body)
    for line in input:
        # ignore blank lines
        if line == '\n':
            continue
        col1 = line.rstrip()
        # read next line
        col2 = next(input).rstrip()
        output.write('<tr><td>{}</td><td><img src="{}" style="width: 160px; height: 100px"></td></tr>\n\n'.format(col1, col2))
    output.write(footer)
Answer 0 (score: 0)
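I think you are making this harder for yourself than it needs to be. Start with the largest element, the div that wraps each whole course, and it becomes easier to pull the individual pieces of information out of it afterwards.
This code gives you a clickable image in the first column and the course title in the second column.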
from bs4 import BeautifulSoup
import urllib

base_url = 'https://www.open2study.com'
r = urllib.urlopen(base_url + '/courses').read()
soup = BeautifulSoup(r, "html.parser")
# Grab the wrapping div of every course first
courses = soup.find_all('div', {'class': "courses_adblock_start"})

encoding = "utf-8"
page_title = 'Ai Truong'
html_template = '<!doctype html><html><head><title>{}</title><meta charset="{}" /></head><body>{}</body></html>'
table_template = '<table>{}</table>'
table_row_template = '<tr><td>{}</td><td>{}</td></tr>'
img_template = '<a href="{}"><img src="{}" width="160" alt="{}"></a>'

table_rows = ''
for c in courses:
    title = c.h2.text.encode(encoding)
    image = c.find('img', {'class': 'image-style-course-logo-subjects-block'}).get('src')
    # The parent element of the course div carries the link to the course page
    href = c.parent.get('href')
    img_tag = img_template.format(base_url + href, image, title)
    table_rows += table_row_template.format(img_tag, title)

table_tag = table_template.format(table_rows)
with open('course-scrape.html', 'w') as html_out:
    html_out.write(html_template.format(page_title, encoding, table_tag))
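
One thing the code above does not handle is a title containing characters such as & or a double quote, which would break the alt attribute. A minimal sketch of the same row-building step with escaping is below. It assumes Python 3, and the function name build_table, the dict keys, and the sample data are made up for illustration; the scraping itself is left out so the snippet runs offline.

```python
from html import escape
from urllib.parse import urljoin

BASE_URL = 'https://www.open2study.com'

# One table row: clickable image in the first column, title in the second.
ROW_TEMPLATE = (
    '<tr><td><a href="{href}"><img src="{src}" width="160" alt="{alt}"></a></td>'
    '<td>{title}</td></tr>'
)


def build_table(courses, base_url=BASE_URL):
    """Return an HTML <table> string with one row per course dict.

    Each dict is assumed to have 'title', 'image' and 'href' keys
    (hypothetical names, not something the scraped page defines).
    """
    rows = []
    for course in courses:
        rows.append(ROW_TEMPLATE.format(
            # urljoin turns a site-relative href into an absolute URL
            href=escape(urljoin(base_url, course['href']), quote=True),
            src=escape(course['image'], quote=True),
            alt=escape(course['title'], quote=True),
            title=escape(course['title']),
        ))
    return '<table>\n{}\n</table>'.format('\n'.join(rows))


# Made-up sample data standing in for the scraped values.
sample = [
    {'title': 'Food & "Nutrition"',
     'image': 'https://example.com/a.png',
     'href': '/courses/food'},
]
table_html = build_table(sample)
```

The resulting table_html can be dropped into html_template in place of table_tag. Collecting the rows in a list and joining them once also avoids repeated string concatenation in the loop.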