Display images and titles in 2 columns in Python

Time: 2015-11-10 22:21:57

Tags: python html beautifulsoup

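I scraped all the titles and image source links into a text file, then used the data in that text file to output an HTML file with 2 columns, one for the images and one for the titles. How can I make the images clickable and lay out the titles and images in 2 columns? This is what I have: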

import urllib
from bs4 import BeautifulSoup

titles = []
images = []
href  = []

r = urllib.urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r, "html.parser")

for i in soup.find_all('div', {'class': "courses_adblock_rollover"}):
    titles.append(i.h2.text)

for i in soup.find_all('img', {'class': "image-style-course-logo-subjects-block"}):
    images.append(i.get('src'))



with open('test.txt', "w") as f:
    for title, src in zip(titles, images):
        f.write(title.encode('ascii', 'ignore') + '\n' +
                src.encode('ascii', 'ignore') + '\n\n')

header = '<!DOCTYPE html><html><head><title>My page</title></head><body>'
body = '<table>'

footer = '</table></body></html>'


with open('test.txt', 'r') as input, open('test.html', 'w') as output:
    output.write(header)
    output.write(body)

    for line in input:
        # ignore blank lines
        if line == '\n':
            continue

        col1 = line.rstrip()
        # read next line
        col2 = next(input).rstrip()
        output.write('<tr><td>{}</td><td><img src="{}" style="width: 160px; height: 100px"></td></tr>\n\n'.format(col1, col2))

    # close the table once, after all rows have been written
    output.write(footer)
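Because the second loop pairs lines purely by position, one missing or extra line shifts every later title onto the wrong image. A minimal sketch of the reading step that at least fails loudly in that case, assuming test.txt has exactly the title / URL / blank-line layout written above; `read_pairs` is a hypothetical helper, not part of any library:

```python
def read_pairs(lines):
    """Group the non-blank lines of test.txt into (title, image_src) pairs.

    Assumes the layout written above: title line, image URL line, blank line.
    """
    # Strip newlines and drop the blank separator lines entirely.
    items = [line.strip() for line in lines if line.strip()]
    if len(items) % 2:
        raise ValueError("odd number of non-blank lines; expected title/URL pairs")
    # Even-indexed lines are titles, odd-indexed lines are image URLs.
    return list(zip(items[0::2], items[1::2]))


# Hypothetical sample standing in for the contents of test.txt.
sample = [
    "Intro to Python\n",
    "http://example.com/python.png\n",
    "\n",
    "Statistics\n",
    "http://example.com/stats.png\n",
    "\n",
]
pairs = read_pairs(sample)
```

With the real file, `with open('test.txt') as f: pairs = read_pairs(f)` works the same way, since a file object yields its lines; each pair can then be written as one `<tr>`.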

1 answer:

Answer 0 (score: 0)

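I think you're making this much harder on yourself than it needs to be. Start from the largest element, i.e. the whole course div; it is then much easier to pull the individual pieces of information out of it.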
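To see why starting from the course container helps, compare it with zipping two independent `find_all` results: if one course has no logo, every later image is paired with the wrong title. A minimal offline sketch of the container-first approach, assuming markup shaped like the made-up `SAMPLE_HTML` below (only the class names come from this thread; the real page layout is an assumption) and using `extract_courses` as a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real course listing; only the class
# names are taken from the question and answer in this thread.
SAMPLE_HTML = """
<a href="/courses/intro-python">
  <div class="courses_adblock_start">
    <img class="image-style-course-logo-subjects-block" src="/img/python.png">
    <h2>Intro to Python</h2>
  </div>
</a>
<a href="/courses/statistics">
  <div class="courses_adblock_start">
    <h2>Statistics</h2>
  </div>
</a>
"""


def extract_courses(html):
    """Return one (title, href, image_src) tuple per course container."""
    soup = BeautifulSoup(html, "html.parser")
    courses = []
    for block in soup.find_all("div", class_="courses_adblock_start"):
        title = block.h2.get_text(strip=True)
        img = block.find("img", class_="image-style-course-logo-subjects-block")
        # A course without a logo keeps its own title; the image slot is None
        # instead of every later image shifting onto the wrong title.
        src = img.get("src") if img is not None else None
        # Assumes the container sits directly inside the course's <a> link.
        href = block.parent.get("href")
        courses.append((title, href, src))
    return courses


courses = extract_courses(SAMPLE_HTML)
```

Each tuple stays internally consistent even when a logo is missing, which pairing two separately collected lists cannot guarantee.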

The script below gives you a clickable image in the first column and the course title in the second column.

from bs4 import BeautifulSoup
import urllib

base_url = 'https://www.open2study.com'
r = urllib.urlopen(base_url + '/courses').read()

soup = BeautifulSoup(r, "html.parser")

courses = soup.find_all('div', {'class': "courses_adblock_start"})

encoding = "utf-8"
page_title = 'Ai Truong'
html_template = '<!DOCTYPE html><html><head><title>{}</title><meta charset="{}" /></head><body>{}</body></html>'
table_template = '<table>{}</table>'
table_row_template = '<tr><td>{}</td><td>{}</td></tr>'
img_template = '<a href="{}"><img src="{}" width="160" alt="{}"></a>'

table_rows = ''
for c in courses:
    title = c.h2.text.encode(encoding)
    image = c.find('img', {'class': 'image-style-course-logo-subjects-block'}).get('src')
    href = c.parent.get('href')
    img_tag = img_template.format(base_url + href, image, title)
    table_rows += table_row_template.format(img_tag, title)

table_tag = table_template.format(table_rows)

with open('course-scrape.html', 'w') as html_out:
    html_out.write(html_template.format(page_title, encoding, table_tag))

Output:

[screenshot of the generated two-column page]
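The script above is written for Python 2 (`urllib.urlopen`, byte-string titles). On Python 3 the table-building part could be sketched as below, assuming the scraped data is already a list of (title, href, image_src) tuples; `build_page` is a hypothetical name and the paths in the sample call are placeholders. `html.escape` keeps a title containing `&` or quotes from breaking the markup, and `urllib.parse.urljoin` resolves relative links against the site root while leaving absolute URLs unchanged.

```python
from html import escape
from urllib.parse import urljoin

BASE_URL = "https://www.open2study.com"


def build_page(courses, page_title="Courses"):
    """Return an HTML page: clickable logo in column 1, course title in column 2."""
    rows = []
    for title, href, src in courses:
        link = escape(urljoin(BASE_URL, href))
        safe_title = escape(title)
        # Courses without a logo get an empty first cell instead of a broken image.
        cell = ""
        if src:
            cell = '<a href="{}"><img src="{}" width="160" alt="{}"></a>'.format(
                link, escape(urljoin(BASE_URL, src)), safe_title)
        rows.append("<tr><td>{}</td><td>{}</td></tr>".format(cell, safe_title))
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8"><title>{}</title></head>'
        "<body><table>{}</table></body></html>"
    ).format(escape(page_title), "".join(rows))


page = build_page([("Intro to Python & Data", "/courses/intro-python", "/img/python.png")])
```

To save it, open the output file with an explicit encoding, e.g. `open('course-scrape.html', 'w', encoding='utf-8')`, so the `charset` declared in the page matches the bytes on disk.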