Question

如何针对未知数量的列，将该列表的每次迭代添加到csv文件中。

这是因为类型列表和每部电影的长度都不相同。

如果电影只有少于最大值，那么我希望其他列为空。

我希望输出看起来像下面这样；

WebPage,Film,Genre1,Genre2,Genre3, ..... maxnumberofGenres
https://www.imdb.com/title/tt6644200/, A Quiet Place, Drama, Horror, Sci-Fi

我该如何解决问题？

import requests
from googlesearch import search 
import csv
import pandas
from bs4 import BeautifulSoup
import numpy as np
import os
from datetime import datetime
import time


start_time = time.time()

colnames = ['title']
data = pandas.read_csv('D:/Desktop/webScrapeMovieInfo/mediaDataForGenreScrape2.csv', names=colnames, header=None)
my_list = data["title"]
my_list = list(my_list)
my_list = my_list[1:]
length = len(my_list)
for film in my_list:
    query = film + " imdb"
    for j in search(query, tld="co.in", num=10, stop=1, pause=2):
        print(j)
        page = requests.get(j)
        response = page.status_code
        if response == 200:
            soup = BeautifulSoup(page.content, "lxml")
            genreData = soup.find_all("div",{"class":"subtext"})
            filmtitle = soup.find("h1")
            filmtitle = filmtitle.contents[0]
            print(filmtitle)
            links = []
            for h in genreData:
                a = h.find_all('a')
                aLength = len(a) - 1
                a1 = a[0]
                for b in range(0,aLength):
                    print(a[b].string)



np.savetxt("filmWebPages.csv", j, delimiter=",", fmt='%s', header="imdbPageOfFilms")


print("--- %s seconds ---" % (time.time() - start_time))

Answer 1

要提取所有类型，您可以使用此脚本-该脚本会将其保存到CSV并打印到屏幕上：

import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/search/title/?pf_rd_i=moviemeter&genres=action&explore=title_type,genres'

soup = BeautifulSoup(requests.get(url).text, 'lxml')

rows = []
for h3, genres in zip(soup.select('.lister-item-header'), soup.select('.lister-item-header ~ p .genre')):
    title = h3.select_one('a').text
    url = h3.select_one('a')['href']
    genres = [*map(str.strip, genres.text.split(', '))]
    rows.append([title, url, genres])

#find all the genres we have:
all_genres = sorted(list(set(sum((row[2] for row in rows), []))))

#transform all rows to include True/False if they belong to certain genre
for row in rows:
    row[2] = [g in row[2] for g in all_genres]

#print header
print('{: <40}{: ^20}'.format('Name', 'URL') +  ''.join('{: ^10}'.format(g) for g in all_genres))

#print all rows
for title, url, genres in rows:
    print('{: <40}{: <20}'.format(title, url), end='')
    print(''.join('{: ^10}'.format('X' if g else '-') for g in genres))

#save to csv
with open('data.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csvwriter.writerow(['Name', 'URL'] + all_genres)
    for title, url, genres in rows:
        csvwriter.writerow([title, url, *['✔' if g else '' for g in genres]])

打印：

Name                                            URL           Action  Adventure Animation   Comedy    Crime     Drama    Fantasy   Mystery    Sci-Fi   Thriller 
Spider-Man: Far from Home               /title/tt6320628/       X         X         -         -         -         -         -         -         X         -     
Top Gun: Maverick                       /title/tt1745960/       X         -         -         -         -         X         -         -         -         -     
The King's Man                          /title/tt6856242/       X         X         -         X         -         -         -         -         -         -     
La Casa de Papel                        /title/tt6468322/       X         -         -         -         X         -         -         X         -         -     
Troonide mäng                           /title/tt0944947/       X         X         -         -         -         X         -         -         -         -     
Crawl                                   /title/tt8364368/       X         X         -         -         -         X         -         -         -         -     
Alita: Sõjaingel                        /title/tt0437086/       X         X         -         -         -         -         -         -         X         -     
Tasujad: Lõppmäng                       /title/tt4154796/       X         X         -         -         -         -         -         -         X         -     
Terminaator: Tume Saatus                /title/tt6450804/       X         X         -         -         -         -         -         -         X         -     
The Witcher                             /title/tt5180504/       X         X         -         -         -         X         -         -         -         -     
Hellboy                                 /title/tt2274648/       X         X         -         -         -         -         X         -         -         -     
Point Blank                             /title/tt2499472/       X         -         -         -         -         -         -         -         -         X     
Shazam!                                 /title/tt0448115/       X         X         -         X         -         -         -         -         -         -     
Stuber                                  /title/tt7734218/       X         -         -         X         X         -         -         -         -         -     
Fast & Furious Presents: Hobbs & Shaw   /title/tt6806448/       X         X         -         -         -         -         -         -         -         -     
Tippkutt                                /title/tt0092099/       X         -         -         -         -         X         -         -         -         -     
John Wick 3: Parabellum                 /title/tt6146586/       X         -         -         -         X         -         -         -         -         X     
Ämblikmees: Uus universum               /title/tt4633694/       X         X         X         -         -         -         -         -         -         -     
S.H.I.E.L.D.i agendid                   /title/tt2364582/       X         X         -         -         -         X         -         -         -         -     
The Boys                                /title/tt1190634/       X         -         -         X         X         -         -         -         -         -     
Designated Survivor                     /title/tt5296406/       X         -         -         -         -         X         -         X         -         -     
Kapten Marvel                           /title/tt4154664/       X         X         -         -         -         -         -         -         X         -     
Viikingid                               /title/tt2306299/       X         X         -         -         -         X         -         -         -         -     
Mulan                                   /title/tt4566758/       X         X         -         -         -         X         -         -         -         -     
Bond 25                                 /title/tt2382320/       X         X         -         -         -         -         -         -         -         X     
Spider-Man: Homecoming                  /title/tt2250912/       X         X         -         -         -         -         -         -         X         -     
Murder Mystery                          /title/tt1618434/       X         -         -         X         X         -         -         -         -         -     
Pandora                                 /title/tt10207090/      X         -         -         -         -         X         -         -         X         -     
Shaft                                   /title/tt4463894/       X         -         -         X         X         -         -         -         -         -     
Jessica Jones                           /title/tt2357547/       X         -         -         -         X         X         -         -         -         -     
Star Wars: The Rise of Skywalker        /title/tt2527338/       X         X         -         -         -         -         X         -         -         -     
Leegion                                 /title/tt5114356/       X         -         -         -         -         X         -         -         X         -     
Anna                                    /title/tt7456310/       X         -         -         -         -         -         -         -         -         X     
Vibukütt                                /title/tt2193021/       X         X         -         -         X         -         -         -         -         -     
NCIS: Kriminalistid                     /title/tt0364845/       X         -         -         -         X         X         -         -         -         -     
Välk                                    /title/tt3107288/       X         X         -         -         -         X         -         -         -         -     
Wonder Woman 1984                       /title/tt7126948/       X         X         -         -         -         -         X         -         -         -     
Titans                                  /title/tt1043813/       X         X         -         -         -         X         -         -         -         -     
Ghostbusters 2020                       /title/tt4513678/       X         -         -         X         X         -         -         -         -         -     
Power Rangers                           /title/tt3717490/       X         X         -         -         -         -         -         -         X         -     
Charlie's Angels                        /title/tt5033998/       X         X         -         X         -         -         -         -         -         -     
Mehed mustas: globaalne oht             /title/tt2283336/       X         X         -         X         -         -         -         -         -         -     
Swamp Thing                             /title/tt8362852/       X         X         -         -         -         X         -         -         -         -     
Queen of the South                      /title/tt1064899/       X         -         -         -         X         X         -         -         -         -     
Tasujad: Igaviku sõda                   /title/tt4154756/       X         X         -         -         -         -         -         -         X         -     
Gotham                                  /title/tt3749900/       X         -         -         -         X         X         -         -         -         -     
Godzilla: King of the Monsters          /title/tt3741700/       X         X         -         -         -         -         X         -         -         -     
Shingeki no kyojin                      /title/tt2560140/       X         X         X         -         -         -         -         -         -         -     
Escape Plan: The Extractors             /title/tt6772804/       X         -         -         -         X         -         -         -         -         X     
Thor: Ragnarök                          /title/tt3501632/       X         X         -         X         -         -         -         -         -         -

并保存data.csv。这是LibreOffice的屏幕截图：

如何将多个列表导出到一个csv？

1 个答案: