Add "na" text to an array inside a loop

Time: 2018-06-18 22:13:28

Tags: python-3.x list beautifulsoup

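I have scraped all the data I want from this Metacritic URL (see below). However, when no value is found for one of the lists (a missing value), I can't seem to insert a placeholder value.

I want all the lists to end up the same length, so that I can write them straight to a .csv file.

Here is my code so far: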

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import pandas as pd

#Define year
year_number = 2018

# One iteration per year to scrape, counting back from year_number
i = range(0, 1)

names = []
metascores = []
userscores = []
userscoresNew = []
release_dates = []
release_datesNew = []
publishers = []
ratings = []
genres = []
genresNew = []




for element in i:

    url = "http://www.metacritic.com/browse/games/score/metascore/year/pc/filtered?view=detailed&sort=desc&year_selected=" + format(year_number)
    print(url)
    year_number -= 1
    # not sure about this, but it works (I was getting blocked by something and this is the workaround I found)
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

    web_byte = urlopen(req).read()

    webpage = web_byte.decode('utf-8')

    # parse the whole page into a BeautifulSoup tree
    html_soup = BeautifulSoup(webpage, 'html5lib')

    # select each field's elements for all the games listed on the page (1 to 100)
    game_names = html_soup.find_all("div", class_="main_stats")
    game_metas = html_soup.find_all("a", class_="basic_stat product_score")  
    game_users = html_soup.find_all("li", class_='stat product_avguserscore')
    game_releases = html_soup.find_all("ul", class_='more_stats')
    game_publishers = html_soup.find_all("li", class_='stat publisher')
    game_ratings = html_soup.find_all("li", class_='stat maturity_rating')
    game_genres = html_soup.find_all("li", class_='stat genre')



    #Extract data from each game
    for games in game_names:
        name = games.find()
        names.append(name.text.strip())

    for games2 in game_metas:
        metascore = games2.find()
        metascores.append(metascore.text.strip())  

    for games3 in game_releases:
        release_date = games3.find()
        release_dates.append(release_date.text.strip())

    for games4 in game_users:
        userscore  = games4.find('span', class_="data textscore textscore_favorable") or games4.find('span', class_="data textscore textscore_mixed")
        if userscore:
            userscores.append(userscore.text)

    for games5 in game_publishers:
        publisher = games5.find("span", class_ = "data")
        if publisher:
            publishers.append(publisher.text)

    for games6 in game_ratings:
        rating = games6.find("span", class_ = "data")
        if rating:
            ratings.append(rating.text)

    for games7 in game_genres:
        genre = games7.find("span", class_ = "data")
        if genre:
            genres.append(genre.text)


for x in release_dates:
    temp = str(x)
    temp2 = temp.replace("Release Date:\n                        ", "")
    release_datesNew.append(temp2)

for z in genres:
    temp3 = str(z)
    temp4 = temp3.strip()
    temp5 = temp4.replace("                                                            ", "")
    genresNew.append(temp5)

df = pd.DataFrame({'Games:': names})

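I'm not sure how to handle this in the code.

As I understand it, it collects all the data it can find, but it has no way of knowing when a value is missing.

Can someone suggest the best solution for this situation?

Any help would be great.

Thanks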

1 answer:

Answer 0 (score: 0)

Just add an else branch to your existing condition...

if userscore:
    userscores.append(userscore.text)
else:
    userscores.append('na')
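
The same if/else would have to be repeated for every field in the question, so it may be easier to wrap it in a small helper. The sketch below is only an illustration: `text_or_na` is a hypothetical name, not part of BeautifulSoup, and the `SimpleNamespace` objects are stand-ins for what `find()` returns (a tag with a `.text` attribute, or `None` when nothing matched), so the snippet runs without fetching the page.

```python
from types import SimpleNamespace


def text_or_na(node, default="na"):
    """Return the stripped text of a tag-like object, or `default` if it is missing or empty."""
    if node is None:
        return default
    text = node.text.strip()
    return text if text else default


# Stand-ins for the result of find(): a matched tag, and None for no match.
found = SimpleNamespace(text="  7.9 ")
missing = None

userscores = []
for node in (found, missing):
    # Exactly one value is appended per node, so the list keeps one entry per game.
    userscores.append(text_or_na(node))

print(userscores)
```

In the question's loops this would replace each `if ...: append(...)` pair, for example `userscores.append(text_or_na(userscore))`. Note that this only keeps the lists aligned if every game has the enclosing `li` element; if a game lacks that element entirely, the outer loop never visits it, and looking up each field inside a per-game container would be needed instead.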