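How to write data scraped with BeautifulSoup to a CSV file

Posted: 2018-06-09 20:58:46

Tags: python csv beautifulsoup

I want to get the data I extracted with BeautifulSoup into a .csv file.

This is the extraction code: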

from requests import get
from bs4 import BeautifulSoup

url = 'https://howlongtobeat.com/game.php?id=38050'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# Each value below is raw page text and still contains newlines and padding
game_name = html_soup.select('div.profile_header')[0].text
game_length = html_soup.select('div.game_times li div')[-1].text
# The label sits in a <strong> tag; the value is the text node right after it
game_developer = html_soup.find_all('strong', string='\nDeveloper:\n')[0].next_sibling
game_publisher = html_soup.find_all('strong', string='\nPublisher:\n')[0].next_sibling
game_console = html_soup.find_all('strong', string='\nPlayable On:\n')[0].next_sibling
game_genres = html_soup.find_all('strong', string='\nGenres:\n')[0].next_sibling

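I'd like to write these results to a CSV file. The code already extracts the information I want, but I think the values need cleaning.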
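Values taken from `.text` and `.next_sibling` in the code above usually carry the newlines and indentation of the page source. A minimal cleaning sketch, assuming the raw strings look like the hypothetical samples below (they are made up, not scraped from the real page), collapses every whitespace run into a single space and turns a missing value (`None`) into an empty string:

```python
def clean(value):
    """Return value as text with every whitespace run collapsed to one space.

    None (for example a missing .next_sibling) becomes an empty string.
    """
    if value is None:
        return ""
    return " ".join(str(value).split())


# Hypothetical raw values shaped like the .text / .next_sibling results.
raw = {
    "name": "\n  The Example Game\n",
    "developer": "\n  Example Studio\n",
    "genres": "\n  Action,  Adventure\n",
}

cleaned = {field: clean(value) for field, value in raw.items()}
print(cleaned)
# {'name': 'The Example Game', 'developer': 'Example Studio', 'genres': 'Action, Adventure'}
```

For a `Tag` (rather than a bare text node), BeautifulSoup's `get_text(" ", strip=True)` gives a similar result.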

I'm not sure how to write to a CSV file or how to clean the data.
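Once the fields are plain strings, the standard-library `csv` module can write them. A minimal sketch using `csv.DictWriter`, assuming one row per game, the column names listed in the hypothetical `FIELDNAMES`, and a hypothetical output file `games.csv`. The file is opened with `newline=''`, as the `csv` documentation recommends, so the writer controls line endings itself:

```python
import csv

# Hypothetical column names, one per value extracted in the question.
FIELDNAMES = ["name", "length", "developer", "publisher", "console", "genres"]


def write_games(path, games):
    """Write a list of dicts (one per game) to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(games)


# Hypothetical, already-cleaned values; in the question they would come from
# game_name, game_length, game_developer, and so on.
games = [
    {
        "name": "The Example Game",
        "length": "12 Hours",
        "developer": "Example Studio",
        "publisher": "Example Publisher",
        "console": "PC, PlayStation 4",
        "genres": "Action, Adventure",
    },
]
write_games("games.csv", games)
```

Values that contain commas, such as "PC, PlayStation 4", are quoted automatically, so they stay in one column.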

Please help.

3 answers:

Answer 0 (score: 0)

Answer 1 (score: 0)

You can use csv.writer:

import csv
import re

import requests
from bs4 import BeautifulSoup as soup

header_written = False
# newline='' keeps csv from inserting blank lines between rows on Windows
with open('filename.csv', 'w', newline='', encoding='utf-8') as f:
    write = csv.writer(f)
    for game_id in range(1, 30871):
        s = soup(requests.get(f'https://howlongtobeat.com/game.php?id={game_id}').text, 'html.parser')
        name_div = s.find('div', {'class': 'profile_header shadow_text'})
        times = s.find_all('li', {'class': 'time_100'})
        if name_div is None or not times:
            continue  # skip ids that have no game page or no completion time
        infos = s.find_all('div', {'class': 'profile_info'})
        if not header_written:  # write the header once, using the first page's labels
            labels = [re.sub(r'[:\n]+', '', div.find('strong').text) for div in infos]
            write.writerow(['Name', 'Length'] + labels[:4])  # same 4 columns as each row
            header_written = True
        length = times[0].find('div').text  # completion time from the first time_100 entry
        stats = [re.sub(r'\n+[\w\s]+:\n+', '', div.text) for div in infos]
        write.writerow([name_div.text, length] + stats[:4])
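`csv.writer` ends every row with `\r\n` by default (the line terminator of its default `excel` dialect) and quotes only the fields that need it. That is why the output file should be opened with `newline=''`: otherwise Windows text mode translates the line ending again and blank lines appear between rows. A small in-memory sketch, using made-up row values, shows the exact text the writer produces:

```python
import csv
import io

# io.StringIO stands in for a real file so the raw output can be inspected.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Length", "Developer"])
writer.writerow(["The Example Game", "12 Hours", "Studio, Inc."])
print(repr(buffer.getvalue()))
# 'Name,Length,Developer\r\nThe Example Game,12 Hours,"Studio, Inc."\r\n'
```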

Answer 2 (score: 0)

Write this data to a CSV file

var selected = {};
selected["tagId"] = 12;
selected["placementId"] = 27;
selected["locationId"] = null;

var users = [];
users.push({"name":"Joe", "tagIds":[3,4,12], "placementIds": [2,19]});
users.push({"name":"Suzy", "tagIds":[3,4], "placementIds": [2,19, 27]});
users.push({"name":"Amber", "tagIds":[1,12], "placementIds": [2,19, 27]});

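// Keep only the users that match every non-null criterion in `selected`.
// Each criterion key (e.g. "tagId") is checked against the matching
// plural array on the user (e.g. "tagIds").
function filterBySelected(items){
    var hits = [];
    for (var i = 0; i < items.length; i++){
        var item = items[i];
        var matches = true;
        for (var key in selected){
            if (selected[key] == null) continue;  // null means "not filtering on this key"
            var values = item[key + "s"] || [];
            if (values.indexOf(selected[key]) === -1){
                matches = false;
                break;
            }
        }
        if (matches) hits.push(item);
    }
    return hits;
}
var items = filterBySelected(users);
// items contains only the user object for "Amber"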
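The JavaScript above works on sample records but does not write them anywhere. A minimal Python sketch that filters the same sample data and serializes the result as CSV, assuming (as the JavaScript's naming suggests) that a criterion key like `tagId` is matched against the plural list field `tagIds`, and that list-valued fields are joined with `;` so each fits in one CSV cell (the `;` separator is an arbitrary choice):

```python
import csv
import io

# Same sample data as the JavaScript above, as Python dicts.
selected = {"tagId": 12, "placementId": 27, "locationId": None}
users = [
    {"name": "Joe", "tagIds": [3, 4, 12], "placementIds": [2, 19]},
    {"name": "Suzy", "tagIds": [3, 4], "placementIds": [2, 19, 27]},
    {"name": "Amber", "tagIds": [1, 12], "placementIds": [2, 19, 27]},
]


def filter_by_selected(items, criteria):
    """Keep items whose '<key>s' list contains every non-None criterion value."""
    active = {key: value for key, value in criteria.items() if value is not None}
    return [
        item
        for item in items
        if all(value in item.get(key + "s", []) for key, value in active.items())
    ]


def to_csv(items):
    """Serialize items to CSV text, joining list-valued fields with ';'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "tagIds", "placementIds"])
    for item in items:
        writer.writerow([
            item["name"],
            ";".join(map(str, item["tagIds"])),
            ";".join(map(str, item["placementIds"])),
        ])
    return buffer.getvalue()


hits = filter_by_selected(users, selected)
print(to_csv(hits))
```

To write a file instead of a string, pass `open("users.csv", "w", newline="")` to `csv.writer` in place of the `StringIO` buffer.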