How to save data parsed with lxml and XPath to a CSV or JSON file in Python

Asked: 2017-01-26 22:54:56

Tags: python json csv xpath lxml

I am trying to store data that I parsed with lxml and XPath in a CSV or JSON file. I am using Python 3.6. This is what I have tried:

import requests
import csv
from lxml import html


headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"}
response = requests.get("https://www.capfriendly.com/teams/bruins", headers=headers).text
# Parse the body into a tree
parsed_body = html.fromstring(response)
# Perform xpaths on the tree
parsed_body.xpath('//tr/td//text()')
team_data = open('capData.csv', 'w')
csvwriter = csv.writer(team_data)
count = 0
for tbl in parsed_body.xpath('//table'):
    elements = tbl.xpath('.//tr/td//text()')
    for el in elements:
        if count == 0:
            count += 1
    csvwriter.writerow(elements)

All of the data ends up in a single row.

What am I doing wrong?

2 answers:

Answer 0: (score: 1)

import requests, bs4

r = requests.get('https://www.capfriendly.com/teams/bruins')
soup = bs4.BeautifulSoup(r.text, 'lxml')
table = soup.find(id="team")
for tr in table('tr', class_=['odd', 'even']):  # get all tr whose class is odd or even
    row = [td.text for td in tr('td')]          # extract td's text
    print(row)

Output:

['Krejci, David "A"', 'NMC', 'C', 'NHL', '30', '$7,250,000$7,250,000NMC', '$7,250,000$7,500,000NMC', '$7,250,000$7,500,000NMC', '$7,250,000$7,000,000Modified NTC', '$7,250,000$7,000,000Modified NTC', 'UFA', '']
['Bergeron, Patrice "A"', 'NMC', 'C', 'NHL', '31', '$6,875,000$8,750,000NMC', '$6,875,000$8,750,000NMC', '$6,875,000$6,875,000$6,000,000NMC', '$6,875,000$4,375,000$3,500,000NMC', '$6,875,000$4,375,000$1,000,000Modified NTC, NMC', '$6,875,000$4,375,000$1,000,000Modified NTC, NMC', 'UFA']
['Backes, David', 'NMC', 'C, RW', 'NHL', '32', '$6,000,000$8,000,000$3,000,000NMC', '$6,000,000$8,000,000$3,000,000NMC', '$6,000,000$6,000,000$3,000,000NMC', '$6,000,000$4,000,000$3,000,000Modified NTC', '$6,000,000$4,000,000$1,000,000Modified NTC', 'UFA', '']
['Marchand, Brad', 'M-NTC', 'LW', 'NHL', '28', '$4,500,000$5,000,000Modified NTC', '$6,125,000$8,000,000$4,000,000NMC', '$6,125,000$8,000,000$3,000,000NMC', '$6,125,000$7,500,000$4,000,000NMC', '$6,125,000$5,000,000$1,000,000NMC', '$6,125,000$6,500,000$4,000,000NMC', '$6,125,000$5,000,000$3,000,000Modified NTC']
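
The answer above prints one list per table row but does not write a file. In the question's code, `tbl.xpath('.//tr/td//text()')` collects every text node of the whole table into one list, and `writerow` is called once per table, which is why everything lands in a single row. A minimal sketch of writing one CSV line per `<tr>` with lxml follows, together with a JSON dump of the same rows. `SAMPLE_HTML`, `extract_rows`, and `rows_to_csv` are hypothetical names, and the inline markup is only an assumed stand-in for the real page, whose structure may differ.

```python
import csv
import io
import json

from lxml import html

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table id="team">
  <tr class="odd"><td>Krejci, David</td><td>C</td><td>30</td></tr>
  <tr class="even"><td>Bergeron, Patrice</td><td>C</td><td>31</td></tr>
</table>
"""


def extract_rows(page_source):
    """Return one list of cell strings per <tr> that contains <td> cells."""
    tree = html.fromstring(page_source)
    rows = []
    for tr in tree.xpath('//table//tr[td]'):
        # text_content() gathers all text inside a cell, nested tags included.
        rows.append([td.text_content().strip() for td in tr.xpath('./td')])
    return rows


def rows_to_csv(rows, fileobj):
    """Write each extracted row as its own CSV line."""
    writer = csv.writer(fileobj)
    writer.writerows(rows)


rows = extract_rows(SAMPLE_HTML)

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# use open('capData.csv', 'w', newline='') instead.
buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()

# The same rows serialized as a JSON array of arrays.
json_text = json.dumps(rows, ensure_ascii=False)
```

Because the `csv` module quotes fields that contain commas, a name such as `Krejci, David` stays in one column.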

Answer 1: (score: 0)

I was able to get help with this problem; see the question "How to avoid getting concatenated data in one cell".
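
The linked question is not quoted here, so the following is only one possible way to keep values such as `$7,250,000$7,500,000NMC` from running together, and not necessarily the approach used there. It assumes each value sits in its own child element of the cell; `SAMPLE_ROW` and `cell_text` are hypothetical names, and the markup is an assumption about the page.

```python
from lxml import html

# Hypothetical markup: several values nested inside a single <td>.
SAMPLE_ROW = """
<table><tr>
  <td>Krejci, David</td>
  <td><span>$7,250,000</span><span>$7,500,000</span><span>NMC</span></td>
</tr></table>
"""


def cell_text(td, sep=" | "):
    """Join the non-empty text fragments of a cell with a visible separator."""
    parts = (fragment.strip() for fragment in td.itertext())
    return sep.join(part for part in parts if part)


tree = html.fromstring(SAMPLE_ROW)
row = [cell_text(td) for td in tree.xpath('//tr/td')]
print(row)
```

With BeautifulSoup, `td.get_text(separator=" | ", strip=True)` gives a similar result.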