Python / BeautifulSoup: first row of the table is missing

Time: 2018-08-25 02:44:34

Tags: python beautifulsoup

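Can someone help me understand why the output of the following code is missing the first row of the table? I'm new to Python, and it's not for lack of trying that I haven't been able to solve this myself.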

import requests
import csv
from collections import OrderedDict
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq


def printfunction():
    with open("C:/Users/.../audusd.csv", 'a', newline='') as f:
        wr = csv.writer(f)
        wr.writerows([(data[0], data[1], data[2], data[3], data[4], data[5])])


url = requests.get("https://au.investing.com/currencies/aud-usd-historical-data/",
                   headers={'User-Agent': 'Mozilla/5.0'})

od = OrderedDict()
content_page = soup(url.content, 'html.parser')
table = content_page.find('table', {'class': 'genTbl closedTbl historicalTbl'})
cols = [th.text for th in table.select("th")[1:]]

for row in table.select("tr + tr"):
    data = [td.text for td in row.select("td")]
    printfunction()
    print(data)

This produces the output:

['Aug 23, 2018', '0.7246', '0.7349', '0.7355', '0.7240', '-1.37%']
['Aug 22, 2018', '0.7347', '0.7370', '0.7371', '0.7332', '-0.33%']
['Aug 21, 2018', '0.7371', '0.7341', '0.7383', '0.7332', '0.42%']
['Aug 20, 2018', '0.7340', '0.7306', '0.7344', '0.7294', '0.44%']
['Aug 19, 2018', '0.7308', '0.7316', '0.7317', '0.7308', '-0.05%']
['Aug 17, 2018', '0.7312', '0.7261', '0.7321', '0.7253', '0.70%']
['Aug 16, 2018', '0.7261', '0.7240', '0.7288', '0.7222', '0.30%']
['Aug 15, 2018', '0.7239', '0.7247', '0.7249', '0.7202', '-0.08%']
['Aug 14, 2018', '0.7245', '0.7270', '0.7284', '0.7222', '-0.33%']
['Aug 13, 2018', '0.7269', '0.7289', '0.7300', '0.7248', '-0.25%']
['Aug 12, 2018', '0.7287', '0.7278', '0.7300', '0.7273', '-0.21%']
['Aug 10, 2018', '0.7302', '0.7372', '0.7381', '0.7279', '-0.95%']
['Aug 09, 2018', '0.7372', '0.7435', '0.7456', '0.7371', '-0.81%']
['Aug 08, 2018', '0.7432', '0.7420', '0.7440', '0.7382', '0.15%']
['Aug 07, 2018', '0.7421', '0.7386', '0.7440', '0.7379', '0.46%']
['Aug 06, 2018', '0.7387', '0.7398', '0.7406', '0.7372', '-0.09%']
['Aug 05, 2018', '0.7394', '0.7397', '0.7400', '0.7394', '-0.08%']
['Aug 03, 2018', '0.7400', '0.7359', '0.7412', '0.7346', '0.54%']
['Aug 02, 2018', '0.7360', '0.7405', '0.7413', '0.7354', '-0.59%']
['Aug 01, 2018', '0.7404', '0.7427', '0.7430', '0.7389', '-0.34%']
['Jul 31, 2018', '0.7429', '0.7408', '0.7442', '0.7402', '0.30%']
['Jul 30, 2018', '0.7407', '0.7390', '0.7416', '0.7387', '0.12%']
['Jul 29, 2018', '0.7398', '0.7400', '0.7406', '0.7398', '-0.04%']
['Jul 27, 2018', '0.7401', '0.7377', '0.7416', '0.7369', '0.31%']
['Jul 26, 2018', '0.7378', '0.7456', '0.7464', '0.7370', '-1.03%']
['Jul 25, 2018', '0.7455', '0.7424', '0.7466', '0.7391', '0.46%']

Desired output (as in the source table):

['Aug 24, 2018', 'x', 'x', 'x', 'x', 'x']
['Aug 23, 2018', '0.7246', '0.7349', '0.7355', '0.7240', '-1.37%']
['Aug 22, 2018', '0.7347', '0.7370', '0.7371', '0.7332', '-0.33%']
['Aug 21, 2018', '0.7371', '0.7341', '0.7383', '0.7332', '0.42%']
['Aug 20, 2018', '0.7340', '0.7306', '0.7344', '0.7294', '0.44%']
['Aug 19, 2018', '0.7308', '0.7316', '0.7317', '0.7308', '-0.05%']
['Aug 17, 2018', '0.7312', '0.7261', '0.7321', '0.7253', '0.70%']
['Aug 16, 2018', '0.7261', '0.7240', '0.7288', '0.7222', '0.30%']
['Aug 15, 2018', '0.7239', '0.7247', '0.7249', '0.7202', '-0.08%']
['Aug 14, 2018', '0.7245', '0.7270', '0.7284', '0.7222', '-0.33%']
['Aug 13, 2018', '0.7269', '0.7289', '0.7300', '0.7248', '-0.25%']
['Aug 12, 2018', '0.7287', '0.7278', '0.7300', '0.7273', '-0.21%']
['Aug 10, 2018', '0.7302', '0.7372', '0.7381', '0.7279', '-0.95%']
['Aug 09, 2018', '0.7372', '0.7435', '0.7456', '0.7371', '-0.81%']
['Aug 08, 2018', '0.7432', '0.7420', '0.7440', '0.7382', '0.15%']
['Aug 07, 2018', '0.7421', '0.7386', '0.7440', '0.7379', '0.46%']
['Aug 06, 2018', '0.7387', '0.7398', '0.7406', '0.7372', '-0.09%']
['Aug 05, 2018', '0.7394', '0.7397', '0.7400', '0.7394', '-0.08%']
['Aug 03, 2018', '0.7400', '0.7359', '0.7412', '0.7346', '0.54%']
['Aug 02, 2018', '0.7360', '0.7405', '0.7413', '0.7354', '-0.59%']
['Aug 01, 2018', '0.7404', '0.7427', '0.7430', '0.7389', '-0.34%']
['Jul 31, 2018', '0.7429', '0.7408', '0.7442', '0.7402', '0.30%']
['Jul 30, 2018', '0.7407', '0.7390', '0.7416', '0.7387', '0.12%']
['Jul 29, 2018', '0.7398', '0.7400', '0.7406', '0.7398', '-0.04%']
['Jul 27, 2018', '0.7401', '0.7377', '0.7416', '0.7369', '0.31%']
['Jul 26, 2018', '0.7378', '0.7456', '0.7464', '0.7370', '-1.03%']
['Jul 25, 2018', '0.7455', '0.7424', '0.7466', '0.7391', '0.46%']

Many thanks! OM.

1 answer:

Answer 0 (score: 1)

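The selector tr + tr means "a tr that immediately follows another tr". The first row in the table body has no tr before it (the header row sits in a different parent), so it doesn't show up: you specifically asked for it not to. If you want to select all rows, just select plain tr.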
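To see the difference between the two selectors, here is a minimal sketch on a small inline table. The markup is a hypothetical stand-in for the real page, and the helper first_cells is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not the real page):
# one header row in <thead>, three data rows in <tbody>.
HTML = """
<table>
  <thead>
    <tr><th>Date</th><th>Price</th></tr>
  </thead>
  <tbody>
    <tr><td>Aug 24, 2018</td><td>x</td></tr>
    <tr><td>Aug 23, 2018</td><td>0.7246</td></tr>
    <tr><td>Aug 22, 2018</td><td>0.7347</td></tr>
  </tbody>
</table>
"""

table = BeautifulSoup(HTML, "html.parser").find("table")


def first_cells(selector):
    """Return the text of the first cell of every row the selector matches."""
    return [row.find(["th", "td"]).text for row in table.select(selector)]


# "tr + tr" only matches a tr whose previous sibling element is also a tr,
# so the first row of each parent (thead, tbody) is never matched.
adjacent = first_cells("tr + tr")

# Plain "tr" matches every row, including the header row.
every = first_cells("tr")

print(adjacent)  # ['Aug 23, 2018', 'Aug 22, 2018']
print(every)     # ['Date', 'Aug 24, 2018', 'Aug 23, 2018', 'Aug 22, 2018']
```

Note that plain tr also returns the header row, which has no td cells.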

If you don't know how selectors work and just copied this from some other code that seemed close, read the docs.

If you're doing this because there's a tr with a th inside it and you want to skip that row, this is not the way to do it.

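You could try to come up with a complicated selector for every tr that comes before or after another tr (and hope you never run into a single-row table…), or something along those lines.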
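The following sketch shows why tr + tr is a fragile way to skip a header row: it only happens to work when the header row and the data rows share the same parent. Both snippets of markup and the helper data_rows are hypothetical, and the built-in html.parser is assumed (it does not insert a tbody on its own).

```python
from bs4 import BeautifulSoup


def data_rows(html, selector):
    """Return the td texts of every row in the first table matched by selector."""
    table = BeautifulSoup(html, "html.parser").find("table")
    return [[td.text for td in row.select("td")] for row in table.select(selector)]


# Hypothetical flat table: header row and data rows are siblings.
FLAT = (
    "<table>"
    "<tr><th>Date</th></tr>"
    "<tr><td>Aug 24, 2018</td></tr>"
    "<tr><td>Aug 23, 2018</td></tr>"
    "</table>"
)

# Hypothetical sectioned table with a single data row.
SINGLE = (
    "<table>"
    "<thead><tr><th>Date</th></tr></thead>"
    "<tbody><tr><td>Aug 24, 2018</td></tr></tbody>"
    "</table>"
)

# Here the header row is the first sibling, so "tr + tr" skips exactly it.
flat_rows = data_rows(FLAT, "tr + tr")

# Here the only data row is the first tr of its tbody, so nothing matches.
single_rows = data_rows(SINGLE, "tr + tr")

# Selecting rows inside tbody does not depend on sibling order.
single_tbody_rows = data_rows(SINGLE, "tbody tr")

print(flat_rows)          # [['Aug 24, 2018'], ['Aug 23, 2018']]
print(single_rows)        # []
print(single_tbody_rows)  # [['Aug 24, 2018']]
```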

But, more simply, just select every tr inside the tbody:

for row in table.select('tbody tr'):

…or only those directly inside it:

for row in table.select('tbody > tr'):

Or just select all the rows inside the tbody inside the table:

for row in table.tbody.select('tr'):
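
As a rough end-to-end sketch, the three variants can be compared on a small inline table and the rows written out with csv. The markup is a hypothetical stand-in for the real page, write_rows is a made-up helper, and an in-memory buffer replaces the CSV file from the question.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not the real page).
HTML = """
<table>
  <thead>
    <tr><th>Date</th><th>Price</th></tr>
  </thead>
  <tbody>
    <tr><td>Aug 24, 2018</td><td>x</td></tr>
    <tr><td>Aug 23, 2018</td><td>0.7246</td></tr>
  </tbody>
</table>
"""

table = BeautifulSoup(HTML, "html.parser").find("table")

# The three ways of selecting body rows suggested above.
descendant = table.select("tbody tr")
child = table.select("tbody > tr")
attribute = table.tbody.select("tr")
same_rows = descendant == child == attribute

# Collect the cell texts of every body row, first row included.
rows = [[td.text for td in row.select("td")] for row in attribute]


def write_rows(rows, fileobj):
    """Write all rows in one call; the data is passed in rather than read from a global."""
    csv.writer(fileobj).writerows(rows)


# Stand-in for open(path, "a", newline="") so the sketch touches no files.
buffer = io.StringIO()
write_rows(rows, buffer)

print(same_rows)
print(rows)
print(buffer.getvalue())
```

On a simple table like this one all three give the same rows. The descendant form tbody tr would also pick up rows of a table nested inside a cell, which the child form tbody > tr would not.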