How do I format the output of Beautiful Soup and Selenium?

Time: 2017-08-07 02:06:01

Tags: python selenium beautifulsoup

I am using the following code to retrieve economic data from a website:

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.fxstreet.com/economic-calendar'

driver = webdriver.Chrome()
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
    event = tr.find('div', {'class': 'fxit-event-title'}).text
    currency = tr.find('div', {'class': 'fxit-event-name'}).text
    actual = tr.find('div', {'class': 'fxit-actual'}).text
    forecast = tr.find('div', {'class': 'fxit-consensus'}).text
    previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
    time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']

    # Call format() on the string itself so this also works on Python 3,
    # where print() returns None
    print(u'\t{}\t{}\t{}\t{}'.format(time, currency, event, volatility))

The output of the print statement looks like this:

23:30   
AUD                                     
AiG Performance of Construction Index (Jul)
    Moderate volatility expected
    23:50   
JPY                                     
JP Foreign Reserves (Jul)
    Low volatility expected
    24h 
CAD                                     
August Civic Holiday
    No volatility expected
    01:30   
AUD                                     
ANZ Job Advertisements (Jun)
    Low volatility expected
    n/a 
CNY                                     
Foreign Exchange Reserves (MoM) (Jul)
    Low volatility expected
    05:00   
JPY                                     
Coincident Index (Jun)Preliminar
    Moderate volatility expected
    05:00

Is it possible to format this output so that it prints in rows, like this?

    23:30   AUD   AiG Performance of Construction Index (Jul)   Moderate volatility expected
    23:50   JPY   JP Foreign Reserves (Jul)                     Low volatility expected
    24h     CAD   August Civic Holiday                          No volatility expected
    01:30   AUD   ANZ Job Advertisements (Jun)                  Low volatility expected
    n/a     CNY   Foreign Exchange Reserves (MoM) (Jul)         Low volatility expected
    05:00   JPY   Coincident Index (Jun)Preliminary             Moderate volatility expected

The end goal is to cut this output and paste it into an Excel file. Thanks in advance!

2 answers:

Answer 0 (score: 2)

Try stripping the newlines like this:

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.fxstreet.com/economic-calendar'

driver = webdriver.Chrome()
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
    event = tr.find('div', {'class': 'fxit-event-title'}).text
    currency = tr.find('div', {'class': 'fxit-event-name'}).text
    actual = tr.find('div', {'class': 'fxit-actual'}).text
    forecast = tr.find('div', {'class': 'fxit-consensus'}).text
    previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
    time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']

    # strip() removes the leading and trailing whitespace, including newlines
    print(u'\t{}\t{}\t{}\t{}'.format(time.strip(), currency.strip(), event.strip(), volatility.strip()))

That way none of the strings will have leading or trailing newlines, so each event prints on a single line.
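To see the effect without launching a browser, here is a minimal sketch that applies the same strip() idea to a few hard-coded strings. The padded sample values are an assumption about what the scraped cells look like, and clean_row, format_tabbed and format_aligned are hypothetical helper names, not part of Beautiful Soup or Selenium. The column widths are also just a guess that fits the sample rows.

```python
# Sample values resembling the scraped cell text; the surrounding
# whitespace and newlines are assumed for illustration.
raw_rows = [
    ("\n23:30   \n", "\nAUD      \n",
     "\nAiG Performance of Construction Index (Jul)\n",
     "Moderate volatility expected"),
    ("\n24h \n", "\nCAD      \n",
     "\nAugust Civic Holiday\n",
     "No volatility expected"),
]


def clean_row(fields):
    """Strip leading and trailing whitespace (including newlines) from each field."""
    return [field.strip() for field in fields]


def format_tabbed(fields):
    """Join the fields with tabs, which pastes into separate Excel columns."""
    return "\t".join(fields)


def format_aligned(fields, widths=(8, 6, 46)):
    """Left-justify every field except the last one to a fixed width."""
    # zip() stops at the shorter sequence, so the last field is not padded.
    padded = [field.ljust(width) for field, width in zip(fields, widths)]
    return "".join(padded) + fields[-1]


for raw in raw_rows:
    print(format_aligned(clean_row(raw)))
```

The tab-separated form is probably the better choice if the text is going to be pasted into Excel, while the fixed-width form is only easier to read in a terminal.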

Answer 1 (score: 1)

To complement the other answer: since you mentioned that "the end goal is to cut this output and paste it into an Excel file", you might also be interested in generating a .csv file from the data. After adding import csv, change your loop to:

with open("data.csv", "w") as csv_file:
    # Create the writer once, before the loop
    writer = csv.writer(csv_file, delimiter=',')
    for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
        event = tr.find('div', {'class': 'fxit-event-title'}).text
        currency = tr.find('div', {'class': 'fxit-event-name'}).text
        actual = tr.find('div', {'class': 'fxit-actual'}).text
        forecast = tr.find('div', {'class': 'fxit-consensus'}).text
        previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
        time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
        volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']
        line = [time.strip(), currency.strip(), event.strip(), volatility.strip()]
        writer.writerow(line)
        print(line)

The resulting data.csv can then be opened directly in Excel instead of copy-pasting.
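The CSV step can also be tried on its own, without the scraping part. The sketch below assumes Python 3 and writes a couple of sample rows taken from the question into an in-memory buffer. rows_to_csv is a hypothetical helper, and the header names are my own choice since the original code writes no header row.

```python
import csv
import io

# Sample rows based on the output shown in the question; the stray
# whitespace is assumed to mimic the unstripped scraped text.
sample_rows = [
    ("23:30   \n", "AUD   \n", "AiG Performance of Construction Index (Jul)\n",
     "Moderate volatility expected"),
    ("n/a \n", "CNY   \n", "Foreign Exchange Reserves (MoM) (Jul)\n",
     "Low volatility expected"),
]


def rows_to_csv(rows, header=("time", "currency", "event", "volatility")):
    """Return the rows as CSV text, with each field stripped of whitespace."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",")
    writer.writerow(header)  # header names are an assumption, not from the site
    for row in rows:
        writer.writerow([field.strip() for field in row])
    return buffer.getvalue()


print(rows_to_csv(sample_rows))
```

To write to disk instead, the same csv.writer can be given a file object. On Python 3 the csv documentation recommends opening it with open("data.csv", "w", newline="") so that no extra blank lines appear on Windows.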
