I use the following code to retrieve economic data from a website:
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.fxstreet.com/economic-calendar'
driver = webdriver.Chrome()
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

# Each matching <tr> is one calendar event
for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
    event = tr.find('div', {'class': 'fxit-event-title'}).text
    currency = tr.find('div', {'class': 'fxit-event-name'}).text
    actual = tr.find('div', {'class': 'fxit-actual'}).text
    forecast = tr.find('div', {'class': 'fxit-consensus'}).text
    previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
    time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']
    # .format() is called on the string, inside print(), so this also runs on Python 3
    print(u'\t{}\t{}\t{}\t{}'.format(time, currency, event, volatility))
The output of the print statement looks like this:
23:30
AUD
AiG Performance of Construction Index (Jul)
Moderate volatility expected
23:50
JPY
JP Foreign Reserves (Jul)
Low volatility expected
24h
CAD
August Civic Holiday
No volatility expected
01:30
AUD
ANZ Job Advertisements (Jun)
Low volatility expected
n/a
CNY
Foreign Exchange Reserves (MoM) (Jul)
Low volatility expected
05:00
JPY
Coincident Index (Jun)Preliminar
Moderate volatility expected
05:00
Is it possible to format this output so that each event prints on a single row, like this?
23:30 AUD AiG Performance of Construction Index (Jul) Moderate volatility expected
23:50 JPY JP Foreign Reserves (Jul) Low volatility expected
24h CAD August Civic Holiday No volatility expected
01:30 AUD ANZ Job Advertisements (Jun) Low volatility expected
n/a CNY Foreign Exchange Reserves (MoM) (Jul) Low volatility expected
05:00 JPY Coincident Index (Jun)Preliminary Moderate volatility expected
The ultimate goal is to cut this output and paste it into an Excel file. Thanks in advance!
Answer 0 (score: 2)
Try stripping the newlines like this:
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.fxstreet.com/economic-calendar'
driver = webdriver.Chrome()
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
    event = tr.find('div', {'class': 'fxit-event-title'}).text
    currency = tr.find('div', {'class': 'fxit-event-name'}).text
    actual = tr.find('div', {'class': 'fxit-actual'}).text
    forecast = tr.find('div', {'class': 'fxit-consensus'}).text
    previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
    time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
    volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']
    # strip() removes leading and trailing whitespace, including newlines
    print(u'\t{}\t{}\t{}\t{}'.format(time.strip(), currency.strip(), event.strip(), volatility.strip()))
This way none of the strings will have leading or trailing newline characters.
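To see the stripping step in isolation, here is a minimal sketch that does not touch the website at all. The sample strings are an assumption about what `.text` returns (values padded with newlines), and `clean` and `format_row` are hypothetical helper names, not part of BeautifulSoup. Note that `clean` goes slightly further than `strip()`: it also collapses whitespace inside a value.

```python
def clean(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces
    return " ".join(text.split())


def format_row(*fields):
    """Join the cleaned fields with tabs so the row pastes into separate Excel cells."""
    return "\t".join(clean(field) for field in fields)


# Assumed raw values, padded with newlines the way scraped text often is
raw = (
    "\n23:30\n",
    "\nAUD\n",
    "\nAiG Performance of Construction Index (Jul)\n",
    "Moderate volatility expected",
)

row = format_row(*raw)
print(row)
```

Tab-separated rows are convenient here because Excel places each tab-delimited value in its own cell when the text is pasted.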
Answer 1 (score: 1)
To complement the other answer: since you mention "The ultimate goal is to cut this output and paste it into an Excel file", you might also be interested in generating a .csv file from the data. To do that, add import csv and change your loop to:

import csv

# newline="" (Python 3) avoids blank rows between records on Windows
with open("data.csv", "w", newline="") as csv_file:
    # Create the writer once, before the loop, instead of once per row
    writer = csv.writer(csv_file, delimiter=',')
    for tr in soup.findAll('tr',{'class':['fxst-tr-event', 'fxst-oddRow', 'fxit-eventrow', 'fxst-evenRow', 'fxs_cal_nextEvent']}):
        event = tr.find('div', {'class': 'fxit-event-title'}).text
        currency = tr.find('div', {'class': 'fxit-event-name'}).text
        actual = tr.find('div', {'class': 'fxit-actual'}).text
        forecast = tr.find('div', {'class': 'fxit-consensus'}).text
        previous = tr.find('div', {'class': 'fxst-td-previous fxit-previous'}).text
        time = tr.find('div', {'class': 'fxit-eventInfo-time fxs_event_time'}).text
        volatility = tr.find('div', {'class': 'fxit-eventInfo-vol-c fxit-event-info-desktop'}).span['title']
        line = [time.strip(),currency.strip(),event.strip(),volatility.strip()]
        writer.writerow(line)
        print(line)

The resulting data.csv can then be opened in Excel directly instead of copying and pasting.
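The CSV step can also be tried without running the scraper. The sketch below writes to an in-memory buffer instead of data.csv so the result is easy to inspect. The header names and the `rows_to_csv` helper are assumptions made for illustration, and the two sample rows are taken from the desired output in the question.

```python
import csv
import io


def rows_to_csv(rows, header):
    """Return CSV text for the given rows, with a header line first."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=",")
    writer.writerow(header)
    writer.writerows(rows)  # one CSV line per event
    return buffer.getvalue()


# Hypothetical column names; the original code writes no header row
header = ["time", "currency", "event", "volatility"]

# Sample rows copied from the desired output in the question
rows = [
    ["23:30", "AUD", "AiG Performance of Construction Index (Jul)", "Moderate volatility expected"],
    ["23:50", "JPY", "JP Foreign Reserves (Jul)", "Low volatility expected"],
]

csv_text = rows_to_csv(rows, header)
print(csv_text)
```

Using `csv.writer` rather than joining the values by hand means any value that itself contains a comma or quote is quoted correctly, so Excel still reads it as one cell.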