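I am trying to scrape a table from a website that lists current interest rates. I am using Python with Beautiful Soup, but I cannot locate the right part of the HTML. Please send help! Thanks.
I only need to scrape the current interest rates table, not everything else on the page, and then convert it to a CSV file. Here is the link to the website: https://www.global-rates.com/en/interest-rates/libor/american-dollar/usd-libor-interest-rate-12-months.aspx The table I need is the one titled "Current interest rates".
I have tried something like this: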
import bs4
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = 'https://www.global-rates.com/en/interest-rates/libor/american-dollar/usd-libor-interest-rate-12-months.aspx'
response = requests.get(URL)
soup=bs4.BeautifulSoup(response.content, 'html.parser')
print(soup.title)
print(soup.title.string)
print(len(response.text))
table = soup.find('table', attrs = {'class':'tableheader'}).tbody
print(table)
columns = ['Current interest rates']
df = pd.DataFrame(columns = columns)
trs = table.find_all('tr')
for tr in trs:
    tds = tr.find_all('td')
    row = [td.text.replace('\n', '') for td in tds]
    df = df.append(pd.Series(row, index = columns), ignore_index = True)
df.to_csv('libor.csv', index = False)
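But this gives me an attribute error: AttributeError: 'NoneType' object has no attribute 'tbody'
Oh, and if possible, I would also like to automatically pull out only the Monday rates. Thank you for your help.
Answer 0 (score: 1)
You can use this example to scrape the "Current interest rates" table: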
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.global-rates.com/en/interest-rates/libor/american-dollar/usd-libor-interest-rate-12-months.aspx'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
all_data = []
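# Select the table whose cell contains "Current interest rates", skipping header rows that use colspan.
# Note: newer soupsieve releases deprecate :contains in favor of :-soup-contains.
for row in soup.select('table:has(td:contains("Current interest rates"))[style="width:208px;border:1px solid #CCCCCC;"] tr:not(:has([colspan]))'):
    tds = [td.get_text(strip=True) for td in row.select('td')]
    all_data.append(tds)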
df = pd.DataFrame(all_data, columns=['Date', 'Rate'])
print(df)
df.to_csv('data.csv', index=False)
Prints:
                Date       Rate
0   november 02 2020  0.33238 %
1    october 30 2020  0.33013 %
2    october 29 2020  0.33100 %
3    october 28 2020  0.32763 %
4    october 27 2020  0.33175 %
5    october 26 2020  0.33200 %
6    october 23 2020  0.33663 %
7    october 22 2020  0.33513 %
8    october 21 2020  0.33488 %
9    october 20 2020  0.33713 %
10   october 19 2020  0.33975 %
11   october 16 2020  0.33500 %
and saves data.csv.
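As background on the AttributeError in the question: `find()` returns `None` when nothing matches, so chaining `.tbody` onto the result fails. Separately, `html.parser` does not insert a `<tbody>` element that is absent from the markup, so `.tbody` can be `None` even when the table is found. Below is a minimal sketch of guarding against both. The HTML snippet is a simplified stand-in rather than the real page (which may nest tables differently), and `find_table_by_text` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page markup (an assumption, not the real HTML).
html = """
<table style="width:208px">
<tr><td colspan="2">Current interest rates</td></tr>
<tr><td>november 02 2020</td><td>0.33238 %</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# No table carries class="tableheader" here, so find() returns None.
missing = soup.find("table", attrs={"class": "tableheader"})
print(missing)


def find_table_by_text(soup, text):
    """Hypothetical helper: return the first table containing `text`, else None."""
    for table in soup.find_all("table"):
        if text in table.get_text():
            return table
    return None


table = find_table_by_text(soup, "Current interest rates")
rows = []
if table is not None:
    # Iterate over <tr> directly instead of relying on a <tbody> element.
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:  # skip the single-cell title row
            rows.append(cells)
print(rows)
```

Checking for `None` before going deeper turns a confusing traceback into a clear "table not found" case that you can handle or log.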
Edit: to get only the Mondays, you can do this with the DataFrame:
df['Date'] = pd.to_datetime(df['Date'])
print(df[df['Date'].dt.weekday==0])
Prints:
          Date       Rate
0   2020-11-02  0.33238 %
5   2020-10-26  0.33200 %
10  2020-10-19  0.33975 %
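The same Monday filter can also be sketched without pandas, which makes the date parsing explicit. This is only an illustration under assumptions: the sample rows are a subset copied from the output printed earlier in this answer, `parse_row` is a hypothetical helper name, and `%B` relies on English month names in the active locale.

```python
from datetime import datetime

# A few rows copied from the printed output above (subset for brevity).
rows = [
    ("november 02 2020", "0.33238 %"),
    ("october 30 2020", "0.33013 %"),
    ("october 29 2020", "0.33100 %"),
    ("october 26 2020", "0.33200 %"),
    ("october 19 2020", "0.33975 %"),
]


def parse_row(date_text, rate_text):
    """Hypothetical helper: convert ('november 02 2020', '0.33238 %') to (date, float)."""
    day = datetime.strptime(date_text, "%B %d %Y").date()
    rate = float(rate_text.replace("%", "").strip())
    return day, rate


parsed = [parse_row(d, r) for d, r in rows]

# weekday() == 0 means Monday, matching pandas' dt.weekday convention.
mondays = [(day.isoformat(), rate) for day, rate in parsed if day.weekday() == 0]
print(mondays)
```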
Answer 1 (score: 1)
Here is my attempt using only pandas:
import pandas as pd
# Get all tables on page
dfs = pd.read_html('https://www.global-rates.com/en/interest-rates/libor/american-dollar/usd-libor-interest-rate-12-months.aspx')
# Find the Current interest rates table
df = [df for df in dfs if df.iloc[0][0] == 'Current interest rates'][0]
# Remove first row that contains column names
df = df.iloc[1:].copy()
# Set column names
df.columns = ['DATE','INTEREST_RATE']
# Convert date from november 02 2020 to 2020-11-02
df['DATE'] = pd.to_datetime(df['DATE'])
# Remove percentage sign from interest rate
df['INTEREST_RATE'] = df['INTEREST_RATE'].str.replace('%','').str.strip()
# Convert percentage to float type
df['INTEREST_RATE'] = df['INTEREST_RATE'].astype(float)
# Add day of the week column
df['DAY'] = df['DATE'].dt.day_name()
# Output all to CSV
df.to_csv('all_data.csv', index=False)
# Only Mondays
df_monday = df[df['DAY'] == 'Monday']
# Output only Mondays
df_monday.to_csv('monday_data.csv', index=False)
# Add day number of week (Monday = 0)
df['DAY_OF_WEEK_NUMBER'] = df['DATE'].dt.dayofweek
# Add ISO week number of year (Series.dt.weekofyear was removed in pandas 2.0)
df['WEEK_OF_YEAR_NUMBER'] = df['DATE'].dt.isocalendar().week
# 1. Sort by week of year then day of week
# 2. Group by week of year
# 3. Select first record in group, which will be the earliest day available of that week
df_first_day_of_week = df.sort_values(['WEEK_OF_YEAR_NUMBER','DAY_OF_WEEK_NUMBER']).groupby('WEEK_OF_YEAR_NUMBER').first()
# Output earliest day of the week data
df_first_day_of_week.to_csv('first_day_of_week.csv', index=False)
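One caveat with grouping on the week number alone: if the data spans a year boundary, the same week number from different years lands in one group. A minimal standard-library sketch of the "earliest available day per week" idea, keyed on (ISO year, ISO week), follows. The dates are a hypothetical subset of the ones shown above, with october 26 deliberately left out to mimic a Monday with no published rate.

```python
from datetime import date

# Hypothetical sample: october 26 (a Monday) is omitted on purpose.
days = [
    date(2020, 11, 2),
    date(2020, 10, 28),
    date(2020, 10, 27),
    date(2020, 10, 20),
    date(2020, 10, 19),
    date(2020, 10, 16),
]

first_by_week = {}
for day in days:
    iso = day.isocalendar()
    key = (iso[0], iso[1])  # (ISO year, ISO week number)
    # Keep the earliest date seen so far for this week.
    if key not in first_by_week or day < first_by_week[key]:
        first_by_week[key] = day

earliest = [first_by_week[key].isoformat() for key in sorted(first_by_week)]
print(earliest)
```

For the week where Monday is missing, the Tuesday (october 27) is selected instead, which is the behavior the groupby approach above aims for.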