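I am trying to access the contents of the text files on the page. Because each text file has a different URL, I can't generate the URLs in Python and scrape the contents with pandas. So I am trying to use the API instead. When I request a user token, this is what I get: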
{
"jwt": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOjU5MDR9.b9elxkmNj0kmWxDPjal0_mLY9UPg7enoT7Cdg7gN1d0"
}
Now I'm not sure how to use it to access all the text files on the first page mentioned above. Can someone guide me on how to proceed?
Answer 0 (score: 0)
This script walks from page 1 to the last page and collects every link that ends in .txt:
import requests
from bs4 import BeautifulSoup
from pprint import pprint

base_url = 'https://usda.library.cornell.edu'
url = 'https://usda.library.cornell.edu/concern/publications/c821gj76b?locale=en&page=1#release-items'

# Fetch and parse the first page of releases.
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

page = 1
while True:
    print('Page no.{}...'.format(page))
    print('-' * 80)

    # Collect every link inside #release-items whose href ends in ".txt".
    txt_urls = [a["href"] for a in soup.select('#release-items a[href$=".txt"]')]
    pprint(txt_urls)

    # Follow the "next" pagination link until there is none (or it is just "#").
    m = soup.select_one('a[rel="next"][href]')
    if m and m['href'] != '#':
        soup = BeautifulSoup(requests.get(base_url + m['href']).text, 'html.parser')
        page += 1
    else:
        break
Prints:
Page no.1...
--------------------------------------------------------------------------------
['https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/kd17d5288/ms35tm800/agpr0719.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/r494vw17c/q524jz702/agpr0619.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/bc386t90p/vx021r07n/agpr0519.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/3484zr667/4j03d7561/agpr0419.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/f7623m42k/qf85nk40w/agpr0319.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/7w62fg32b/n009w815n/agpr0219.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/kk91fs55d/z890s0860/agpr0219.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/t435gj88z/8910k0903/agpr0119.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/m613n410w/41687p68x/01-30-19_Report_Reschedule_ASB_Notice_Final.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/st74cv012/0z709086s/agpr1118.txt']
Page no.2...
--------------------------------------------------------------------------------
['https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/5q47rs05x/m900nx65x/agpr1018.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/4b29b953w/m900nx64n/agpr0918.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/5h73px257/1c18dh137/AgriPric-08-29-2018.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/t722hb16b/76537257b/AgriPric-07-30-2018.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/pz50gx32d/qb98mg88k/AgriPric-06-28-2018.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/vd66w115f/p2676w80r/AgriPric-05-30-2018.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/9c67wp20r/bc386k622/AgriPric-04-27-2018.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/r494vm201/h128ng14d/AgriPric-03-28-2018.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/z316q273n/37720f04c/AgriPric-02-27-2018_correction.txt',
'https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/5d86p1433/zp38wd92f/AgriPric-01-30-2018.txt']
...and so on.
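A side note on the pagination step: the loop above builds the next-page URL with base_url + m['href'], which works as long as the href is root-relative. A slightly more defensive sketch resolves the href with urljoin from the standard library instead. The helper name next_page_url is made up for illustration, and the sample hrefs in the comments are assumptions about what the site returns, not values taken from the page.

```python
from urllib.parse import urljoin


def next_page_url(current_url, href):
    """Resolve a pagination href against the page it was found on.

    Returns None when there is no usable next link (missing href or '#').
    """
    if not href or href == '#':
        return None
    # urljoin handles root-relative, relative and absolute hrefs alike.
    return urljoin(current_url, href)


# Hypothetical usage inside the loop (m is the rel="next" anchor, if any):
#   nxt = next_page_url(url, m['href']) if m else None
#   if nxt is None:
#       break
#   url = nxt
#   soup = BeautifulSoup(requests.get(url).text, 'html.parser')
```

With this variant the loop would track the current url on each iteration rather than only the parsed soup.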
You can then download a text file from any of these links, for example:
txt_data = requests.get('https://downloads.usda.library.cornell.edu/usda-esmis/files/c821gj76b/kd17d5288/ms35tm800/agpr0719.txt').text
print(txt_data)
Prints (though you could save it to a file instead of printing it to the screen):
Agricultural Prices
ISSN: 1937-4216
Released July 31, 2019, by the National Agricultural Statistics Service
(NASS), Agricultural Statistics Board, United States Department of
Agriculture (USDA).
June Prices Received Index Up 1.0 Percent
...etc.
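If you would rather save the files than print them, something along these lines should work. This is only a sketch: the helper names filename_from_url and save_txt, the dest_dir and fetch parameters, and the 30-second timeout are my own choices, not anything the site or its API prescribes.

```python
import os
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of the URL, e.g. 'agpr0719.txt'."""
    return os.path.basename(urlparse(url).path)


def save_txt(url, dest_dir='.', fetch=None):
    """Download url and write it into dest_dir; return the local path.

    fetch is a hypothetical hook: any callable taking a URL and returning
    text. When omitted, requests.get is used.
    """
    if fetch is None:
        import requests  # only needed for the default downloader

        def fetch(u):
            r = requests.get(u, timeout=30)
            r.raise_for_status()  # fail loudly on 4xx/5xx responses
            return r.text

    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, filename_from_url(url))
    with open(path, 'w', encoding='utf-8') as f:
        f.write(fetch(url))
    return path


# Hypothetical usage with the list collected by the script above:
#   for u in txt_urls:
#       print('saved', save_txt(u, 'agri_prices'))
```

One thing to watch: the page 1 listing above contains agpr0219.txt twice under different paths, so saving by file name alone would overwrite the first copy with the second. If you need both, include part of the URL path in the local name.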