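I'm trying to get my server to fetch files from a URL in Python 3. Specifically, I want to pass a URL to a function and have that function capture the audio file (which may come in various formats) and save it as MP3, possibly using ffmpeg or ffmpy. If the URL serves a PDF, I'd like to save that as a PDF too. I haven't researched the PDF side much yet, but I've been looking into audio segments and I'm not sure this is even possible.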
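One way to frame the dispatch described above is to decide from the response's `Content-Type` header what to do with the body. The sketch below uses a hypothetical helper, `plan_output`. It assumes that any `audio/*` type other than `audio/mpeg` should be converted to MP3 and that anything other than audio or PDF is skipped.

```python
def plan_output(content_type):
    """Hypothetical helper: decide how to handle a response from its Content-Type.

    Returns an (extension, needs_conversion) tuple, or None if the
    response should not be saved at all.
    """
    # Drop parameters such as "; codecs=vorbis" and normalise case
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "audio/mpeg":
        return (".mp3", False)   # already MP3: save as-is
    if media_type.startswith("audio/"):
        return (".mp3", True)    # other audio formats: convert with ffmpeg
    if media_type == "application/pdf":
        return (".pdf", False)   # PDFs are saved unchanged
    return None                  # HTML pages, JSON, etc. are skipped
```

Because only the header is inspected, this check can run before the body is downloaded (for example after `requests.get(url, stream=True)`), so unwanted responses never have to be read.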
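I've looked at several questions here, most notably "How do I download a file over HTTP using Python?"
It's a bit old, but I tried several of the methods there and always ran into some problem. I've tried the requests library, urllib, streamripper, and maybe one other.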
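Is it possible to do this with one of the recommended libraries?
For example, most of the approaches I tried did save something, but it was an HTML page or, in this case, an empty file called "file.mp3".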
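An HTML page or an empty `file.mp3` usually means the URL answered with something other than the audio itself, such as a player page, a login page, or an error response. That is only a guess about this particular URL, but checking the status code and headers before writing anything makes the problem visible. `should_save` is a hypothetical helper; it accepts any mapping of header names, such as `r.headers` from requests or `resp.headers` from `urllib`.

```python
def should_save(status_code, headers):
    """Hypothetical pre-save check: reject error responses, HTML pages
    and explicitly empty bodies before anything is written to disk."""
    # Normalise header names so plain dicts behave like case-insensitive mappings
    lowered = {k.lower(): v for k, v in headers.items()}
    if not 200 <= status_code < 300:
        return False
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return False  # a web page, not the file we wanted
    length = lowered.get("content-length")
    if length is not None and int(length) == 0:
        return False  # the server announced an empty body
    return True
```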
Streamripper returns an error message telling me to try changing the user agent.
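Streamripper's message suggests the server may reject the default client identifier. Below is a minimal sketch using the standard library's `urllib.request`. It assumes the server accepts a browser-like string, and the `DEFAULT_UA` value itself is made up. With requests, the equivalent is `requests.get(url, headers={"User-Agent": ...})`.

```python
import urllib.request

# Hypothetical browser-like User-Agent; the server's actual requirement is unknown
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64) python-downloader/0.1"


def make_request(url, user_agent=DEFAULT_UA):
    """Build a GET request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(make_request("http://someurl.com/webcast/something")) as resp:
#     print(resp.status, resp.headers.get("Content-Type"))
```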
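I'm not sure this is feasible, but I'm sure there's something here I don't understand. Can anyone point me in the right direction?
This isn't necessarily the code I'll end up using; it's just one of the examples I tried that didn't work.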
import requests
url = "http://someurl.com/webcast/something"
r = requests.get(url)
with open('file.mp3', 'wb') as f:
    f.write(r.content)
# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
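For large files, `r.content` reads the entire body into memory before anything is written. The sketch below writes a binary stream to disk in chunks using `shutil.copyfileobj`. It assumes the source is a file-like object, such as the response returned by `urllib.request.urlopen`. The `.part` temporary name is an illustrative choice so that a failed download does not leave a truncated file under the final name. With requests, the usual equivalent is `requests.get(url, stream=True)` followed by writing each chunk from `r.iter_content(chunk_size=...)`.

```python
import os
import shutil


def save_stream(src, path, chunk_size=64 * 1024):
    """Copy a readable binary stream to `path` chunk by chunk.

    Data goes to `path + ".part"` first and is renamed only once the copy
    has finished; returns the number of bytes in the final file.
    """
    tmp_path = path + ".part"
    try:
        with open(tmp_path, "wb") as f:
            shutil.copyfileobj(src, f, chunk_size)
        os.replace(tmp_path, path)  # expose the file only when it is complete
    finally:
        # Clean up a partial file if the copy raised before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return os.path.getsize(path)
```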
**Edit**
import requests
import ffmpy
import datetime
import os
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED
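def BordersPythonDownloader(url):
    print('Beginning file download requests')
    r = requests.get(url, stream=True)
    r.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page
    # Content-Type may carry parameters (e.g. "audio/mpeg; charset=..."),
    # and the header may be missing, so keep only the bare media type
    contype = r.headers.get('content-type', '').split(';')[0].strip().lower()
    # str(datetime.now()) contains colons, which are not valid in Windows filenames
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    if contype == "audio/mpeg":
        print("audio file")
        filename = '[{}].mp3'.format(timestamp)
        # Stream the body to a temporary file in chunks instead of loading it all into memory
        with open('file.mp3', 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
        # Re-encode the temporary file into the timestamped output file
        ff = ffmpy.FFmpeg(
            inputs={'file.mp3': None},
            outputs={filename: None}
        )
        ff.run()
        if os.path.exists('file.mp3'):
            os.remove('file.mp3')
    elif contype == "application/pdf":
        print("pdf file")
        filename = '[{}].pdf'.format(timestamp)
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    else:
        print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))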
# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL
#DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'
#CALL THE SCRIPT; PASSING IT YOUR URL
#x = BordersPythonDownloader(url)
#ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
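For the "various formats to MP3" part of the question, the conversion can be expressed as a plain ffmpeg command. The sketch below builds that command with `subprocess` instead of ffmpy. It assumes an `ffmpeg` binary on `PATH` that was built with the `libmp3lame` encoder, and `mp3_command` and `convert_to_mp3` are hypothetical names.

```python
import subprocess


def mp3_command(src, dst, quality=2):
    """Build an ffmpeg command line that converts an audio file to MP3.

    -y overwrites dst, -vn drops any video or cover-art stream, libmp3lame is
    ffmpeg's MP3 encoder, and -q:a selects its VBR quality (0 = best, 9 = lowest).
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vn",
        "-codec:a", "libmp3lame",
        "-q:a", str(quality),
        dst,
    ]


def convert_to_mp3(src, dst, quality=2):
    """Run the conversion; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(mp3_command(src, dst, quality), check=True)
```

An input that is already `audio/mpeg` does not need re-encoding and can simply be saved. Writing the download to a file from `tempfile` rather than a fixed `file.mp3` would also avoid collisions when two downloads run at the same time. In ffmpy, the same options can be passed as `outputs={dst: '-vn -codec:a libmp3lame -q:a 2'}`.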
Thanks to Richard, this code works, and it has helped me understand the problem better. Any suggestions for improving the working example above?