Question

I am using the code below to scrape links, titles, and images from the following feed and download them to my device (view source: http://feeds.thisiscriminal.com/CriminalShow).

I can't use the image attached to each episode, because I don't think these are real images, e.g. http://feeds.feedburner.com/~r/CriminalShow/~4/ENsi-bf5uC4. There is no extension such as .gif...
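A missing extension does not by itself show whether a URL serves an image; the `Content-Type` header of the response is the more reliable signal. The following is a minimal sketch of that check, not a tested solution for this feed. The helper names `extension_for` and `probe` are hypothetical, and what the feedburner URL actually returns is not something the question establishes.

```python
import mimetypes
from urllib.request import urlopen


def extension_for(content_type):
    """Map a Content-Type header value to a file extension, or '' if unknown."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    return mimetypes.guess_extension(mime) or ""


def probe(url):
    """Return (content_type, extension) for a URL. Performs a network request."""
    with urlopen(url, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
    return content_type, extension_for(content_type)
```

Calling `probe()` on the extensionless URL would show whether the server labels the response as an image type at all, and which extension to append when saving it.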

I am scraping the images from another part of the site with the following:

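import re
import urllib.request

import pandas as pd
import requests

# Fetch the episode list from the site's JSON API.
resp = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1").json()
df = pd.DataFrame(resp['posts'], columns=['image'])
# Each 'image' cell is expected to be a dict of sizes; keep only the 'medium' URL.
df['image'] = df['image'].apply(pd.Series)['medium'].replace({'"': '\'','""': '\'','"""': '\'' }, regex=True)
# Capture everything after the last slash, i.e. the file name.
Regex_Pattern = r"([^\/]+$)"

for index, row in df.iterrows():
    match = re.findall(Regex_Pattern, row['image'])
    myfilename = ''.join(match)
    print(row['image'])
    print(myfilename)
    # Python 3: urlretrieve lives in urllib.request (urllib.urlretrieve is Python 2 only).
    urllib.request.urlretrieve(row['image'], myfilename)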

Basically, my question is: how do I turn the local output files above into relative links when using the following code?

for content in soup.find_all():

    thumbnail = content.find('image')
    thumbnail = thumbnail.get('src')

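I assume it would be /output/folder/ and so on, but how do I link each image to its episode? Looking at the output, there is no real numbering scheme. It might be prudent to save each image in its own subfolder, so that I can simply reference the only image in that folder. I'm thinking out loud now, but I wonder: if that odd feed image would solve this, does it show up for anyone else, or is it just my browser?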
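One way to make the subfolder idea concrete is to derive the local path from something unique to the episode, so the same function can be used when downloading and when writing the link. This is only a sketch under assumptions: it presumes each episode has a title (or another unique key) available at both steps, and the names `slugify`, `relative_image_path`, and the `output` root are hypothetical.

```python
import posixpath
import re
from urllib.parse import urlparse


def slugify(text):
    """Reduce a title to lowercase letters, digits and hyphens for use as a folder name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "untitled"


def relative_image_path(episode_title, image_url, root="output"):
    """Build a relative path of the form root/<episode-slug>/<file name from the URL>."""
    # Use only the path component so query strings do not end up in the file name.
    filename = posixpath.basename(urlparse(image_url).path) or "image"
    # posixpath keeps forward slashes, which is what a relative link in HTML needs.
    return posixpath.join(root, slugify(episode_title), filename)
```

The returned string could be passed to the download call as the target file (after creating the folder, for example with `os.makedirs(..., exist_ok=True)`) and also written out as the relative `src` for that episode, so the two always agree.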

Scraping images with beautifulsoup / pandas and inserting relative links

0 answers: