Regular expression not working in Python

Date: 2017-03-15 00:24:35

Tags: python json regex parsing

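I'm working on a Python script that downloads images from Flickr and other sites. I use the Flickr API to get the available sizes of the image I'm trying to download and to identify the URL of the original size. Well, that's what I'm trying to do. Here is my code so far...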

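import re
import requests

URL = "https://www.flickr.com/photos/matbellphotography/33413612735/sizes/h/"  # a Flickr link

# First regex: the second path segment after /photos/<user>/ is the photo ID
flickr = re.match(r".*flickr\.com\/photos\/([^\/]+)\/([0-9^\/]+)\/", URL)
URL = "https://api.flickr.com/services/rest/?method=flickr.photos.getSizes&api_key=6002c84e96ff95c1a861eafafa4284ba&photo_id=" + flickr.group(2) + "&format=json&nojsoncallback=1"

request = requests.get(URL)
result = request.text

# Second regex: the "source" URL of the "Original" size
parsed = re.match(r".\"Original\".*\"source\"\: \"([^\"]+)", result)
URL = parsed.group(1)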
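For comparison, the first two steps (photo ID extraction and building the getSizes request) can be sketched with re.search and urllib.parse.urlencode instead of a leading .* and string concatenation. This is an editorial sketch, not the asker's code: get_sizes_url and the YOUR_API_KEY placeholder are hypothetical names, and \d+ is used for the photo ID on the assumption that Flickr photo IDs are purely numeric (the question's class [0-9^\/] also admits the characters ^ and /).

```python
import re
from urllib.parse import urlencode

# Photo page URLs look like .../photos/<user>/<photo id>/...; group 2 is the photo ID.
PHOTO_URL_RE = re.compile(r"flickr\.com/photos/([^/]+)/(\d+)/")


def get_sizes_url(page_url, api_key):
    """Hypothetical helper: build a flickr.photos.getSizes REST URL for a photo page URL."""
    m = PHOTO_URL_RE.search(page_url)  # search() scans the whole string, so no leading .* is needed
    if m is None:
        raise ValueError("not a Flickr photo URL: " + page_url)
    params = {
        "method": "flickr.photos.getSizes",
        "api_key": api_key,
        "photo_id": m.group(2),
        "format": "json",
        "nojsoncallback": 1,
    }
    # urlencode percent-encodes each value and joins the pairs with "&"
    return "https://api.flickr.com/services/rest/?" + urlencode(params)


print(get_sizes_url(
    "https://www.flickr.com/photos/matbellphotography/33413612735/sizes/h/",
    "YOUR_API_KEY",  # placeholder, not a real key
))
```

With requests, the same dictionary can instead be passed as requests.get(..., params=...) and the library does the encoding.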

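Using print() statements in my code, I've confirmed that the first regex (which parses the original Flickr URL to get the photo ID) works and that the API request succeeds, returning the following result (for the example URL https://www.flickr.com/photos/matbellphotography/33413612735/sizes/h/)...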

{ "sizes": { "canblog": 0, "canprint": 0, "candownload": 1, 
"size": [
  { "label": "Square", "width": 75, "height": 75, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_s.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/sq\/", "media": "photo" },
  { "label": "Large Square", "width": "150", "height": "150", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_q.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/q\/", "media": "photo" },
  { "label": "Thumbnail", "width": 100, "height": 67, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_t.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/t\/", "media": "photo" },
  { "label": "Small", "width": "240", "height": "160", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_m.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/s\/", "media": "photo" },
  { "label": "Small 320", "width": "320", "height": "213", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_n.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/n\/", "media": "photo" },
  { "label": "Medium", "width": "500", "height": "333", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/m\/", "media": "photo" },
  { "label": "Medium 640", "width": "640", "height": "427", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_z.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/z\/", "media": "photo" },
  { "label": "Medium 800", "width": "800", "height": "534", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_c.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/c\/", "media": "photo" },
  { "label": "Large", "width": "1024", "height": "683", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_b.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/l\/", "media": "photo" },
  { "label": "Large 1600", "width": "1600", "height": "1067", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_4d92e2f70d_h.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/h\/", "media": "photo" },
  { "label": "Large 2048", "width": "2048", "height": "1365", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_81441ed1da_k.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/k\/", "media": "photo" },
  { "label": "Original", "width": "5760", "height": "3840", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_34cbc172c1_o.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/o\/", "media": "photo" }
] }, "stat": "ok" }

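After that, my code crashes. The second regex, which is supposed to find the download URL of the image at its original size, apparently finds no match. According to another print() statement...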

parsed.group(1) = none

I built the expression in RegExr, and there it matches exactly what I need from the JSON result. What am I doing wrong?
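A likely cause is the difference between how RegExr and re.match look for a match. RegExr reports a match found anywhere in the text, much like re.search, whereas re.match only tries at index 0 of the string. The pattern starts with a single ".", which consumes the opening "{", so "Original" would have to begin at the second character, and it does not. The sketch below reproduces this on a shortened copy of the response (the shortening is an editorial assumption; result and pattern mirror the question's variables).

```python
import re

# Shortened copy of the getSizes response from the question. The raw string keeps
# the JSON escapes ("\/") exactly as they arrive in request.text.
result = r'''{ "sizes": { "canblog": 0, "canprint": 0, "candownload": 1,
"size": [
  { "label": "Large 2048", "width": "2048", "height": "1365", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_81441ed1da_k.jpg", "media": "photo" },
  { "label": "Original", "width": "5760", "height": "3840", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_34cbc172c1_o.jpg", "media": "photo" }
] }, "stat": "ok" }'''

# The pattern from the question, unchanged.
pattern = r".\"Original\".*\"source\"\: \"([^\"]+)"

# re.match is anchored at index 0: "." consumes "{", and the next character
# would have to be the opening quote of "Original", which it is not.
print(re.match(pattern, result))  # None

# re.search scans for the first position where the pattern matches, as RegExr does.
found = re.search(pattern, result)
raw_url = found.group(1)  # still contains the JSON escapes, e.g. https:\/\/farm3...
print(raw_url.replace("\\/", "/"))
# https://farm3.staticflickr.com/2855/33413612735_34cbc172c1_o.jpg
```

Even with re.search, the captured text keeps the "\/" escapes and depends on "Original" and its "source" being on the same line, which is why parsing the response as JSON is the more robust route.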

1 answer:

Answer 0 (score: 3)

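Rather than matching the JSON with a regular expression, parse it. A requests.Response object has a json() method that decodes the body directly. Alternatively, import json, parse your request.content with json.loads() and work with the returned dictionary. For example: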

>>> import json
>>> json_response = r"""
... { "sizes": { "canblog": 0, "canprint": 0, "candownload": 1, 
... "size": [
...   { "label": "Square", "width": 75, "height": 75, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_s.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/sq\/", "media": "photo" },
...   { "label": "Large Square", "width": "150", "height": "150", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_q.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/q\/", "media": "photo" },
...   { "label": "Thumbnail", "width": 100, "height": 67, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_t.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/t\/", "media": "photo" },
...   { "label": "Small", "width": "240", "height": "160", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_m.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/s\/", "media": "photo" },
...   { "label": "Small 320", "width": "320", "height": "213", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_n.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/n\/", "media": "photo" },
...   { "label": "Medium", "width": "500", "height": "333", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/m\/", "media": "photo" },
...   { "label": "Medium 640", "width": "640", "height": "427", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_z.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/z\/", "media": "photo" },
...   { "label": "Medium 800", "width": "800", "height": "534", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_c.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/c\/", "media": "photo" },
...   { "label": "Large", "width": "1024", "height": "683", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_b.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/l\/", "media": "photo" },
...   { "label": "Large 1600", "width": "1600", "height": "1067", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_4d92e2f70d_h.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/h\/", "media": "photo" },
...   { "label": "Large 2048", "width": "2048", "height": "1365", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_81441ed1da_k.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/k\/", "media": "photo" },
...   { "label": "Original", "width": "5760", "height": "3840", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_34cbc172c1_o.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/o\/", "media": "photo" }
... ] }, "stat": "ok" }"""
>>> 
>>> json_parsed = json.loads(json_response)
>>> for img in json_parsed["sizes"]["size"]:
...     print(img.get("source"))
... 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_s.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_q.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_t.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_m.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_n.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_z.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_c.jpg
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_b.jpg
https://farm3.staticflickr.com/2855/33413612735_4d92e2f70d_h.jpg
https://farm3.staticflickr.com/2855/33413612735_81441ed1da_k.jpg
https://farm3.staticflickr.com/2855/33413612735_34cbc172c1_o.jpg
>>>
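Following the same approach, a small sketch that goes one step further and returns only the size labelled "Original" instead of printing every source. original_source and fetch_original_source are hypothetical helper names; the request parameters are the ones from the question's URL, and the timeout value is an arbitrary assumption.

```python
import json


def original_source(sizes_payload):
    """Return the "source" URL of the entry labelled "Original", or None if there is none.

    sizes_payload is the dict produced by json.loads(...) or response.json().
    """
    for size in sizes_payload["sizes"]["size"]:
        if size.get("label") == "Original":
            return size.get("source")
    return None


def fetch_original_source(photo_id, api_key):
    """Hypothetical helper: call flickr.photos.getSizes and return the original-size URL."""
    import requests  # third-party; imported here so original_source() works without it

    response = requests.get(
        "https://api.flickr.com/services/rest/",
        params={
            "method": "flickr.photos.getSizes",
            "api_key": api_key,
            "photo_id": photo_id,
            "format": "json",
            "nojsoncallback": 1,
        },
        timeout=30,  # assumed value
    )
    response.raise_for_status()
    return original_source(response.json())  # json() also unescapes "\/" to "/"


# Offline usage with a shortened copy of the response from the question.
sample = json.loads(r'''{ "sizes": { "size": [
  { "label": "Large 2048", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_81441ed1da_k.jpg" },
  { "label": "Original", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_34cbc172c1_o.jpg" }
] }, "stat": "ok" }''')
print(original_source(sample))
# https://farm3.staticflickr.com/2855/33413612735_34cbc172c1_o.jpg
```

Matching on the "label" field rather than on list position keeps the lookup correct even if the number or order of sizes in the response differs.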