Fetching Images from Wikipedia API except .svg extension

Date: 2018-07-25 05:07:07

Tags: php api wikipedia-api

I am trying to extract images from the Wikipedia API in my PHP page, but I am getting some unwanted images with the .svg extension. Is there a way to exclude .svg files, or to include only .jpg files, in the API request? I saw a parameter called mediatype, but it did not work.

I am using the following API request URL:

 https://en.wikipedia.org/w/api.php?&redirects=1&action=query&titles=Basilica%20Cistern&prop=images&format=json&imlimit=15

And the response I am getting is below:

{
  "continue": {
    "imcontinue": "1365761|Peacock-eyed_column_in_the_Basilica_Cistern_in_Istanbul,Turkey,January_20,2014.jpg",
    "continue": "||"
  },
  "query": {
    "pages": {
      "1365761": {
        "pageid": 1365761,
        "ns": 0,
        "title": "Basilica Cistern",
        "images": [{
            "ns": 6,
            "title": "File:20131203 Istanbul 269.jpg"
          },
          {
            "ns": 6,
            "title": "File:Archaeological site icon (red).svg"
          },
          {
            "ns": 6,
            "title": "File:Basilica Cistern.jpg"
          },
          {
            "ns": 6,
            "title": "File:Basilica Cistern Constantinople 2007.jpg"
          },
          {
            "ns": 6,
            "title": "File:Basilica Cistern Constantinople 2007 011.jpg"
          },
          {
            "ns": 6,
            "title": "File:Basilica cistern Art.jpg"
          },
          {
            "ns": 6,
            "title": "File:Carp at the Basilica Cistern, Istanbul 2007.JPG"
          },
          {
            "ns": 6,
            "title": "File:Commons-logo.svg"
          },
          {
            "ns": 6,
            "title": "File:Head of Medusa, Basilica Cistern, Constantinople 01.jpg"
          },
          {
            "ns": 6,
            "title": "File:Head of Medusa, Basilica Cistern, Constantinople 02.jpg"
          },
          {
            "ns": 6,
            "title": "File:Location map Istanbul.png"
          }
        ]
      }
    }
  }
}

PHP CODE:

function getResults($json){

    $results = array();

    $json_array = json_decode($json, true);

    foreach($json_array['query']['pages'] as $page){
        if(count($page['images']) > 0){
            foreach($page['images'] as $image){

                // URL-encode the title so characters such as '&' cannot break the query string.
                $title = rawurlencode(str_replace(" ", "_", $image["title"]));
                $imageinfourl = "https://en.wikipedia.org/w/api.php?&action=query&titles=".$title."&prop=imageinfo&iiprop=url&format=json";
                // curl() is not a PHP built-in; presumably a helper defined elsewhere that returns the response body.
                $imageinfo = curl($imageinfourl);
                $image_array = json_decode($imageinfo, true);
                $image_pages = $image_array["query"]["pages"];

                foreach($image_pages as $a){
                    // Skip entries that came back without image info.
                    if(isset($a["imageinfo"][0]["url"])){
                        $results[] = $a["imageinfo"][0]["url"];
                    }
                }
            }
        }
    }

    return $results;

}

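1 Answer:

Answer 0 (score: 0)

I can't see anything in the API for this. I thought you might be able to use the imimages parameter, but that is only useful for matching a whole title, e.g.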

...&imimages=File%3A20131203%20Istanbul%20269.jpg

What you can do is filter the results

// snip
if(count($page['images']) > 0) {
    $jpgs = array_filter($page['images'], function($img) {
        return strtolower(pathinfo($img['title'], PATHINFO_EXTENSION)) === 'jpg';
    });

    foreach($jpgs as $image) {
        // and continue

Or, simply check the extension inside the foreach loop

foreach($page['images'] as $image) {
    if (strtolower(pathinfo($image['title'], PATHINFO_EXTENSION)) !== 'jpg') {
        continue;
    }
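
Putting the filtering idea together, a minimal self-contained sketch might look like the following. The function name filterImagesByExtension and its default list of extensions are hypothetical, introduced here only for illustration; the sample entries are copied from the response in the question.

```php
<?php
// Hypothetical helper (not from the original post): keep only the images
// whose title ends in one of the allowed extensions, case-insensitively.
function filterImagesByExtension(array $images, array $allowed = ['jpg', 'jpeg']): array
{
    $filtered = array_filter($images, function (array $img) use ($allowed): bool {
        $ext = strtolower(pathinfo($img['title'], PATHINFO_EXTENSION));
        return in_array($ext, $allowed, true);
    });

    // array_filter preserves the original keys, so reindex from 0.
    return array_values($filtered);
}

// Sample entries copied from the API response in the question.
$images = [
    ['ns' => 6, 'title' => 'File:20131203 Istanbul 269.jpg'],
    ['ns' => 6, 'title' => 'File:Archaeological site icon (red).svg'],
    ['ns' => 6, 'title' => 'File:Carp at the Basilica Cistern, Istanbul 2007.JPG'],
    ['ns' => 6, 'title' => 'File:Location map Istanbul.png'],
];

$jpgs = filterImagesByExtension($images);

foreach ($jpgs as $image) {
    echo $image['title'], PHP_EOL;
}
```

Filtering before the inner loop also means no imageinfo request is made for files that would be discarded anyway. Passing a different list, such as ['jpg', 'png'], would widen the filter without touching the rest of the code.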