Question

My regular expression is:

genres\":\[(?=.*name\":\"(.*?)\"}(?=.*\"homepage))

My target is:

{
    "adult":false,
    "backdrop_path":"/b9OVFl48ZV2oTLzACSwBpNrCUhJ.jpg",
    "belongs_to_collection": {
        "id":135468,
        "name":"G.I. Joe (Live-Action Series)",
        "poster_path":"/5LtZM6zLB2TDbdIaOC5uafjYZY1.jpg",
        "backdrop_path":"/m3ip0ci0TnX0ATUxpweqElYCeq4.jpg"
    },
    "budget":185000000,
    "genres":[
        {
            "id":28,
            "name":"Action"
        },
        {
            "id":12,
            "name":"Adventure"
        },
        {
            "id":878,
            "name":"Science Fiction"
        },
        {
            "id":53,
            "name":"Thriller"
        }
    ],
    "homepage":"http://www.gijoemovie.com",
    "id":72559,
    "imdb_id":"tt1583421",
    "original_title":"G.I. Joe: Retaliation",
    "overview":"Framed for crimes against the country, the G.I. Joe team is terminated by Presidential order. This forces the G.I. Joes into not only fighting their mortal enemy Cobra; they are forced to contend with threats from within the government that jeopardize their very existence.",
    "popularity":11.7818680433822,
    "poster_path":"/swk1AHwPvIJv8NUFM1qpFuaT642.jpg",
    "production_companies":[
        {
            "name":"Paramount Pictures",
            "id":4
        },
        {
            "name":"Metro-Goldwyn-Mayer (MGM)",
            "id":8411
        }
    ],
    "production_countries":[
        {
            "iso_3166_1":"US",
            "name":"United States of America"
        }
    ],
    "release_date":"2013-03-29",
    "revenue":371876278,
    "runtime":110,
    "spoken_languages":[
        {
            "iso_639_1":"en",
            "name":"English"
        }
    ],
    "status":"Released",
    "tagline":"GI JOE IS NO MORE",
    "title":"G.I. Joe: Retaliation",
    "vote_average":5.4,
    "vote_count":1806
}

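I know this is JSON and that I should handle it with a JSON class or something better than regex, but in this project I am restricted to regex.

I am testing my regex at http://regexhero.net/tester/ and I only get Thriller, when I want to get Action, Adventure, Science Fiction, Thriller, all of them.

PS: I am using Java and java.util.regex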

List<String> generos = new ArrayList<>();

Matcher filter = Pattern.compile("genres\":\\[(?=.*name\":\"(.*?)\"}(?=.*\"homepage))").matcher(response);

while (filter.find()) {
    generos.add(filter.group(1));
}

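The code itself works fine; the only problem is the regular expression. Try this regex in any regex tester and you will see that it only matches the last occurrence, but I need all of them.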

Answer 1

This seems to work:

(?<!^)(?:genres|\G)[^]]*?"name":"(.*?)"

\G basically matches at the position where the previous match ended (or at the start of the string if nothing has matched yet).

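So, since \G can also match at the start of the string (which we do not want), first make sure we are not at the start of the string, using the negative lookbehind (?<!^).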

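Then, find either "genres" or \G (the place where the previous match ended), and from there start looking for "name". The quantifier in [^]]*? is made lazy by the trailing ?, so it stops at the first "name" it finds instead of greedily running past the others and finding only the last one. Because [^]] cannot match ], the search never leaves the genres array.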

The text you want is captured in group 1.
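For use from Java, a minimal sketch of the pattern above could look like this. The class name GenreNames, the method extract, and the shortened compact sample JSON are made up for illustration; the only change to the pattern is that ] is escaped inside the character class to keep it unambiguous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
final class GenreNames {
    // "genres" starts the first match; \G continues from where the previous
    // match ended. [^\]]*? lazily skips ahead but can never cross the "]"
    // that closes the genres array.
    private static final Pattern NAME_IN_GENRES =
            Pattern.compile("(?<!^)(?:genres|\\G)[^\\]]*?\"name\":\"(.*?)\"");

    static List<String> extract(String json) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME_IN_GENRES.matcher(json);
        while (m.find()) {
            names.add(m.group(1)); // group 1 holds the genre name
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened sample; the compact layout is an assumption about the response.
        String json =
              "{\"belongs_to_collection\":{\"id\":135468,\"name\":\"G.I. Joe (Live-Action Series)\"},"
            + "\"genres\":[{\"id\":28,\"name\":\"Action\"},{\"id\":12,\"name\":\"Adventure\"}],"
            + "\"homepage\":\"http://www.gijoemovie.com\","
            + "\"production_companies\":[{\"name\":\"Paramount Pictures\",\"id\":4}]}";
        System.out.println(extract(json));
    }
}
```

The "name" entries in belongs_to_collection and production_companies are skipped: the first comes before "genres", and the second sits behind the closing ] of the genres array.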

Answer 2

Tested in regexhero:

(?<=genres[^]]{1,200})\"name\":\"[^"]+\"

[^]] makes sure you stay inside the genres array.
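A rough Java sketch of this lookbehind approach follows. The class name LookbehindGenres and the sample JSON are invented for illustration, and a capture group is added around the name (which the pattern above does not have) so the value can be read without the surrounding quotes. Java requires lookbehind to have a bounded length, which is why the {1,200} limit is there; a "name" that starts more than 200 characters after "genres" will not be found, so the bound may need to be raised for long or pretty-printed lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class LookbehindGenres {
    // The lookbehind requires "genres" followed by 1..200 characters that are
    // not "]" immediately before "name". Group 1 is an addition for convenience.
    private static final Pattern NAME_AFTER_GENRES =
            Pattern.compile("(?<=genres[^\\]]{1,200})\"name\":\"([^\"]+)\"");

    static List<String> extract(String json) {
        List<String> names = new ArrayList<>();
        Matcher m = NAME_AFTER_GENRES.matcher(json);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened sample; the compact layout is an assumption about the response.
        String json =
              "{\"belongs_to_collection\":{\"id\":135468,\"name\":\"G.I. Joe (Live-Action Series)\"},"
            + "\"genres\":[{\"id\":28,\"name\":\"Action\"},{\"id\":12,\"name\":\"Adventure\"}],"
            + "\"homepage\":\"http://www.gijoemovie.com\","
            + "\"production_companies\":[{\"name\":\"Paramount Pictures\",\"id\":4}]}";
        System.out.println(extract(json));
    }
}
```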

Answer 3

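First of all, trying to parse a non-regular format like JSON with regular expressions is a terrible idea. I don't know why your teacher would ask you to do this, unless he/she wants you to find out the hard way that regex is the wrong tool for it...

That said, you can't do this with a single regex, at least not if the number of genres isn't always fixed.

You can do it in two steps:

First, match the genres list with the following regex: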

Pattern regex = Pattern.compile("\"genres\":\\[[^\\[\\]]*\\]");

Then apply this regex to the match result of the previous one:

Pattern regex = Pattern.compile("\"name\":\"([^\"]*)\"");

(Take the result from .group(1) of each match.)
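Put together, the two steps might be wired up roughly as follows. The class name TwoStepGenres, the method extract, and the shortened sample JSON are assumptions made for this sketch; the two patterns are the ones given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class TwoStepGenres {
    // Step 1: the whole genres array, from "genres":[ up to its closing ].
    private static final Pattern GENRES_ARRAY =
            Pattern.compile("\"genres\":\\[[^\\[\\]]*\\]");
    // Step 2: every "name":"..." pair inside that array.
    private static final Pattern NAME =
            Pattern.compile("\"name\":\"([^\"]*)\"");

    static List<String> extract(String json) {
        List<String> names = new ArrayList<>();
        Matcher array = GENRES_ARRAY.matcher(json);
        if (array.find()) {
            // Run the second regex only on the text matched by the first.
            Matcher name = NAME.matcher(array.group());
            while (name.find()) {
                names.add(name.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Shortened sample; the compact layout is an assumption about the response.
        String json =
              "{\"belongs_to_collection\":{\"id\":135468,\"name\":\"G.I. Joe (Live-Action Series)\"},"
            + "\"genres\":[{\"id\":28,\"name\":\"Action\"},{\"id\":12,\"name\":\"Adventure\"}],"
            + "\"homepage\":\"http://www.gijoemovie.com\","
            + "\"production_companies\":[{\"name\":\"Paramount Pictures\",\"id\":4}]}";
        System.out.println(extract(json));
    }
}
```

Restricting the second regex to the text matched by the first is what keeps the "name" values of belongs_to_collection and production_companies out of the result.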

Regex matches only the last occurrence
