How do I find a string that follows a specific piece of text?

Posted: 2014-04-03 03:35:14

Tags: python regex

I'm trying to capture the 24-character string that follows the _id field in the source below:

[{"actors":"Natalie Portman, Hugo Weaving, Stephen Rea","year":2006,"description":"","title":"V for Vendetta","image":"http:\/\/content8.flixster.com\/movie\/11\/16\/67\/11166734_det.jpg","rating":3.65,"_id":"4eb04794f5f8077d1d000000","links":{"rottentomatoes":"http:\/\/www.rottentomatoes.com\/m\/v_for_vendetta\/","imdb":"http:\/\/www.imdb.com\/title\/tt0434409\/","shortUrl":"http:\/\/www.canistream.it\/search\/movie\/4eb04794f5f8077d1d000000\/v-for-vendetta"}},{"actors":"Guy Madison, Monica Randall, Mariano Vidal Molina","year":1966,"description":"","title":"I Cinque della vendetta (Five for Revenge)(The Five Giants from Texas)(No Drums No Trumpets)","image":"http:\/\/images.rottentomatoescdn.com\/images\/redesign\/poster_default.gif","rating":-0.05,"_id":"4e663229f5f8071702000002","links":{"imdb":"http:\/\/www.imdb.com\/title\/tt0060238\/","rottentomatoes":"http:\/\/www.rottentomatoes.com\/m\/i-cinque-della-vendetta-five-for-revengethe-five-giants-from-texasno-drums-no-trumpets\/","shortUrl":"http:\/\/www.canistream.it\/search\/movie\/4e663229f5f8071702000002\/i-cinque-della-vendetta-five-for-revenge-the-five-giants-from-texas-no-drums-no-trumpets-"}}]

I tried the lookbehind shown below, but had no luck:

^(?<=_id":")[a-z0-9]{24}

I'm using this as part of a Python script, if that makes a difference.
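The lookbehind idea itself is sound; the problem is the leading ^. The ^ anchors the match to the start of the string, and at that position nothing precedes it, so (?<=_id":") can never succeed there. Note also that re.match only tries the start of the string, so re.search or re.findall is the right call. A minimal sketch without the anchor, assuming the JSON text is held in a str named s (a trimmed sample is inlined so it runs alone) and that the IDs are lowercase hex:

```python
import re

# Trimmed sample of the question's JSON text (assumption: the full response
# is available as a str named `s`).
s = ('[{"title":"V for Vendetta","_id":"4eb04794f5f8077d1d000000"},'
     '{"title":"I Cinque della vendetta","_id":"4e663229f5f8071702000002"}]')

# Same lookbehind as in the question, minus the ^ anchor: match 24 hex
# characters immediately preceded by _id":"
pattern = re.compile(r'(?<=_id":")[0-9a-f]{24}')

ids = pattern.findall(s)   # every match, in order of appearance
print(ids)
```

A capturing group such as r'"_id":"([0-9a-f]{24})"' works just as well with findall; the lookbehind form simply keeps the match itself equal to the ID.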

3 answers:

Answer 0 (score: 1)

If the data above has already been parsed into a Python object and stored in a variable, say data, then

data[0]['_id']

gives you what you want.

If it is still a string, load it with Python's json module and then access the value as above, i.e.:

import json
data_j = json.loads(data)  # parse the JSON string into a list of dicts
data_j[0]['_id']           # '_id' of the first element
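The same steps as a self-contained script, as a sketch: the data string here is a trimmed copy of the question's text, and in practice it would come from wherever the script reads the response (that source is an assumption, not shown in the question):

```python
import json

# Trimmed copy of the question's data as a JSON string (assumption: the real
# text arrives as a str, e.g. the body of an HTTP response).
data = ('[{"title":"V for Vendetta","_id":"4eb04794f5f8077d1d000000"},'
        '{"title":"I Cinque della vendetta","_id":"4e663229f5f8071702000002"}]')

data_j = json.loads(data)      # top-level JSON array -> Python list of dicts

first_id = data_j[0]['_id']    # '_id' of the first movie only
print(first_id)
```

Parsing is generally safer than pattern matching here, since it does not depend on key order or on how the server spaces its JSON.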

Answer 1 (score: 1)

This is a list of dictionaries. If it is called D:

>>> D[0]['_id']
'4eb04794f5f8077d1d000000'
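D[0] only looks at the first movie, while the source contains two. A small sketch collecting every _id with a list comprehension, assuming D is already the parsed list (a trimmed copy is inlined so the snippet runs on its own):

```python
# D is assumed to be the already-parsed list (e.g. the result of json.loads);
# a trimmed copy of the question's data is inlined here.
D = [
    {"title": "V for Vendetta", "_id": "4eb04794f5f8077d1d000000"},
    {"title": "I Cinque della vendetta", "_id": "4e663229f5f8071702000002"},
]

# One '_id' per dictionary, in list order.
all_ids = [movie["_id"] for movie in D]
print(all_ids)
```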

Answer 2 (score: 1)

As the other two answers say, if you have the original data structure, use that. But if all else fails, this might work:

# s holds the raw JSON text; this grabs only the first _id
pat = '_id":"'
i = s.find(pat)
if i >= 0:
    i += len(pat)
    value = s[i:i+24]
else:
    value = None  # pattern not found
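str.find returns only the first occurrence, but it accepts a start index, so the same idea can walk through every _id. A sketch of that loop; find_all_ids is a hypothetical helper name, and it assumes each value is exactly 24 characters long, as in the question:

```python
def find_all_ids(s, pat='_id":"', length=24):
    """Return every `length`-character value that follows `pat` in `s`.

    Hypothetical helper; assumes fixed-length values as in the question.
    """
    ids = []
    start = 0
    while True:
        i = s.find(pat, start)   # search from `start` onward
        if i < 0:                # no more occurrences
            break
        i += len(pat)
        ids.append(s[i:i + length])
        start = i + length       # resume after the value just taken
    return ids


# Trimmed sample of the question's JSON text.
s = ('[{"title":"V for Vendetta","_id":"4eb04794f5f8077d1d000000"},'
     '{"title":"I Cinque della vendetta","_id":"4e663229f5f8071702000002"}]')
print(find_all_ids(s))
```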