这是一个文件的一行,我只想在 uri 之后的网址和 smallPictureUrl 之后的网址稍后使用它但我找不到一个正确的方式
星号代表文字或数字或两者并列在一起,每一行看起来都是不同的,因此它们无法提供帮助,没有一种模式可以利用它
{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
"timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.49137931034483},\"photo\":{\"__type__
\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-*-*-*.*.*/*-*-*/*.jpg
\",\"width\":180,\"height\":135}}}",
"subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
更简单的事情:
{"displayName":"Jim Test","firstName":"*","lastName":"*"}
我设法在displayName之后使用re.search('(?<="displayName":")(\w+) (\w+)',line)
取名为Jim Test,但如果你可以给我任何指示或建议,那么另一个非常复杂。
一条线就像这样
{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s200x200/*_*_*_*.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.40652557319224},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-h-a.akamaihd.net/hphotos-ak-prn2/*_*_*_a.jpg\",\"width\":180,\"height\":120}}}","subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s100x100/*_*_*_a.jpg","contactId":"**==","contactType":"USER","friendshipStatus":"ARE_FRIENDS","graphApiWriteId":"contact_*:*:*","hugePictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-prn2/*.*.*.*/s720x720/*_*_*_*.jpg","profileFbid":"*","isMobilePushable":"NO","lookupKey":null,"name":{"displayName":"* *","firstName":"*","lastName":"*"},"nameSearchTokens":["*","*"],"phones":[],"phoneticName":{"displayName":null,"firstName":null,"lastName":null},"isMemorialized":false,"communicationRank":0.4183731,"canViewerSendGift":false,"canMessage":true}
答案 0 :(得分:2)
与timelineCoverPhoto
相关联的值似乎是字符串化JSON,所以你可以做一些像这样丑陋的事情:
import json
s = {
"subscribeStatus": "IS_SUBSCRIBED",
"bigPictureUrl": "https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
"timelineCoverPhoto": "{\"focus\":{\"x\":0.5,\"y\":0.49137931034483},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-*-*-*.*.*/*-*-*/*.jpg \",\"width\":180,\"height\":135}}}",
"smallPictureUrl": "https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg"
}
j = json.loads(s.get('timelineCoverPhoto'))
print "uri:", j.get('photo').get('image_lowres').get('uri')
uri: https://fbcdn-*-*-*.*.*/*-*-*/*.jpg
答案 1 :(得分:2)
#See: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
import re, urllib
GRUBER_URLINTEXT_PAT = re.compile(ur'(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?\xab\xbb\u201c\u201d\u2018\u2019]))')
for line in urllib.urlopen("http://daringfireball.net/misc/2010/07/url-matching-regex-test-data.text"):
print [ mgroups[0] for mgroups in GRUBER_URLINTEXT_PAT.findall(line) ]
答案 2 :(得分:1)
如果你不能使用json,那怎么样?
>>> print mytext
{"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
"timelineCoverPhoto":"{"focus":{"x":0.5,"y":0.49137931034483},"photo":{"__type__
":{"name":"Photo"},"image_lowres":{"uri":"https://fbcdn-*-*-*.*.*/*-*-*/*.jpg
","width":180,"height":135}}}",
"subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg",
>>> uri = re.findall(r'uri\"\:\"[\'"]?([^\'" >]+)', mytext) #gets the uri
>>> smallpicurl = re.findall(r'smallPictureUrl\"\:\"[\'"]?([^\'" >]+)', mytext) # gets the smallPictureUrl
>>> ''.join(uri).rstrip()
'https://fbcdn-*-*-*.*.*/*-*-*/*.jpg' # uri
>>> ''.join(smallpicurl).rstrip()
'https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/*/*.jpg' # smallPictureUrl