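I used web scraping to extract a string containing the 64-bit Steam IDs from a friends list. I want to get only the Steam IDs so that I can store them in another file. I used a regular expression, but I think part of the pattern is wrong.
This is the string: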
{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}
This is the regular expression I used:
import re
re.findall("[^:[0-9]+[0-9]+", soup.text)
However, I got the following result:
['"7656xxxxxxx80x76',
'"76561xxxxxxx4xx89',
'"765xxxxxxxxxxx3194']
How can I get rid of the double quote (") at the start of each number?
Answer 0 (score: 1)
You have a JSON string, so use the json module:
import json
text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'
data = json.loads(text)
# Each entry in the "friends" list is a dict with a "steamid" key.
for friend in data["friendslist"]["friends"]:
    print(friend["steamid"])
Result:
7656xxxxxxx80x76
76561xxxxxxx4xx89
765xxxxxxxxxxx3194
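Since the question mentions storing the IDs in another file, here is a minimal sketch of that step built on the same json approach. The helper names `extract_steam_ids` and `save_steam_ids`, the file name `steam_ids.txt`, the one-ID-per-line format, and the shortened sample with a repeated entry are all assumptions made for illustration; dropping duplicates is optional and may not be needed for real data.

```python
import json
import tempfile
from pathlib import Path


def extract_steam_ids(text):
    """Return the steamid values from a friends-list JSON string, without duplicates."""
    data = json.loads(text)
    ids = [friend["steamid"] for friend in data["friendslist"]["friends"]]
    # dict.fromkeys keeps the first occurrence of each ID and preserves order.
    return list(dict.fromkeys(ids))


def save_steam_ids(ids, path):
    """Write one ID per line (an assumed output format) to the given file."""
    Path(path).write_text("\n".join(ids) + "\n", encoding="utf-8")


# Shortened, hypothetical sample: the first ID is repeated on purpose.
sample = (
    '{"friendslist":{"friends":['
    '{"steamid":"7656xxxxxxx80x76"},'
    '{"steamid":"76561xxxxxxx4xx89"},'
    '{"steamid":"7656xxxxxxx80x76"}]}}'
)

ids = extract_steam_ids(sample)
out_path = Path(tempfile.gettempdir()) / "steam_ids.txt"  # hypothetical location
save_steam_ids(ids, out_path)
print(ids)
```

The file is written to the temporary directory only so the sketch runs anywhere; any writable path would do.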
Answer 1 (score: 0)
I wrote a recursive function that takes the data and a key, and collects the matching values into a list:
data = {"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}
def getDataFromNestedDict(data, dictKey):
    # Appends every value found under dictKey to the global steamDataList.
    if isinstance(data, dict):
        if dictKey in data.keys():
            steamDataList.append(data[dictKey])
        for key, value in data.items():
            if isinstance(value, dict):
                getDataFromNestedDict(value, dictKey)
            elif isinstance(value, list):
                for item in value:
                    getDataFromNestedDict(item, dictKey)
    elif isinstance(data, list):
        for item in data:
            getDataFromNestedDict(item, dictKey)
steamDataList = []
getDataFromNestedDict(data, 'steamid')
print(steamDataList)
Output:
['7656xxxxxxx80x76', '76561xxxxxxx4xx89', '765xxxxxxxxxxx3194']
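As a possible variation on the same recursive idea, the function can return its results instead of appending to a global list, which makes it easier to call more than once. This is only a sketch; the name `find_values` and the shortened sample data are assumptions, not part of the answer above.

```python
def find_values(data, key):
    """Collect every value stored under `key` anywhere in nested dicts and lists."""
    found = []
    if isinstance(data, dict):
        if key in data:
            found.append(data[key])
        # Recurse into every value; non-container values yield an empty list.
        for value in data.values():
            found.extend(find_values(value, key))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_values(item, key))
    return found


# Shortened, hypothetical sample in the same shape as the data above.
data = {
    "friendslist": {
        "friends": [
            {"steamid": "7656xxxxxxx80x76", "relationship": "friend"},
            {"steamid": "76561xxxxxxx4xx89", "relationship": "friend"},
        ]
    }
}

steam_ids = find_values(data, "steamid")
print(steam_ids)
```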
Answer 2 (score: 0)
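The regular expression you provided does not do what you expect. The first [ is paired with the first ], so [^:[0-9] is read as a single character class that excludes ":", "[" and digits. That class happily matches the double quote in front of each ID.
Use a lookbehind and a lookahead for the double quotes instead: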
(?<=\")(\d+[x\d]+\d)(?=\")
@furas is correct, though. You should just parse the JSON.
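For completeness, a small sketch of how the lookaround pattern could be used with re.findall. The variable names are assumptions, and the x in the character class is there only because the sample IDs are masked with "x"; for real, all-digit IDs a plain \d+ between the lookarounds would presumably be enough.

```python
import re

text = (
    '{"friendslist":{"friends":['
    '{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},'
    '{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},'
    '{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'
)

# The lookbehind and lookahead require a double quote on each side of the match
# without including the quotes in the result.
pattern = re.compile(r'(?<=\")(\d+[x\d]+\d)(?=\")')

ids = pattern.findall(text)
print(ids)
```

The friend_since timestamps are skipped because they are not wrapped in double quotes.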
Answer 3 (score: 0)
I suggest you follow @furas's answer (use a JSON parser).
However, if you really want to use a regex: [^["]+[0-9]+[0-9]+