Can't find the right regex to extract the exact numbers

Time: 2019-09-03 11:37:35

Tags: python regex htmltext

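I used web scraping to get a string containing a friends list with 64-bit Steam IDs. I want to extract just the steamid values so I can store them in a separate file. I tried a regular expression, but I think I got the symbols wrong.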

Here is the string:

{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

The regex I used looks like this:

import re
re.findall("[^:[0-9]+[0-9]+", soup.text)

However, I got the following result:

['"7656xxxxxxx80x76',
'"76561xxxxxxx4xx89',
'"765xxxxxxxxxxx3194']

How can I get rid of the double quote (") at the start of each number?

4 answers:

Answer 0 (score: 1)

You have a JSON string, so use the json module:

import json

text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'

data = json.loads(text)

for friend in data["friendslist"]['friends']:
    print(friend['steamid'])

Result:

7656xxxxxxx80x76
76561xxxxxxx4xx89
765xxxxxxxxxxx3194
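Since the goal is to store the IDs in a separate file, here is a minimal sketch building on this answer. The helper name save_steam_ids and the output file name steam_ids.txt are hypothetical choices for illustration; the helper writes one ID per line and returns the list it wrote.

```python
import json

def save_steam_ids(text, path):
    """Hypothetical helper: parse the friends-list JSON and write one steamid per line to `path`."""
    data = json.loads(text)
    steam_ids = [friend["steamid"] for friend in data["friendslist"]["friends"]]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(steam_ids) + "\n")
    return steam_ids

text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'

# "steam_ids.txt" is an example path; use whatever file you actually want.
ids = save_steam_ids(text, "steam_ids.txt")
print(ids)
```

A plain one-ID-per-line file is easy to read back later with f.read().splitlines().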

Answer 1 (score: 0)

I wrote a recursive function that takes the data and a key and returns a list of every value stored under that key:

data = {"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

def getDataFromNestedDict(data, dictKey, results=None):
    """Recursively collect every value stored under dictKey in nested dicts and lists."""
    if results is None:
        results = []
    if isinstance(data, dict):
        if dictKey in data:
            results.append(data[dictKey])
        # Walk into every value; anything that is not a dict or list is ignored
        for value in data.values():
            getDataFromNestedDict(value, dictKey, results)
    elif isinstance(data, list):
        for item in data:
            getDataFromNestedDict(item, dictKey, results)
    return results

steamDataList = getDataFromNestedDict(data, 'steamid')
print(steamDataList)

Output:

['7656xxxxxxx80x76', '76561xxxxxxx4xx89', '765xxxxxxxxxxx3194']

Answer 2 (score: 0)

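The regex you provided doesn't do what you expect. The first [ is closed by the first ], so the [ in the middle is just a literal character: [^:[0-9] is a single negated character class that matches any character except :, [ or a digit. Since " is not excluded, the quote becomes part of the match.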
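One way to check how Python reads that class is to test single characters against it. This is just a quick illustration; the sample characters are arbitrary.

```python
import re

# '^' negates the class; ':', a literal '[' and the range 0-9 are its members.
char_class = re.compile(r'[^:[0-9]')

for ch in ['"', 'x', ':', '[', '7']:
    # True means the character is allowed by [^:[0-9]
    print(repr(ch), char_class.fullmatch(ch) is not None)
```

The double quote is accepted, so [^:[0-9]+ consumes the opening quote right before the digits of each ID.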

Use a lookbehind and a lookahead for the double quotes:

(?<=\")(\d+[x\d]+\d)(?=\")
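A minimal sketch of running that pattern with re.findall, next to an alternative (not from the original answer) that anchors on the "steamid" key and returns only a capture group. The \" in the answer's pattern is just an escaped quote, so a plain " inside a raw string is equivalent.

```python
import re

text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'

# Lookaround version: the quotes must surround the match but are not part of it.
# [x\d] also accepts the 'x' characters used to redact the IDs in the question.
lookaround_ids = re.findall(r'(?<=")(\d+[x\d]+\d)(?=")', text)

# Key-anchored version: findall returns only the group, i.e. the text between the quotes.
keyed_ids = re.findall(r'"steamid":"([^"]+)"', text)

print(lookaround_ids)
print(keyed_ids)
```

Anchoring on the key avoids depending on what the IDs look like, but it still assumes the JSON has no extra whitespace around the colon, which is one more reason to prefer a JSON parser.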

@Furas is right, though. You should just parse the JSON.

Answer 3 (score: 0)

I'd recommend following @furas's answer (use a JSON parser).

However, if you really want to use a regex: [^["]+[0-9]+[0-9]+
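A quick check of that pattern, assuming it is [^["]+[0-9]+[0-9]+ (the spaces and the curly quote in the scraped text look like extraction damage). Because [^["] also matches ':' and digits, the unquoted friend_since timestamps are picked up too, together with their leading colon.

```python
import re

text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'

matches = re.findall(r'[^["]+[0-9]+[0-9]+', text)
print(matches)
# → ['7656xxxxxxx80x76', ':1552765824', '76561xxxxxxx4xx89', ':1508594830', '765xxxxxxxxxxx3194', ':1543773569']
```

So this pattern still needs extra filtering to keep only the Steam IDs.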