Matching a str value in a list of dicts inside a pandas DataFrame column

Posted: 2019-12-13 17:42:59

Tags: python pandas

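I have an application that I want to automate. To collect all the information I need, I have to run an API call against every node in the application. I do that with the following function: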

def node_df(self, nodes):
    lst_dict = []
    for node in nodes:
        # one API call per node
        jsonstr = self.session.get_hdw_node(node)
        lst_dict.append(jsonstr)
    # build the frame in one step (DataFrame.append was removed in pandas 2.0)
    df2 = pd.DataFrame(lst_dict)
    drop = df2.drop(["nodeType", "timestamp"], axis=1)
    return drop

JSON output for a single node:

{
    "Nodes": [{
        "id": "P1_H17_F03_JN0@00-9C7719-839318-409186-3459CB[0-1-94]",
        "timestamp": "2019-12-11T16:22:55Z",
        "name": "P1_H17_F03_JN0",
        "node": "node1@00-E4DEA9-90EB54-48BE9D-7C7D62[1-1-10]",
        "dataSet": [{
            "dev": "0x003C"
        }, {
            "dev": "0x00C2"
        }, {
            "dev": "0x002A"
        }, {
            "dev": "0x0048"
        }, {
            "dev": "0x0011"
        }, {
            "dev": "0x0024"
        }],
        "nodeType": "HardwareNodeBlock"
    }]
}
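A minimal sketch of turning that payload into DataFrame rows, assuming it arrives as the JSON text shown above with a top-level "Nodes" list (if the API client already returns parsed dicts, the json.loads step can be skipped). The sample is trimmed to two devices, and build_node_frame is a hypothetical helper name, not part of the original code.

```python
import json

import pandas as pd

# Trimmed copy of the single-node payload shown above (two devices only).
SAMPLE = """
{"Nodes": [{
    "id": "P1_H17_F03_JN0@00-9C7719-839318-409186-3459CB[0-1-94]",
    "timestamp": "2019-12-11T16:22:55Z",
    "name": "P1_H17_F03_JN0",
    "node": "node1@00-E4DEA9-90EB54-48BE9D-7C7D62[1-1-10]",
    "dataSet": [{"dev": "0x003C"}, {"dev": "0x00C2"}],
    "nodeType": "HardwareNodeBlock"
}]}
"""


def build_node_frame(payloads):
    """Hypothetical helper: one row per entry in each payload's "Nodes" list."""
    rows = []
    for payload in payloads:
        rows.extend(json.loads(payload)["Nodes"])
    # Build the frame in one call instead of appending row by row.
    df = pd.DataFrame(rows)
    return df.drop(columns=["nodeType", "timestamp"])


df = build_node_frame([SAMPLE])
print(df.columns.tolist())  # ['id', 'name', 'node', 'dataSet']
```

Each cell of the resulting dataSet column is still a list of dicts, which is why a plain string match does not work on it.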

Running this over all the nodes produces the following DataFrame:

    dataSet                                             node                                            id
0   [{'dev': '0x003C'}, {'dev': '0x00C2'}, {'dev'...    node1@00-E4DEA9-90EB45-48BE9D-7C7D62[1-1-10]    P1_H17_F03_JN0@00-9C7719-839318-409...
1   [{'dev': '0x0020'}, {'dev': '0x0038'}, {'dev'...    node2@00-32BF13-BABA54-4B7FBF-B34F5B[1-1-8]     P1_H14_F04_JN1@00-77E5FA-C1055C-4E0...
2   [{'dev': '0x0112'}, {'dev': '0x0113'}, {'dev'...    node2@00-32BF13-BABA54-4B7FBF-B34F5B[1-1-8]     P1_H14_F04_JN2@00-F3D05C-08DB23-443...
3   [{'dev': '0x00E9'}, {'dev': '0x00EC'}, {'dev'...    node2@00-32BF13-BABA54-4B7FBF-B34F5B[1-1-8]     P1_H14_F04_JN3@00-DC0EED-31DE6C-4B3...
4   [{'dev': '0x004B'}, {'dev': '0x0061'}, {'dev'...    node2@00-32BF13-BABA54-4B7FBF-B34F5B[1-1-8]     P1_H14_F04_JN4@00-3A57F1-E7A3B6-44E...

I'm trying to match the value '0x0113' in df["dataSet"] so that I can return that row and get its df["id"]. '0x0113' is one of the devices connected to the node.

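Normally I would do df.loc[df['dataSet'].str.match('0x0113')], but of course it isn't that simple here, because each dataSet cell holds a list of dicts rather than a string. How can I do this? I think it would be best to consolidate the dataSet keys first, since they all have the same key name.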
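One alternative that needs no consolidation step is to unpack the lists with DataFrame.explode (available since pandas 0.25), pull the "dev" value out of each dict, and filter on that. This is a different technique from the one used in the answer below; the sketch runs on a small stand-in frame whose values are shortened versions of the real ones.

```python
import pandas as pd

# Shortened stand-in data in the same shape as the DataFrame above.
df = pd.DataFrame({
    "dataSet": [
        [{"dev": "0x003C"}, {"dev": "0x00C2"}],
        [{"dev": "0x0112"}, {"dev": "0x0113"}],
    ],
    "id": ["P1_H17_F03_JN0", "P1_H14_F04_JN2"],
})

# One row per device dict; the original row label repeats for each list element.
exploded = df.explode("dataSet")
# Pull the "dev" string out of each dict.
devs = exploded["dataSet"].map(lambda d: d["dev"])
# Keep the ids of the rows whose device matches, without duplicates.
matches = exploded.loc[devs == "0x0113", "id"].unique().tolist()
print(matches)  # ['P1_H14_F04_JN2']
```

Because explode keeps the original index, exploded.index[devs == "0x0113"] also gives the row labels in df if the whole row is needed.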

Any help is greatly appreciated.

1 answer:

Answer 0 (score: 0)

I was able to find a solution to this. First, I consolidated the keys into a single dict by applying the following function to my DataFrame inside node_df:

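def consolidate_dataset(data):
    # merge a list of single-key dicts into one dict of lists, grouped by key
    final = {}
    for dev in data:
        for key, val in dev.items():
            final.setdefault(key, []).append(val)
    return final


def node_df(self, nodes):
    ...
    df2 = pd.DataFrame(lst_dict)
    # consolidate_dataset is a plain function here, so it is called without self
    df2["dataSet"] = df2["dataSet"].apply(consolidate_dataset)
    ...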

Output:

{'dev': ['0x003C', '0x00C2', '0x002A', '0x0048', '0x0011', '0x0024']}
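To show what that apply call does to each cell, here is consolidate_dataset run on one shortened dataSet list and on a small stand-in frame (values trimmed for brevity; the second row is illustrative).

```python
import pandas as pd


def consolidate_dataset(data):
    # Merge a list of single-key dicts into one dict of lists, grouped by key.
    final = {}
    for dev in data:
        for key, val in dev.items():
            final.setdefault(key, []).append(val)
    return final


cell = [{"dev": "0x003C"}, {"dev": "0x00C2"}, {"dev": "0x002A"}]
print(consolidate_dataset(cell))  # {'dev': ['0x003C', '0x00C2', '0x002A']}

df2 = pd.DataFrame({"dataSet": [cell, [{"dev": "0x0112"}]]})
df2["dataSet"] = df2["dataSet"].apply(consolidate_dataset)
print(df2["dataSet"].tolist())
# [{'dev': ['0x003C', '0x00C2', '0x002A']}, {'dev': ['0x0112']}]
```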

Then I decided I didn't need the key at all, and replaced consolidate_dataset with a function that turns each dataSet cell into a plain list:

def dataset_list(data):
    final = []
    for dev in data:
        for _key, val in dev.items():
            final.append(val)
    return final

Output:

['0x003C', '0x00C2', '0x002A', '0x0048', '0x0011', '0x0024']
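dataset_list can also be written as a single nested comprehension over dict.values(); on the single-key {'dev': ...} dicts shown here both versions give the same list. A sketch comparing the two (dataset_list_compact is just an illustrative name):

```python
def dataset_list(data):
    # Loop version from the answer: flatten every dict's values into one list.
    final = []
    for dev in data:
        for _key, val in dev.items():
            final.append(val)
    return final


def dataset_list_compact(data):
    # Same flattening, written as a nested comprehension.
    return [val for dev in data for val in dev.values()]


cell = [{"dev": "0x003C"}, {"dev": "0x00C2"}, {"dev": "0x002A"}]
print(dataset_list_compact(cell))  # ['0x003C', '0x00C2', '0x002A']
```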

Now I can easily match the value '0x0113' in the DataFrame and get df["id"]:

df2[['0x0113' in dev for dev in df2['dataSet']]]

Output:

    dataSet                         node                                         id
2   ['0x0112', '0x0113', '0x0114']  node2@00-32BF13-BABA45-4B7FBF-B34F5B[1-1-8]  P1_H14_F04_JN31@00-F3D05C-08DB23-443...
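If only the id is needed, the same membership test can be built as a boolean mask with Series.apply and passed to .loc together with the column name. A sketch on shortened stand-in data (the ids and device lists are illustrative, not the real output):

```python
import pandas as pd

df2 = pd.DataFrame({
    "dataSet": [["0x003C", "0x00C2"], ["0x0020", "0x0038"], ["0x0112", "0x0113", "0x0114"]],
    "id": ["P1_H17_F03_JN0", "P1_H14_F04_JN1", "P1_H14_F04_JN2"],
})

# Boolean mask: True where the row's device list contains the target value.
mask = df2["dataSet"].apply(lambda devs: "0x0113" in devs)
ids = df2.loc[mask, "id"].tolist()
print(ids)  # ['P1_H14_F04_JN2']
```

Using a Series mask instead of a plain Python list keeps the row labels attached, so the same mask can be reused on other columns.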