Pandas: concatenate DataFrame rows based on a regular expression

Date: 2017-01-21 09:59:36

Tags: python pandas

Overview

Using the example image below, I am trying to concatenate DataFrame rows based on matches of the following regular expression:

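import re
import pandas as pd

for row in df.index:
    # Only consider rows where some cell mentions "group-object"
    if df.loc[row].str.contains("group-object").any():
        for value in df.loc[row].tolist():
            # Capture the name of the referenced object-group
            match = re.search(r" group-object (\S+)", value)
            if match is not None:
                print((row, match.group(1)))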

Output:

('object-group network prt-apps2', 'prt-apps')
('object-group network prt-apps3', 'prt-apps2')

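The regex looks for the string group-object, and the capture group gives me the group-object name. I then need to look this name up in the df index and concatenate the row it refers to onto the end of the current row.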
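A minimal sketch of the lookup step, assuming that a group's name is always the last word of its index label (e.g. prt-apps for object-group network prt-apps). Instead of the explicit loop, it uses Series.str.extract to capture the referenced name in every cell and then maps each name back to its full index label. The frame is a trimmed copy of the question's data (col_0 and col_1 only); `label_for_name` and `refs` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Trimmed copy of the question's data (col_0 and col_1 only)
df = pd.DataFrame(
    {
        "col_0": [
            " network-object object fake-1 host 10.0.0.1",
            " network-object object fake4 host 10.0.0.4",
            " network-object object fake5 host 10.0.0.5",
        ],
        "col_1": [
            " network-object object fake2 host 10.0.0.2 ",
            " group-object prt-apps",
            " group-object prt-apps2",
        ],
    },
    index=[
        "object-group network prt-apps",
        "object-group network prt-apps2",
        "object-group network prt-apps3",
    ],
)

# Capture the referenced group name in every cell; non-matching cells become NaN
names = df.apply(lambda col: col.str.extract(r" group-object (\S+)", expand=False))

# Assumption: a group's name is the last word of its index label
label_for_name = {label.split()[-1]: label for label in df.index}

# First referenced name per row, mapped to the full index label of the referenced row
refs = names.stack().dropna().groupby(level=0).first().map(label_for_name)
print(refs)
```

Rows without any group-object reference (here prt-apps) simply do not appear in `refs`.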

Example:

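In the example image, the row with index object-group network prt-apps2 contains the string group-object prt-apps in col_1. This refers to the row above it, with index object-group network prt-apps. I need to concatenate that row (highlighted in the image) onto the end of the row with index object-group network prt-apps2.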

The same applies to every other row the regex matches.

[Image: example DataFrame]

I have managed to get this far, but I'm struggling to see how to do the concatenation with concat or a similar method.
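One way pd.concat could be used here, offered as a sketch rather than a definitive answer: concatenate each row's Series with the Series of the row it references, passing ignore_index=True so the combined cells get positional labels 0..n-1 instead of clashing column names. Building a DataFrame from those Series pads the shorter rows with NaN. This handles a single level of reference only, assumes the same last-word naming convention as above, and uses a trimmed copy of the question's data; `label_for_name`, `rows` and `wide` are illustrative names.

```python
import pandas as pd

# Trimmed copy of the question's data (col_0 and col_1 only)
df = pd.DataFrame(
    {
        "col_0": [
            " network-object object fake-1 host 10.0.0.1",
            " network-object object fake4 host 10.0.0.4",
            " network-object object fake5 host 10.0.0.5",
        ],
        "col_1": [
            " network-object object fake2 host 10.0.0.2 ",
            " group-object prt-apps",
            " group-object prt-apps2",
        ],
    },
    index=[
        "object-group network prt-apps",
        "object-group network prt-apps2",
        "object-group network prt-apps3",
    ],
)

# Assumption: a group's name is the last word of its index label
label_for_name = {label.split()[-1]: label for label in df.index}

rows = {}
for label, row in df.iterrows():
    # Referenced group names found in this row (non-matching cells dropped)
    names = row.str.extract(r" group-object (\S+)", expand=False).dropna()
    pieces = [row]
    if not names.empty and names.iloc[0] in label_for_name:
        # Append the first referenced row's cells after this row's own cells
        pieces.append(df.loc[label_for_name[names.iloc[0]]])
    # ignore_index=True relabels the combined cells 0..n-1
    rows[label] = pd.concat(pieces, ignore_index=True)

# Shorter rows are padded with NaN
wide = pd.DataFrame(list(rows.values()), index=list(rows))
print(wide)
```

With this data the prt-apps3 row ends with the cells of prt-apps2, including the still unexpanded group-object prt-apps reference, so nested groups would need a recursive step on top of this.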

Data:

From df.to_dict():

dfData = {'col_0': {'object-group network prt-apps': ' network-object object fake-1 host 10.0.0.1',
  'object-group network prt-apps2': ' network-object object fake4 host 10.0.0.4',
  'object-group network prt-apps3': ' network-object object fake5 host 10.0.0.5'},
 'col_1': {'object-group network prt-apps': ' network-object object fake2 host 10.0.0.2 ',
  'object-group network prt-apps2': ' group-object prt-apps',
  'object-group network prt-apps3': ' group-object prt-apps2'},
 'col_2': {'object-group network prt-apps': ' network-object object fake3 host 10.0.0.0 255.255.255.0',
  'object-group network prt-apps2': '-',
  'object-group network prt-apps3': '-'},
 'col_3': {'object-group network prt-apps': ' network-object object fake121',
  'object-group network prt-apps2': '-',
  'object-group network prt-apps3': '-'}}

import pandas as pd

df = pd.DataFrame(dfData)

Desired output:

I would be very happy just to get the desired output following the pattern described above.

[Image: desired output]

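It would be even better if the data could be shifted left to fill any cells that currently contain -, but this is not required. The bottom row is the longest because it has nested objects: object-group network prt-apps3 contains group-object prt-apps2, which in turn contains group-object prt-apps.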
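For the nested case and the optional left shift, a recursive sketch: each row keeps its own cells (skipping the - placeholders, which shifts the remaining data left), and every group-object reference causes the referenced row's fully expanded cells to be appended at the end. It assumes that a group's name is the last word of its index label, silently skips references to groups missing from the index, and guards against circular references. `expand` is a hypothetical helper, not a pandas API.

```python
import re
import pandas as pd

dfData = {'col_0': {'object-group network prt-apps': ' network-object object fake-1 host 10.0.0.1',
  'object-group network prt-apps2': ' network-object object fake4 host 10.0.0.4',
  'object-group network prt-apps3': ' network-object object fake5 host 10.0.0.5'},
 'col_1': {'object-group network prt-apps': ' network-object object fake2 host 10.0.0.2 ',
  'object-group network prt-apps2': ' group-object prt-apps',
  'object-group network prt-apps3': ' group-object prt-apps2'},
 'col_2': {'object-group network prt-apps': ' network-object object fake3 host 10.0.0.0 255.255.255.0',
  'object-group network prt-apps2': '-',
  'object-group network prt-apps3': '-'},
 'col_3': {'object-group network prt-apps': ' network-object object fake121',
  'object-group network prt-apps2': '-',
  'object-group network prt-apps3': '-'}}
df = pd.DataFrame(dfData)

PATTERN = re.compile(r" group-object (\S+)")

# Assumption: a group's name is the last word of its index label
label_for_name = {label.split()[-1]: label for label in df.index}


def expand(label, seen=()):
    """Hypothetical helper: the row's own cells, then each referenced group's cells, recursively."""
    if label in seen:  # guard against circular group references
        return []
    cells, nested = [], []
    for value in df.loc[label]:
        if value.strip() == "-":
            continue  # drop placeholders so the remaining cells shift left
        cells.append(value)
        match = PATTERN.search(value)
        if match:
            ref = label_for_name.get(match.group(1))
            if ref is not None:  # skip references to groups not in the index
                nested.extend(expand(ref, seen + (label,)))
    return cells + nested


expanded = {label: expand(label) for label in df.index}

# Rows of different lengths are padded with missing values
result = pd.DataFrame.from_dict(expanded, orient="index")
result.columns = [f"col_{i}" for i in range(result.shape[1])]
print(result)
```

Because each referenced row is expanded before being appended, the prt-apps3 row picks up the cells of both prt-apps2 and prt-apps and ends up as the widest row.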
