Recursive self-join and transpose

Time: 2019-01-09 23:17:56

Tags: python recursion transpose self-join

I have a dataset that tracks parent-child relationships across multiple measurement periods. It looks like:

I would like it to appear as:

[["Col1","Col2"],
 ["A","B"],
 ["B","C"],
 ["C","D"]]

I have seen other examples in SQL, but none of them seems to answer my question. I am looking for a full column-wise expansion of the data.

I have looked into self-joins and transposes, but neither gets me there.

This should not require a highly specialized Python package, because I need to port it to several other programming languages.

Update: a second example. If I have a dataset such as

[["Col1","Col2","Col3","Col4"],
 ["A","B","C","D"]]

I would like:

[["Col1","Col2"],
 ["A","B1"],
 ["B1","C1"],
 ["B1","C2"],
 ["C2","D"],
 ["A","B2"]]

2 answers:

Answer 0 (score: 0)

This gives you the result you want:

fam = [["Col1","Col2"],["A","B"],["B","C"],["C","D"]]

col, chi, res = [], [], []

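for i in fam:
    for ii in i:
        # Single-character values are treated as data; anything longer is a header.
        if len(ii) == 1:
            # Drop an earlier occurrence so each value appears only once.
            if ii in chi:
                chi.remove(ii)
            chi.append(ii)
        else:
            col.append(ii)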

res.append(col)
res.append(chi)

>>> print(res)
[['Col1', 'Col2'], ['A', 'B', 'C', 'D']]

Answer 1 (score: 0)

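You can build a dict of lists from the parent-child pairs, mapping each parent to its list of children. Take the set difference between the dict's keys and the set of children to find the top-level parents, then make those top-level parents children of None. That lets you build the joined lists recursively from the mapping, starting with None as the topmost parent, while leaving None out of the joined lists that are returned: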

def join(pairs):
    def _join(parent=None):
        if parent not in tree:
            return [[parent]]
        output = []
        for child in tree[parent]:
            for joined in _join(child):
                output.append([*([parent] if parent is not None else []), *joined])
        return output
    tree = {}
    children = set()
    for parent, child in pairs:
        tree.setdefault(parent, []).append(child)
        children.add(child)
    for parent in tree.keys() - children:
        tree.setdefault(None, []).append(parent)
    return _join()

So given:

pairs = [
    ["A", "B1"],
    ["B1", "C1"],
    ["B1", "C2"],
    ["C2", "D"],
    ["A", "B2"]
]

join(pairs) returns:

[['A', 'B1', 'C1'], ['A', 'B1', 'C2', 'D'], ['A', 'B2']]
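Since the question mentions porting to other languages, here is a possible non-recursive variant that uses an explicit stack instead of nested calls. It is only a sketch: the name `join_iterative` is hypothetical, it keeps roots in first-seen order rather than using a set difference, and it assumes the pairs contain no cycles.

```python
def join_iterative(pairs):
    """Return every root-to-leaf path described by (parent, child) pairs.

    Hypothetical helper; assumes the relationships are acyclic.
    """
    tree = {}
    children = set()
    for parent, child in pairs:
        tree.setdefault(parent, []).append(child)
        children.add(child)

    # Roots are parents that never appear as a child (first-seen order).
    roots = [p for p in tree if p not in children]

    paths = []
    # Each stack entry is a partial path; pushing in reverse keeps input order.
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        kids = tree.get(path[-1])
        if not kids:
            # No children: the path is complete.
            paths.append(path)
            continue
        for child in reversed(kids):
            stack.append(path + [child])
    return paths


example_pairs = [
    ["A", "B1"],
    ["B1", "C1"],
    ["B1", "C2"],
    ["C2", "D"],
    ["A", "B2"],
]
result = join_iterative(example_pairs)
print(result)
```

For the pairs above this should produce the same three paths as the recursive version, and the loop-plus-stack shape tends to translate directly to languages with limited recursion depth.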

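Now, if you want to pad the rows that have fewer columns with empty strings, first get the maximum number of columns, then iterate over the rows and extend each one with enough empty strings that they all have the same number of columns: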

joined = join(pairs)
max_columns = max(map(len, joined))
for path in joined:
    path.extend([''] * (max_columns - len(path)))

joined then becomes:

[['A', 'B1', 'C1', ''], ['A', 'B1', 'C2', 'D'], ['A', 'B2', '', '']]

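Note that I ignored the column headers in the question, such as ['Col1', 'Col2'], because they are irrelevant to the problem, and you did not explain where 'Col3' and 'Col4' come from.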
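If headers are wanted anyway, one option is to generate them from the widest row. This is a sketch under assumptions: the `Col1`..`ColN` naming scheme is a guess based on the question's examples, and `with_headers` is a hypothetical helper, not part of the code above.

```python
def with_headers(rows, prefix="Col", fill=""):
    """Pad rows to equal width and prepend generated headers.

    Hypothetical helper; the prefix-plus-index naming is an assumption.
    """
    width = max(map(len, rows), default=0)
    header = [f"{prefix}{i}" for i in range(1, width + 1)]
    # Build new lists so the input rows are left unmodified.
    padded = [list(row) + [fill] * (width - len(row)) for row in rows]
    return [header, *padded]


rows = [['A', 'B1', 'C1'], ['A', 'B1', 'C2', 'D'], ['A', 'B2']]
table = with_headers(rows)
for line in table:
    print(line)
```

Unlike the in-place `extend` loop, this returns a new table, so the original joined rows stay usable for other purposes.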