Question

Given the following unordered tab-separated file:

Asia    Srilanka
Srilanka    Colombo
Continents  Europe
India   Mumbai
India   Pune
Continents  Asia
Earth   Continents
Asia    India

The goal is to produce the following output (tab-separated):

Earth   Continents  Asia    India   Mumbai
Earth   Continents  Asia    India   Pune
Earth   Continents  Asia    Srilanka    Colombo
Earth   Continents  Europe

I wrote the following script to achieve this:

root = {}  # will finally contain the ROOT member from which all the nodes emanate
link = {}  # holds, for each node, the group of its immediate children
for line in f:  # f is the input file, already opened for reading
    line = line.strip()
    cols = line.split('\t')
    parent = cols[0]
    child = cols[1]
    if parent not in link:
        root[parent] = 1
    if child in root:
        del root[child]
    if child not in link:
        link[child] = {}
    if parent not in link:
        link[parent] = {}
    link[parent][child] = 1

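Now I would like to print the desired output using the two dicts created above (root and link). I don't know how to do this in Python, but I know that in Perl we could write the following to get the result: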

print_links($_) for sort keys %root;

sub print_links
{
  my @path = @_;

  my %children = %{$link{$path[-1]}};
  if (%children)
  {
    print_links(@path, $_) for sort keys %children;
  } 
  else 
  {
    say join "\t", @path;
  }
}

Can you help me produce the desired output in Python 3.x?
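For comparison, a fairly direct Python 3 translation of the Perl routine above might look like the sketch below. It assumes `root` and `link` have the same shape as the dicts built by the script (here they are filled in by hand from the sample data). The name `collect_links` is hypothetical, and returning the lines as a list instead of printing them directly is a choice made only for this illustration.

```python
def collect_links(link, path):
    """Return every root-to-leaf path below path[-1] as a tab-joined string."""
    children = link.get(path[-1], {})
    if not children:
        # A node without children is a leaf, so the path is complete.
        return ['\t'.join(path)]
    lines = []
    for child in sorted(children):
        lines.extend(collect_links(link, path + [child]))
    return lines


# Dicts shaped like the ones the script above builds from the sample file.
root = {'Earth': 1}
link = {
    'Earth': {'Continents': 1},
    'Continents': {'Europe': 1, 'Asia': 1},
    'Asia': {'Srilanka': 1, 'India': 1},
    'India': {'Mumbai': 1, 'Pune': 1},
    'Srilanka': {'Colombo': 1},
    'Europe': {}, 'Mumbai': {}, 'Pune': {}, 'Colombo': {},
}

output_lines = []
for top in sorted(root):
    output_lines.extend(collect_links(link, [top]))
print('\n'.join(output_lines))
```

Sorting the children at every level mirrors `sort keys %children` in the Perl version, so the lines come out in the order shown in the desired output.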

Answer 1

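I see the following subtasks here:

reading the relations from a file;
building the hierarchy from the relations;
writing the hierarchy to a file.

Assuming the height of the hierarchy tree is less than the default recursion limit (1000 in most cases), let's define utility functions for these separate tasks.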

Utilities

Parsing the relations can be done with

def parse_relations(lines):
    relations = {}
    splitted_lines = (line.split() for line in lines)
    for parent, child in splitted_lines:
        relations.setdefault(parent, []).append(child)
    return relations
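Note that `line.split()` splits on any whitespace, so a name containing a space would be broken into several fields. Since the input is described as tab-separated, a variant that splits only on tabs and skips blank lines might look like the sketch below. `parse_relations_tabbed` is a hypothetical name, and the sample rows (including the spelling "Sri Lanka") are made up for this illustration.

```python
def parse_relations_tabbed(lines):
    """Map each parent to the list of its children, splitting on tabs only."""
    relations = {}
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue  # skip blank lines
        parent, child = line.split('\t')
        relations.setdefault(parent, []).append(child)
    return relations


# Illustrative input: a name with a space and a blank line in between.
sample = ['Asia\tSri Lanka\n', '\n', 'Sri Lanka\tColombo\n']
print(parse_relations_tabbed(sample))
# {'Asia': ['Sri Lanka'], 'Sri Lanka': ['Colombo']}
```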

Building the hierarchy can be done with

Python >= 3.5

def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            for element in sub_hierarchy:
                try:
                    yield (parent, *element)
                except TypeError:
                    # we've tried to unpack `None` value,
                    # it means that no successors left
                    yield (parent, child)
    except KeyError:
        # we've reached end of hierarchy
        yield None

Python < 3.5: extended iterable unpacking was added with PEP 448, but it can be replaced with itertools.chain

import itertools


def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            for element in sub_hierarchy:
                try:
                    yield tuple(itertools.chain([parent], element))
                except TypeError:
                    # we've tried to unpack `None` value,
                    # it means that no successors left
                    yield (parent, child)
    except KeyError:
        # we've reached end of hierarchy
        yield None

Exporting the hierarchy to a file can be done with

def write_hierarchy(hierarchy, path, delimiter='\t'):
    with open(path, mode='w') as file:
        for row in hierarchy:
            file.write(delimiter.join(row) + '\n')

Usage

Assuming the file path is 'relations.txt':

with open('relations.txt') as file:
    relations = parse_relations(file)

gives us

>>> relations
{'Asia': ['Srilanka', 'India'],
 'Srilanka': ['Colombo'],
 'Continents': ['Europe', 'Asia'],
 'India': ['Mumbai', 'Pune'],
 'Earth': ['Continents']}

Our hierarchy is

>>> list(flatten_hierarchy(relations))
[('Earth', 'Continents', 'Europe'),
 ('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
 ('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
 ('Earth', 'Continents', 'Asia', 'India', 'Pune')]

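Finally, export it to a file named 'hierarchy.txt':

>>> hierarchy = flatten_hierarchy(relations)
>>> write_hierarchy(sorted(hierarchy), 'hierarchy.txt')

(we use sorted so that the rows appear in the same order as in the desired output file)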
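The reason `sorted` is enough here is that Python compares tuples element by element, so rows that share a prefix are ordered by the first name that differs. A small sketch using the rows from the sample data (the variable names are illustrative only):

```python
# Rows in the order the hierarchy was produced from the unordered input.
hierarchy = [
    ('Earth', 'Continents', 'Europe'),
    ('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
    ('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
    ('Earth', 'Continents', 'Asia', 'India', 'Pune'),
]

# Tuples sort lexicographically: 'Asia' < 'Europe', 'India' < 'Srilanka', ...
sorted_rows = sorted(hierarchy)
for row in sorted_rows:
    print('\t'.join(row))
```

This prints the four lines in the order given in the question, with the 'Asia' branches before 'Europe'.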

P.S.

If you are not familiar with Python generators, we can define the flatten_hierarchy function like this:

Python >= 3.5

def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
    except KeyError:
        # we've reached end of hierarchy
        return None
    result = []
    for child in children:
        sub_hierarchy = flatten_hierarchy(relations, child)
        try:
            for element in sub_hierarchy:
                result.append((parent, *element))
        except TypeError:
            # we've tried to iterate through `None` value,
            # it means that no successors left
            result.append((parent, child))
    return result

Python < 3.5

import itertools


def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
    except KeyError:
        # we've reached end of hierarchy
        return None
    result = []
    for child in children:
        sub_hierarchy = flatten_hierarchy(relations, child)
        try:
            for element in sub_hierarchy:
                result.append(tuple(itertools.chain([parent], element)))
        except TypeError:
            # we've tried to iterate through `None` value,
            # it means that no successors left
            result.append((parent, child))
    return result
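The recursion-limit assumption made at the start of this answer can be dropped by walking the tree with an explicit stack. The sketch below is an alternative to the recursive functions above, not a replacement the answer itself proposes. `iter_paths` is a hypothetical name, and `relations` is assumed to map each parent to a list of its children, as returned by `parse_relations`.

```python
def iter_paths(relations, root='Earth'):
    """Yield root-to-leaf paths as tuples, depth-first, without recursion."""
    stack = [(root,)]
    while stack:
        path = stack.pop()
        children = relations.get(path[-1])
        if not children:
            # No children recorded for the last node: the path ends here.
            yield path
            continue
        # Push in reverse so children are visited in their original order.
        for child in reversed(children):
            stack.append(path + (child,))


relations = {
    'Asia': ['Srilanka', 'India'],
    'Srilanka': ['Colombo'],
    'Continents': ['Europe', 'Asia'],
    'India': ['Mumbai', 'Pune'],
    'Earth': ['Continents'],
}
paths = list(iter_paths(relations))
for path in paths:
    print('\t'.join(path))
```

The paths come out in the same order as with the recursive versions, so they can be passed through `sorted` and written out in the same way.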

Answer 2

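We can do this in a few simple steps:

Step 1: Convert the data into a DataFrame.
Step 2: Take the unique elements of column 1 that do not appear in column 2.
Step 3: Convert those unique elements of column 1 into a DataFrame.
Step 4: Merge the DataFrames with pd.merge(), using the unique elements of column 1 as the left DataFrame and the main data converted in step 1 as the right DataFrame.
Step 5: Drop duplicates across all columns.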
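Step 2 is a set difference, and the idea can be illustrated in plain Python before bringing pandas in. In the sketch below the `pairs` list stands in for the two columns of the DataFrame; it is an illustration under that assumption, not the pandas code itself.

```python
# (parent, child) rows of the sample file, standing in for the two columns.
pairs = [
    ('Asia', 'Srilanka'),
    ('Srilanka', 'Colombo'),
    ('Continents', 'Europe'),
    ('India', 'Mumbai'),
    ('India', 'Pune'),
    ('Continents', 'Asia'),
    ('Earth', 'Continents'),
    ('Asia', 'India'),
]

parents = {parent for parent, _ in pairs}
children = {child for _, child in pairs}

# Values that appear as a parent but never as a child are the roots.
roots = sorted(parents - children)
print(roots)  # ['Earth']
```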

Answer 3

Prerequisites:

The data should be in the form of a DataFrame,
and it should have two columns.


import pandas as pd


# Now we are going to create the function.
def root_to_leaves(data):
    # Take the names of the first and second columns.
    first_column_name = data.columns[0]
    second_column_name = data.columns[1]
    # Take the unique elements of column 1 that are not in column 2
    # (set difference); these are the roots.
    A = set(data[first_column_name])
    B = set(data[second_column_name])
    C = list(A - B)
    # m0 holds the roots as the first stage of every path.
    m0 = pd.DataFrame({'stage_1': C})
    # First merge: attach the children of the roots.
    data = data.rename(columns={first_column_name: 'stage_1',
                                second_column_name: 'stage_2'})
    m1 = pd.merge(m0, data, on='stage_1', how='left')
    data = data.rename(columns={'stage_1': 'stage_2', 'stage_2': 'stage_3'})
    # Keep merging one level at a time until the last column contains
    # only NaN, i.e. every path has reached a leaf.
    count_of_nan = 0
    i = 0
    while count_of_nan != m1.shape[0]:
        on_variable = 'stage_' + str(i + 2)
        m1 = pd.merge(m1, data, on=on_variable, how='left')
        data = data.rename(columns={'stage_' + str(i + 2): 'stage_' + str(i + 3),
                                    'stage_' + str(i + 3): 'stage_' + str(i + 4)})
        i = i + 1
        count_of_nan = m1.iloc[:, -1].isnull().sum()
    # Drop the trailing all-NaN column.
    final_data = m1.iloc[:, :-1]
    return final_data

# You can find the result in data_result.
data_result = root_to_leaves(data)
