Given the following unordered tab-separated file:
Asia Srilanka
Srilanka Colombo
Continents Europe
India Mumbai
India Pune
Continents Asia
Earth Continents
Asia India
The goal is to produce the following output (tab-separated):
Earth Continents Asia India Mumbai
Earth Continents Asia India Pune
Earth Continents Asia Srilanka Colombo
Earth Continents Europe
I wrote the following script to work towards this:
root = {}  # this hash will finally contain the ROOT member from which all the nodes emanate
link = {}  # this is to hold the grouping of immediate children
with open('relations.txt') as f:  # file name assumed; f was not defined in the original snippet
    for line in f:
        line = line.strip()  # also removes the trailing '\r\n'
        cols = line.split('\t')
        parent = cols[0]
        child = cols[1]
        if parent not in link:
            root[parent] = 1
        if child in root:
            del root[child]
        if child not in link:
            link[child] = {}
        if parent not in link:
            link[parent] = {}
        link[parent][child] = 1
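Now I want to print the required output using the two dicts created above (root and link). I don't know how to do this in Python, but I know the following Perl code produces the result: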
print_links($_) for sort keys %root;

sub print_links
{
    my @path = @_;
    my %children = %{$link{$path[-1]}};
    if (%children)
    {
        print_links(@path, $_) for sort keys %children;
    }
    else
    {
        say join "\t", @path;
    }
}
Can you help me produce the required output in Python 3.x?
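Answer 0 (score: 3)
I see three separate subtasks here: parsing the relations, building the hierarchy, and exporting it to a file.
Assuming the height of the hierarchy tree is less than the default recursion limit (1000 in most cases, see sys.getrecursionlimit()), let's define a utility function for each of these tasks.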
Parsing the relations can be done with:
def parse_relations(lines):
    relations = {}
    splitted_lines = (line.split() for line in lines)
    for parent, child in splitted_lines:
        relations.setdefault(parent, []).append(child)
    return relations
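You can try parse_relations without a file by passing an in-memory text stream, because io.StringIO iterates line by line just like an open file. The sketch below uses the question's data. It adds one assumption-driven change: blank lines are skipped, since a blank line would otherwise make the two-name unpacking fail. Note that line.split() splits on any whitespace, so this assumes node names contain no spaces.

```python
import io

def parse_relations(lines):
    """Map each parent to the list of its children, keeping input order."""
    relations = {}
    for line in lines:
        if not line.strip():
            continue  # added: skip blank lines such as a trailing empty line
        parent, child = line.split()
        relations.setdefault(parent, []).append(child)
    return relations

# The question's data, tab-separated, standing in for relations.txt
SAMPLE = ("Asia\tSrilanka\nSrilanka\tColombo\nContinents\tEurope\nIndia\tMumbai\n"
          "India\tPune\nContinents\tAsia\nEarth\tContinents\nAsia\tIndia\n")

relations = parse_relations(io.StringIO(SAMPLE))
print(relations)
```

Because dicts keep insertion order (Python 3.7+), the keys appear in the order their parents first occur in the file.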
Building the hierarchy can be done with (Python >= 3.5):
def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            for element in sub_hierarchy:
                try:
                    yield (parent, *element)
                except TypeError:
                    # we've tried to unpack `None` value,
                    # it means that no successors left
                    yield (parent, child)
    except KeyError:
        # we've reached end of hierarchy
        yield None
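The recursive generator relies on the tree being shallower than the recursion limit. If that assumption might not hold, an explicit stack performs the same depth-first walk without recursion. The sketch below is a swapped-in variant, not the answer's code. flatten_hierarchy_iterative is a hypothetical name, and a node missing from relations is treated as a leaf. Children are pushed in reverse so that paths come out in the same order as the recursive version. One difference: a root with no children yields (root,) rather than None.

```python
RELATIONS = {
    'Asia': ['Srilanka', 'India'],
    'Srilanka': ['Colombo'],
    'Continents': ['Europe', 'Asia'],
    'India': ['Mumbai', 'Pune'],
    'Earth': ['Continents'],
}

def flatten_hierarchy_iterative(relations, root='Earth'):
    """Yield every root-to-leaf path as a tuple, using a stack instead of recursion."""
    stack = [(root,)]
    while stack:
        path = stack.pop()
        children = relations.get(path[-1])
        if not children:
            # path[-1] has no entry in relations, so it is a leaf
            yield path
            continue
        # push in reverse so children are popped in their original order
        for child in reversed(children):
            stack.append(path + (child,))

paths = list(flatten_hierarchy_iterative(RELATIONS))
for path in paths:
    print(path)
```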
Python < 3.5: unpacking inside a tuple display, as in (parent, *element), was added in Python 3.5 by PEP 448. On older versions it can be replaced with itertools.chain:
import itertools


def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            for element in sub_hierarchy:
                try:
                    yield tuple(itertools.chain([parent], element))
                except TypeError:
                    # we've tried to unpack `None` value,
                    # it means that no successors left
                    yield (parent, child)
    except KeyError:
        # we've reached end of hierarchy
        yield None
Exporting the hierarchy to a file can be done with:
def write_hierarchy(hierarchy, path, delimiter='\t'):
    with open(path, mode='w') as file:
        for row in hierarchy:
            file.write(delimiter.join(row) + '\n')
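An alternative sketch of write_hierarchy uses the standard csv module, which would also quote a name that happened to contain the delimiter. This is a swapped-in variant, not the answer's code. lineterminator='\n' is set so the rows end with '\n' like the original (csv.writer defaults to '\r\n'), and newline='' follows the csv documentation's advice for files passed to csv.writer. The usage below writes to a temporary directory so it leaves nothing behind.

```python
import csv
import os
import tempfile

def write_hierarchy(hierarchy, path, delimiter='\t'):
    """Write each path tuple as one delimited row using csv.writer."""
    with open(path, mode='w', newline='') as file:
        writer = csv.writer(file, delimiter=delimiter, lineterminator='\n')
        writer.writerows(hierarchy)

rows = [('Earth', 'Continents', 'Europe'),
        ('Earth', 'Continents', 'Asia', 'India', 'Pune')]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'hierarchy.txt')
    write_hierarchy(rows, out_path)
    with open(out_path) as f:
        written = f.read()

print(written, end='')
```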
Assuming the file path is 'relations.txt':
with open('relations.txt') as file:
    relations = parse_relations(file)
gives us
>>> relations
{'Asia': ['Srilanka', 'India'],
 'Srilanka': ['Colombo'],
 'Continents': ['Europe', 'Asia'],
 'India': ['Mumbai', 'Pune'],
 'Earth': ['Continents']}
and our hierarchy is
>>> list(flatten_hierarchy(relations))
[('Earth', 'Continents', 'Europe'),
 ('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
 ('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
 ('Earth', 'Continents', 'Asia', 'India', 'Pune')]
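Finally, export it to a file named 'hierarchy.txt':
>>> hierarchy = flatten_hierarchy(relations)
>>> write_hierarchy(sorted(hierarchy), 'hierarchy.txt')
(we use sorted to put the paths in the order required in the output file)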
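For reference, here is an end-to-end sketch that prints the requested lines to standard output instead of writing a file. It is a variant, not the answer's original code. Instead of the None sentinel and try/except, it checks relations.get(parent): a leaf yields a one-element tuple, and its parent then extends that tuple. The in-memory SAMPLE string stands in for relations.txt.

```python
import io

def parse_relations(lines):
    """Map each parent to the list of its children, skipping blank lines."""
    relations = {}
    for line in lines:
        if line.strip():
            parent, child = line.split()
            relations.setdefault(parent, []).append(child)
    return relations

def flatten_hierarchy(relations, parent='Earth'):
    """Yield root-to-leaf paths; a node missing from relations is a leaf."""
    children = relations.get(parent)
    if not children:
        yield (parent,)
        return
    for child in children:
        for sub_path in flatten_hierarchy(relations, child):
            yield (parent,) + sub_path

SAMPLE = ("Asia\tSrilanka\nSrilanka\tColombo\nContinents\tEurope\nIndia\tMumbai\n"
          "India\tPune\nContinents\tAsia\nEarth\tContinents\nAsia\tIndia\n")

relations = parse_relations(io.StringIO(SAMPLE))
# sorted() orders the tuples element by element, giving the requested output order
lines = ['\t'.join(path) for path in sorted(flatten_hierarchy(relations))]
print('\n'.join(lines))
```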
If you are not familiar with Python generators, the flatten_hierarchy function can be defined with plain lists instead:
Python >= 3.5
def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
    except KeyError:
        # we've reached end of hierarchy
        return None
    result = []
    for child in children:
        sub_hierarchy = flatten_hierarchy(relations, child)
        try:
            for element in sub_hierarchy:
                result.append((parent, *element))
        except TypeError:
            # we've tried to iterate through `None` value,
            # it means that no successors left
            result.append((parent, child))
    return result
Python < 3.5
import itertools


def flatten_hierarchy(relations, parent='Earth'):
    try:
        children = relations[parent]
    except KeyError:
        # we've reached end of hierarchy
        return None
    result = []
    for child in children:
        sub_hierarchy = flatten_hierarchy(relations, child)
        try:
            for element in sub_hierarchy:
                result.append(tuple(itertools.chain([parent], element)))
        except TypeError:
            # we've tried to iterate through `None` value,
            # it means that no successors left
            result.append((parent, child))
    return result
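You can also use the question's own root and link dicts directly by translating the Perl print_links line by line. The sketch below assumes the input is tab-separated. build_root_and_link is a hypothetical helper that repeats the question's script on in-memory lines, so the block runs without relations.txt. Because link[node] is an empty dict for a leaf, the emptiness check plays the role of Perl's if (%children).

```python
import io

def build_root_and_link(lines):
    """Rebuild the question's root and link dicts with the same logic as its script."""
    root, link = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parent, child = line.split('\t')
        if parent not in link:
            root[parent] = 1
        if child in root:
            del root[child]
        if child not in link:
            link[child] = {}
        if parent not in link:
            link[parent] = {}
        link[parent][child] = 1
    return root, link

def print_links(link, *path):
    """Python version of the Perl print_links: recurse until a node has no children."""
    children = link[path[-1]]
    if children:
        for child in sorted(children):
            print_links(link, *(path + (child,)))
    else:
        print('\t'.join(path))

SAMPLE = ("Asia\tSrilanka\nSrilanka\tColombo\nContinents\tEurope\nIndia\tMumbai\n"
          "India\tPune\nContinents\tAsia\nEarth\tContinents\nAsia\tIndia\n")

root, link = build_root_and_link(io.StringIO(SAMPLE))
for name in sorted(root):
    print_links(link, name)
```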
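Answer 1 (score: 1)
We can do this in a few simple steps.
Answer 2 (score: 0)
Prerequisites: the relations are loaded into a pandas DataFrame named data, with the parent in the first column and the child in the second.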
import pandas as pd


def root_to_leaves(data):
    # Take the names of the first and second columns.
    first_column_name = data.columns[0]
    second_column_name = data.columns[1]
    # Take the unique elements of column 1 that are not in column 2 (the roots).
    # We use the set difference operation.
    A = set(data[first_column_name])
    B = set(data[second_column_name])
    C = list(A - B)
    # m0 is just a variable name: a one-column frame holding the roots.
    m0 = pd.DataFrame({'stage_1': C})
    # First merge: attach each root's children as stage_2.
    data = data.rename(columns={first_column_name: 'stage_1', second_column_name: 'stage_2'})
    m1 = pd.merge(m0, data, on='stage_1', how='left')
    data = data.rename(columns={'stage_1': 'stage_2', 'stage_2': 'stage_3'})
    # Keep merging one level deeper until the newest column is entirely NaN,
    # i.e. every path has reached a leaf.
    count_of_nan = 0
    i = 0
    while count_of_nan != m1.shape[0]:
        on_variable = 'stage_' + str(i + 2)
        m2 = pd.merge(m1, data, on=on_variable, how='left')
        data = data.rename(columns={'stage_' + str(i + 2): 'stage_' + str(i + 3),
                                    'stage_' + str(i + 3): 'stage_' + str(i + 4)})
        m1 = m2
        i = i + 1
        count_of_nan = m1.iloc[:, -1].isnull().sum()
    # Drop the last, all-NaN column.
    final_data = m1.iloc[:, :-1]
    return final_data


# Read the tab-separated input (file name assumed) into a two-column DataFrame.
data = pd.read_csv('relations.txt', sep='\t', header=None)
# you can find the result in data_result
data_result = root_to_leaves(data)
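The pandas answer extends every path by one level per merge and stops once the newest column is all NaN. The sketch below applies the same idea in plain Python with lists instead of DataFrames, which may help if pandas is not available. root_to_leaves_plain is a hypothetical name, and like the pandas loop it assumes the relations contain no cycles (otherwise the loop never terminates). Finished paths are carried along unchanged, much like the NaN rows in the merged frame.

```python
def root_to_leaves_plain(pairs):
    """Extend every path one level per pass, like the repeated left merges."""
    children = {}
    for parent, child in pairs:
        children.setdefault(parent, []).append(child)
    # roots: names in the first column that never appear in the second (the A - B step)
    roots = sorted({p for p, _ in pairs} - {c for _, c in pairs})
    paths = [[r] for r in roots]
    # keep going while at least one path can still be extended
    while any(path[-1] in children for path in paths):
        next_paths = []
        for path in paths:
            kids = children.get(path[-1])
            if kids:
                next_paths.extend(path + [kid] for kid in kids)
            else:
                next_paths.append(path)  # already at a leaf
        paths = next_paths
    return paths

PAIRS = [('Asia', 'Srilanka'), ('Srilanka', 'Colombo'), ('Continents', 'Europe'),
         ('India', 'Mumbai'), ('India', 'Pune'), ('Continents', 'Asia'),
         ('Earth', 'Continents'), ('Asia', 'India')]

result = root_to_leaves_plain(PAIRS)
for path in result:
    print('\t'.join(path))
```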