I have an input file:
sun vehicle
one number
two number
reduce command
one speed
five speed
zero speed
speed command
kmh command
I used the following code:
from collections import OrderedDict
output = OrderedDict()
with open('final') as in_file:
    for line in in_file:
        columns = line.split(' ')
        if len(columns) >= 2:
            word,tag = line.strip().split()
            if output.has_key(tag) == False:
                output[tag] = [];
            output[tag].append(word)
        else:
            print ""
            for k, v in output.items():
                print '<{}> {} </{}>'.format(k, ' '.join(v), k)
            output = OrderedDict()
The output I get is:
<vehicle> sun </vehicle>
<number> one two </number>
<command> reduce speed kmh </command>
<speed> one five zero </speed>
But my expected output is:
<vehicle> sun </vehicle>
<number> one two </number>
<command> reduce
<speed> one five zero </speed>
speed kmh </command>
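Can someone help me solve this?
Answer 0 (score: 2)
It looks like the output you want is underspecified!
You presumably want the code to "know in advance" that speed is part of command before it reaches the line speed command.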
To do what you want, you need a recursive function. How about
for k, v in output.items():
    print(expandElements(k, v, output))
where you define
def expandElements(k, v, dic):
    # open the tag, expand each item of v, then close the tag
    out = '<' + k + '>'
    for i in v:
        if i in dic and i != k:
            # i is itself a tag name: expand it with a recursive call
            out = out + ' ' + expandElements(i, dic[i], dic)
        else:
            # no match in dic: append the word as-is
            out = out + ' ' + i
    out = out + ' </' + k + '>'
    return out
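A fuller, self-contained sketch of this recursive expansion, written for Python 3, might look like the following. The names `SAMPLE`, `group_by_tag`, `expand` and `render` are made up for illustration. It rests on two assumptions that the question does not confirm: a word that matches a tag name is replaced by the nested element (so `speed` does not also remain as a plain word, unlike the expected output in the question), and a tag that gets nested is not printed again at the top level.

```python
from collections import OrderedDict

# Sample input copied from the question (assumed: one "word tag" pair per line).
SAMPLE = """sun vehicle
one number
two number
reduce command
one speed
five speed
zero speed
speed command
kmh command"""


def group_by_tag(text):
    """Group words by tag, keeping the order in which tags first appear."""
    groups = OrderedDict()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, tag = parts
        groups.setdefault(tag, []).append(word)
    return groups


def expand(tag, groups, seen=()):
    """Render a tag, recursively expanding any word that is itself a tag name."""
    pieces = []
    for word in groups[tag]:
        # The `seen` tuple holds ancestor tags and guards against cycles.
        if word in groups and word != tag and word not in seen:
            pieces.append(expand(word, groups, seen + (tag,)))
        else:
            pieces.append(word)
    return '<{0}> {1} </{0}>'.format(tag, ' '.join(pieces))


def render(text):
    """Return one rendered line per top-level tag."""
    groups = group_by_tag(text)
    # A tag that occurs as a word under another tag is nested, not top-level.
    nested = {w for t, words in groups.items() for w in words
              if w in groups and w != t}
    return [expand(tag, groups) for tag in groups if tag not in nested]


for line in render(SAMPLE):
    print(line)
```

Because the whole input is grouped before anything is rendered, the code does not need to know in advance that `speed` belongs to `command`; the nesting is resolved only at print time.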
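Answer 1 (score: 2)
It looks like you want to output some kind of tree structure?
You are printing with print '<{}> {} </{}>'.format(k, ' '.join(v), k), so all of your output will have the flat form '<{}> {} </{}>'.
If you want to nest things, you need a nested structure to represent them.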
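As a minimal illustration of such a nested structure (Python 3), a node could be either a plain string or a `(tag, children)` tuple. The function name `render_node` and the hand-built `command` tree are hypothetical. How to build the tree from the input file is left open here.

```python
def render_node(node):
    """Render a node: either a plain string or a (tag, children) tuple."""
    if isinstance(node, str):
        return node
    tag, children = node
    inner = ' '.join(render_node(child) for child in children)
    return '<{0}> {1} </{0}>'.format(tag, inner)


# Hand-built tree for the "command" element from the question's expected output.
command = ('command', ['reduce',
                       ('speed', ['one', 'five', 'zero']),
                       'speed', 'kmh'])

print(render_node(command))
# -> <command> reduce <speed> one five zero </speed> speed kmh </command>
```

Apart from line breaks, this reproduces the `command` part of the expected output, because the nesting lives in the data rather than in the format string.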
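Answer 2 (score: 0)
To parse the input file recursively, I would create a class that represents a tag. Each tag can contain children. Each child is either a string added manually with tag.children.append("value"), or a value added by calling tag.add_tag(tag.name, "value").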
class Tag:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        self.has_root = True
        self.parent = parent

    def __str__(self):
        """ compose string for this tag (recursively) """
        if not self.children:
            return self.name
        children_str = ' '.join([str(child) for child in self.children])
        if not self.parent:
            return children_str
        return '<%s>%s</%s>' % (self.name, children_str, self.name)

    @classmethod
    def from_file(cls, file):
        """ create root tag from file """
        obj = cls('root')
        with open(file) as in_file:
            for line in in_file:
                if not line.strip():
                    continue  # skip blank lines
                value, tag = line.strip().split(' ')
                obj.add_tag(tag, value)
        return obj

    def search_tag(self, tag):
        """ search for a tag in the children """
        if self.name == tag:
            return self
        for i, c in enumerate(self.children):
            if isinstance(c, Tag) and c.name == tag:
                return c
            elif isinstance(c, str):
                # a plain string that matches is promoted to a Tag
                if c.strip() == tag.strip():
                    self.children[i] = Tag(tag, self)
                    return self.children[i]
            else:
                # a Tag with a different name: search its children
                result = c.search_tag(tag)
                if result:
                    return result

    def add_tag(self, tag, value):
        """
        add a value, tag pair to the children
        First this searches whether the value is a child. If this is the
        case it moves the child to the new location.
        Afterwards it searches for the tag in the children. When found,
        the value is added to this tag. If not, a new tag object
        is created and added to this Tag. The flag has_root
        is set to False so the element can be moved later.
        """
        value_tag = self.search_tag(value)
        if value_tag and not value_tag.has_root:
            print("Found value: %s" % value)
            if value_tag.parent:
                i = value_tag.parent.children.index(value_tag)
                value = value_tag.parent.children.pop(i)
                value.has_root = True
        else:
            print("not %s" % value)
        found = self.search_tag(tag)
        if found:
            found.children.append(value)
        else:
            # no root
            tag_obj = Tag(tag, self)
            self.children.append(tag_obj)
            tag_obj.add_tag(tag, value)
            tag_obj.has_root = False

tags = Tag.from_file('final')
print(tags)
I know that in this example the speed tag is not added twice. I hope that is okay. Sorry for the long code.