I'm new to Python and want to parse a text file; I joined here for help with this parsing problem. Suppose I have a sample text file "s1.txt" containing:
Sample s1.txt:
I like to play all games and sports
Game=chess
Sports=Baseball
I also like to play other games
Game=carrom
Sports=cricket
Game=tennis
Desired output:
Game=chess
Game=carrom
Game=tennis
I like to play all games and sports
I also like to play other games
Sports=Baseball
Sports=cricket
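I got some suggestions to use the regular expression (.*?)=(.*), but regular expressions confuse me. Is there a better way to solve this with plain string operations?
Please help me get the desired output! Thanks for your answers!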
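For reference, the suggested pattern (.*?)=(.*) captures the text before the first "=" and everything after it. A minimal sketch comparing it with the plain string method str.partition, which performs the same split without a regex (the sample line is taken from s1.txt; the variable names are only for illustration):

```python
import re

line = "Game=chess"

# Regex approach: group 1 is the (non-greedy) text before the first '=',
# group 2 is everything after it.
m = re.match(r"(.*?)=(.*)", line)
regex_parts = (m.group(1), m.group(2)) if m else None

# String approach: partition splits on the first '=' only;
# sep is '' when the line contains no '='.
head, sep, tail = line.partition("=")
string_parts = (head, tail) if sep else None

print(regex_parts)   # ('Game', 'chess')
print(string_parts)  # ('Game', 'chess')
```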
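Answer 0 (score: 3)
Create a function that determines the relative value of a given line. Lines starting with "Game=" get a lower value than ordinary lines, and lines starting with "Sports=" get a higher value. Use this function as the key when sorting the collection of lines.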
def value(line):
    if line.startswith("Game="):
        return 0
    elif line.startswith("Sports="):
        return 2
    else:
        return 1
text = """I like to play all games and sports
Game=chess
Sports=Baseball
I also like to play other games
Game=carrom
Sports=cricket
Game=tennis"""
lines = text.split("\n")
lines.sort(key=value)
print("\n".join(lines))
Result:
Game=chess
Game=carrom
Game=tennis
I like to play all games and sports
I also like to play other games
Sports=Baseball
Sports=cricket
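Since the question starts from a file rather than a string literal, here is a minimal sketch that applies the same value key to the lines of s1.txt. The helper names sort_lines and sort_file are hypothetical, and the sketch assumes the file is UTF-8 text. It relies on Python's sort being stable, so lines with the same key keep their original file order (which is why chess, carrom and tennis stay in that order).

```python
def value(line):
    """Rank a line: Game= lines first, other lines next, Sports= lines last."""
    if line.startswith("Game="):
        return 0
    elif line.startswith("Sports="):
        return 2
    else:
        return 1


def sort_lines(lines):
    """Hypothetical helper: drop line endings and blank lines, then sort by value()."""
    cleaned = [line.rstrip("\r\n") for line in lines if line.strip()]
    # sorted() is stable, so lines with the same rank keep their file order.
    return sorted(cleaned, key=value)


def sort_file(path):
    """Hypothetical helper: read a UTF-8 text file and return its sorted lines."""
    with open(path, encoding="utf-8") as f:
        return sort_lines(f)
```

Usage, assuming s1.txt is in the current directory: print("\n".join(sort_file("s1.txt"))).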
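Answer 1 (score: 0)
From the order you defined, you want to sort based on a) the name on the left-hand side of the =, and b) the order of the lines in the file.
Extending your example, suppose you have: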
txt='''\
Pleasure=swimming
I like to play all games and sports
Game=chess
Sports=Baseball
I also like to play other games
Game=carrom
Sports=cricket
Game=tennis
Pleasure=eating'''
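If you'd rather avoid regular expressions (as the question asks), the left-hand side of "=" can be pulled out with str.partition and looked up in a rank table. A minimal sketch; the RANKS dict, DEFAULT_RANK and lhs_key names are hypothetical, and the ranks are chosen to give the order Game, plain lines, Sports, Pleasure:

```python
txt = '''\
Pleasure=swimming
I like to play all games and sports
Game=chess
Sports=Baseball
I also like to play other games
Game=carrom
Sports=cricket
Game=tennis
Pleasure=eating'''

# Hypothetical rank table keyed by the text left of '='.
RANKS = {"Game": 0, "Sports": 2, "Pleasure": 3}
DEFAULT_RANK = 1  # lines without '=' or with an unknown name


def lhs_key(line):
    """Rank a line by the name before its first '='."""
    name, sep, _ = line.partition("=")
    return RANKS.get(name, DEFAULT_RANK) if sep else DEFAULT_RANK


# sorted() is stable, so equal ranks keep their original file order.
print("\n".join(sorted(txt.splitlines(), key=lhs_key)))
```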
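If you want to use a regular expression, you can use Kevin's sort approach and decorate the sort with a key function that returns a rank derived from the match object's groups() tuple.
Recall that a regex with several alternative groups returns the matched text for whichever group matched, and None for the other groups: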
>>> re.search(r'(^Game=)|(^Sports=)|(^Pleasure=)', 'Sports=').groups()
(None, 'Sports=', None)
Then you can use a generator expression to find the index of the group that matched:
>>> next(i for i, e in enumerate((None, 'Sports=', None)) if e)
1
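As an aside, re match objects also expose lastindex, the 1-based index of the last capturing group that matched. Because only one branch of an alternation like this one can match, it identifies the matching group directly, so m.lastindex - 1 gives the same number as the generator above. A minimal sketch; the group_rank name is hypothetical:

```python
import re

PATTERN = re.compile(r'(^Game=)|(^Sports=)|(^Pleasure=)')


def group_rank(s):
    """Return the 0-based index of the alternative that matched, or None."""
    m = PATTERN.search(s)
    # lastindex is 1-based, so subtract 1 to match enumerate()'s numbering.
    return m.lastindex - 1 if m else None


print(group_rank('Sports='))         # 1
print(group_rank('Game=chess'))      # 0
print(group_rank('no equals sign'))  # None
```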
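Now write a key function for sorting:
import re

def kf(s, rank_of_none=1):
    m = re.search(r'(^Game=)|(^Sports=)|(^Pleasure=)', s)
    if m:
        return next(i for i, e in enumerate(m.groups()) if e)
    else:
        return rank_of_none - .1
Each line now gets a numeric rank that determines its position in the sort. Lines that match none of the groups get a float (rank_of_none - 0.1, i.e. 0.9 by default), which places them between the Game= lines (0) and the Sports= lines (1). Because Python's sort is stable, lines with the same rank stay in file order: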
for line in txt.splitlines():
    print(kf(line), line)
Output:
2 Pleasure=swimming
0.9 I like to play all games and sports
0 Game=chess
1 Sports=Baseball
0.9 I also like to play other games
0 Game=carrom
1 Sports=cricket
0 Game=tennis
2 Pleasure=eating
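The sorted() call relies on sort stability to keep lines of equal rank in file order. If you prefer to make that explicit, a decorate-sort-undecorate sketch can pair each rank with the line's position, so ties are broken by line number. This reuses the same pattern and ranking as kf; the rank and ranked_lines names are hypothetical:

```python
import re

PATTERN = re.compile(r'(^Game=)|(^Sports=)|(^Pleasure=)')


def rank(s, rank_of_none=1):
    """Same ranking as kf: index of the matching group, else rank_of_none - 0.1."""
    m = PATTERN.search(s)
    if m:
        return next(i for i, e in enumerate(m.groups()) if e)
    return rank_of_none - .1


def ranked_lines(text):
    # Decorate: (rank, original line number, line). Line numbers are unique,
    # so the line text itself is never compared.
    decorated = [(rank(line), n, line) for n, line in enumerate(text.splitlines())]
    decorated.sort()
    # Undecorate: keep only the line text.
    return [line for _, _, line in decorated]


txt = '''\
Pleasure=swimming
I like to play all games and sports
Game=chess
Sports=Baseball
I also like to play other games
Game=carrom
Sports=cricket
Game=tennis
Pleasure=eating'''

print("\n".join(ranked_lines(txt)))
```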
Sorting flexibly by the position of the matching group in the regex is now trivial:
print('\n'.join(sorted(txt.splitlines(), key=kf)))
Output:
Game=chess
Game=carrom
Game=tennis
I like to play all games and sports
I also like to play other games
Sports=Baseball
Sports=cricket
Pleasure=swimming
Pleasure=eating
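Because the rank is just the position of the matching group in the regex, the order can be driven by a list instead of a hand-written pattern. A minimal sketch, assuming a hypothetical make_key helper that builds the alternation from an ordered list of prefixes (re.escape keeps any special characters in a prefix literal):

```python
import re


def make_key(prefixes, rank_of_none=1):
    """Hypothetical helper: build a kf-style key from an ordered list of prefixes."""
    pattern = re.compile("|".join("(^%s)" % re.escape(p) for p in prefixes))

    def key(s):
        m = pattern.search(s)
        if m:
            # Position of the matching prefix in the list.
            return next(i for i, e in enumerate(m.groups()) if e)
        return rank_of_none - .1

    return key


txt = '''\
Pleasure=swimming
I like to play all games and sports
Game=chess
Sports=Baseball
I also like to play other games
Game=carrom
Sports=cricket
Game=tennis
Pleasure=eating'''

# Pleasure= lines first, plain lines (rank 0.9) next, then Game=, then Sports=.
key = make_key(["Pleasure=", "Game=", "Sports="])
print("\n".join(sorted(txt.splitlines(), key=key)))
```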