I want to group the lines of a file by their first two words (and then rearrange and print them).
This is what I am trying to do:
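lines = file.readlines()
for i, line in enumerate(lines):
    word1 = line.split()[0]
    word2 = line.split()[1]
    # compare the first word with the previous and the next line
    if word1 == lines[i+1].split()[0] and word1 == lines[i-1].split()[0]:
        # compare the second word with the previous and the next line
        if word2 == lines[i-1].split()[1] and word2 == lines[i+1].split()[1]:
            print line
    else:
        print "***new block of lines \n***"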
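This is a pretty bad solution: it does not work for the first or last line, and it does not work well overall. I would really appreciate a better solution.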
Answer 0: (score: 3)
If you are trying to group consecutive lines that share their first two words, this is a use case for itertools.groupby, for example:
from itertools import groupby

with open('somefile') as fin:
    lines = ((line.split(None, 2)[:2], line) for line in fin if line.strip())
    for k, g in groupby(lines, lambda L: L[0]):
        lines = [el[1] for el in g]
Here k is the grouping key (at most two words), and lines will be the lines from the file that share that key.
Example somefile input:
one two three four five
one two five six seven
three four something
three four something else
one two start of new one two block
Result of print k, lines:
['one', 'two'] ['one two three four five\n', 'one two five six seven\n']
['three', 'four'] ['three four something\n', 'three four something else\n']
['one', 'two'] ['one two start of new one two block\n']
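Note that the key ['one', 'two'] shows up twice in the output above, because groupby only merges adjacent lines. If lines that are not next to each other should also end up in one group, one option is to sort by the key first. The following is a minimal Python 3 sketch under that assumption; the helper names first_two and group_sorted are made up for illustration, and it assumes the lines fit in memory.

```python
from itertools import groupby


def first_two(line):
    """Return the first two whitespace-separated words of a line as a tuple."""
    return tuple(line.split(None, 2)[:2])


def group_sorted(lines):
    """Group lines by their first two words, merging non-adjacent matches.

    sorted() is stable, so lines keep their original relative order
    inside each group.
    """
    non_blank = [line for line in lines if line.strip()]
    ordered = sorted(non_blank, key=first_two)
    return [(key, list(group)) for key, group in groupby(ordered, key=first_two)]


sample = [
    "one two three four five\n",
    "one two five six seven\n",
    "three four something\n",
    "three four something else\n",
    "one two start of new one two block\n",
]

for key, group in group_sorted(sample):
    print(key, group)
```

The trade-off is that the groups now come out in sorted key order rather than in file order.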
To exclude the first two words from line, use:
with open('somefile') as fin:
    lines = (line.split(None, 2) for line in fin if line.strip())
    for k, g in groupby(lines, lambda L: L[:2]):
        lines = [el[2] for el in g]
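One caveat with the snippet above: a line with only one or two words has no third element after split(None, 2), so el[2] would raise an IndexError. A possible way around that, sketched in Python 3, is to fall back to an empty remainder. The names split_key_and_rest and group_remainders are hypothetical, and treating a missing remainder as an empty string is an assumption about what is wanted.

```python
from itertools import groupby


def split_key_and_rest(line):
    """Split a line into (first two words, remainder).

    The remainder is an empty string when the line has no third word.
    """
    parts = line.split(None, 2)
    key = tuple(parts[:2])
    rest = parts[2] if len(parts) > 2 else ""
    return key, rest


def group_remainders(lines):
    """Group consecutive lines by key, keeping only the text after the key."""
    pairs = (split_key_and_rest(line) for line in lines if line.strip())
    return [(key, [rest for _, rest in group])
            for key, group in groupby(pairs, key=lambda pair: pair[0])]


sample = ["one two three four\n", "one two\n", "three four something\n"]
print(group_remainders(sample))
```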
Answer 1: (score: 1)
This should work, but I cannot be sure without a sample file and a sample of the desired output.
from collections import defaultdict

# text: any iterable of lines, e.g. a list of strings or an open file
d = defaultdict(list)
for line in text:
    try:
        first, second = line.split(' ', 2)[:2]
        first_two = '.'.join((first, second)).lower()
        d[first_two].append(line)
    except ValueError:
        # or do something else with lines less than 2 words long here
        pass

for first_two, lines in d.items():
    print("first two: %s" % (first_two.split("."), ))
    for line in lines:
        print(line)
    print(" ----- ")
Sample input:
['one two three for five six',
'three four five',
'three nine seven eight',
'three four five six seven',
'one two nine eleven ']
Sample output:
first two: ['three', 'nine']
three nine seven eight
-----
first two: ['one', 'two']
one two three for five six
one two nine eleven
-----
first two: ['three', 'four']
three four five
three four five six seven
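The code above joins the two words with "." to build the key and splits on "." again when printing, which would mis-split any word that itself contains a period. A variation, sketched here in Python 3, uses a tuple as the dictionary key instead. The function name group_by_first_two is made up for illustration, and skipping lines with fewer than two words is an assumption.

```python
from collections import defaultdict


def group_by_first_two(lines):
    """Map (first_word, second_word) -> list of lines, case-insensitively."""
    groups = defaultdict(list)
    for line in lines:
        words = line.split()
        if len(words) < 2:
            # Assumption: lines with fewer than two words are skipped.
            continue
        key = (words[0].lower(), words[1].lower())
        groups[key].append(line)
    return groups


text = [
    'one two three for five six',
    'three four five',
    'three nine seven eight',
    'three four five six seven',
    'one two nine eleven ',
]

for key, lines in group_by_first_two(text).items():
    print("first two:", list(key))
    for line in lines:
        print(line)
    print(" ----- ")
```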
Answer 2: (score: 1)
This should do it.
# f > File Pointer
lines = f.readlines()
x, y = lines[0].split(' ')[:2]

def chk_match(z, firstWord, secondWord):
    t = z.split(' ')
    if len(t) >= 2:
        if firstWord == t[0] and secondWord == t[1]:
            return 1
    return 0

print [z for z in lines if chk_match(z, x, y)]
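The snippet above collects only the lines whose first two words match those of the first line, and it splits on a single space. A Python 3 variant might look like the sketch below; it uses split() without arguments so that tabs and repeated spaces are tolerated. The name matching_lines and the sample data are made up for illustration.

```python
def matching_lines(lines, first_word, second_word):
    """Return the lines whose first two words equal the given pair."""
    return [line for line in lines
            if line.split()[:2] == [first_word, second_word]]


lines = [
    "one two three\n",
    "three four five\n",
    "one  two\tsix\n",   # repeated space and a tab between words
    "one\n",             # fewer than two words, never matches
]
first_word, second_word = lines[0].split()[:2]
print(matching_lines(lines, first_word, second_word))
```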
Answer 3: (score: 1)
In [91]: simple_text = ['one two three for five six',
    ...:                'three four five',
    ...:                'three nine seven eight',
    ...:                'three four five six seven',
    ...:                'one two nine eleven ']
In [92]: result = {}
In [93]: for line in simple_text:
    ...:     result.setdefault(tuple(line.split()[:2]), [])
    ...:     result[tuple(line.split()[:2])].append(line)
    ...:
In [94]: for k in result:
    ...:     print k , result[k]
    ...:
('three', 'nine') ['three nine seven eight']
('one', 'two') ['one two three for five six', 'one two nine eleven ']
('three', 'four') ['three four five', 'three four five six seven']
If you want to preserve the order in which the keys first appear, use OrderedDict, like this:
In [95]: from collections import OrderedDict
In [96]: result = OrderedDict()
In [97]: for line in simple_text:
    ...:     result.setdefault(tuple(line.split()[:2]), [])
    ...:     result[tuple(line.split()[:2])].append(line)
    ...:
In [98]: for k in result:
    ...:     print k , result[k]
    ...:
('one', 'two') ['one two three for five six', 'one two nine eleven ']
('three', 'four') ['three four five', 'three four five six seven']
('three', 'nine') ['three nine seven eight']
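On Python 3.7 and later, a plain dict already preserves insertion order as a language guarantee, so OrderedDict is not strictly needed for this. A minimal Python 3 sketch of the same idea follows; the function name group_in_first_seen_order is made up for illustration. Since setdefault returns the stored list, the lookup and the append can also be combined into a single call.

```python
def group_in_first_seen_order(lines):
    """Group lines by their first two words, keeping first-seen key order.

    Relies on dicts preserving insertion order (guaranteed since Python 3.7).
    """
    result = {}
    for line in lines:
        key = tuple(line.split()[:2])
        # setdefault returns the list stored under key, creating it if needed
        result.setdefault(key, []).append(line)
    return result


simple_text = [
    'one two three for five six',
    'three four five',
    'three nine seven eight',
    'three four five six seven',
    'one two nine eleven ',
]

for key, lines in group_in_first_seen_order(simple_text).items():
    print(key, lines)
```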