Question

I want to group the lines of a file by their first two words (and then rearrange and print them).

This is what I am trying to do:

   lines = file.readlines()
   i = 0
   for line in lines:
       word1 = line.split()[0]
       word2 = line.split()[1]
       # compare the first two words with the previous and the next line
       if word1 == lines[i+1].split()[0] and word1 == lines[i-1].split()[0]:
           if word2 == lines[i+1].split()[1] and word2 == lines[i-1].split()[1]:
               print line
       else:
           print "***new block of lines \n***"
       i += 1

This is a very poor solution: it does not work for the first or last line, and it does not work well overall. I would really appreciate a better one.

Answer 1

If you are trying to group consecutive lines that share the same first two words, this is a use case for itertools.groupby, for example:

from itertools import groupby

with open('somefile') as fin:
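    # pair each non-blank line with its first two words (the grouping key)
    keyed = ((line.split(None, 2)[:2], line) for line in fin if line.strip())
    for k, g in groupby(keyed, lambda L: L[0]):
        # g yields the (key, line) pairs of one consecutive block
        lines = [el[1] for el in g]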

Here k is the grouping key (a list of up to two words), and lines is the list of lines in the file that share that key.

Sample somefile input:

one two three four five
one two five six seven
three four something
three four something else
one two start of new one two block

Output of print k, lines:

['one', 'two'] ['one two three four five\n', 'one two five six seven\n']
['three', 'four'] ['three four something\n', 'three four something else\n']
['one', 'two'] ['one two start of new one two block\n']
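
The same grouping can be tried without a file on disk. The following is a minimal sketch in Python 3 syntax, assuming the sample lines are held in a list; first_two and group_consecutive are hypothetical helper names, not part of itertools.

```python
from itertools import groupby


def first_two(line):
    """Return the first two whitespace-separated words as a tuple."""
    return tuple(line.split(None, 2)[:2])


def group_consecutive(lines):
    """Group consecutive lines that share their first two words.

    Returns a list of (key, block) pairs, where key is a tuple of up to
    two words and block is the list of lines in that run.
    """
    non_blank = (line for line in lines if line.strip())
    return [(key, list(block)) for key, block in groupby(non_blank, key=first_two)]


sample = [
    "one two three four five\n",
    "one two five six seven\n",
    "three four something\n",
    "three four something else\n",
    "one two start of new one two block\n",
]

blocks = group_consecutive(sample)
for key, block in blocks:
    print(key, block)
```

Because groupby only merges adjacent items, the last "one two" line forms a block of its own instead of joining the first block.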

To exclude the first two words from each line, use:

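from itertools import groupby

with open('somefile') as fin:
    # split each non-blank line into [word1, word2, rest]
    words = (line.split(None, 2) for line in fin if line.strip())
    for k, g in groupby(words, lambda L: L[:2]):
        # keep only the remainder; a line with just two words gives ''
        lines = [''.join(el[2:]) for el in g]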
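
The question also mentions rearranging the lines. Since groupby only merges adjacent lines, one possible way to bring scattered blocks together is to sort by the same key first. This is a sketch in Python 3 syntax, assuming the whole input fits in memory; first_two is a hypothetical helper name.

```python
from itertools import groupby


def first_two(line):
    """Return the first two whitespace-separated words as a tuple."""
    return tuple(line.split(None, 2)[:2])


sample = [
    "one two three four five\n",
    "one two five six seven\n",
    "three four something\n",
    "three four something else\n",
    "one two start of new one two block\n",
]

# sorted() is stable, so lines keep their original relative order within a key.
merged = {
    key: list(block)
    for key, block in groupby(sorted(sample, key=first_two), key=first_two)
}

for key, block in merged.items():
    print(key, block)
```

Sorting changes the order of the groups themselves (they come out in key order), which may or may not be the rearrangement that is wanted.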

Answer 2

This should work, but I can't be sure without a sample file and a sample of the desired output.

from collections import defaultdict

# text is any iterable of lines, such as an open file or a list of strings
d = defaultdict(list)
for line in text:
    try:
        first, second = line.split(' ', 2)[:2]
        # join the two words into a single case-insensitive key
        first_two = '.'.join((first, second)).lower()
        d[first_two].append(line)
    except ValueError:
        # or do something else with lines less than 2 words long here
        pass

for first_two, lines in d.items():
    print("first two: %s" % (first_two.split("."),))
    for line in lines:
        print(line)
    print("         -----       ")

Sample input:

['one two three for five six',
 'three four five',
 'three nine seven eight',
 'three four five six seven',
 'one two nine eleven ']

Sample output:

first two: ['three', 'nine']
three nine seven eight
         -----       
first two: ['one', 'two']
one two three for five six
one two nine eleven 
         -----       
first two: ['three', 'four']
three four five
three four five six seven
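
Joining the two words with "." and splitting the key again later misreports any word that itself contains a dot. A variation, sketched here under the assumption that tuple keys are acceptable, avoids that round trip; group_by_first_two is a hypothetical name, and the last sample line was added only to show a dotted word.

```python
from collections import defaultdict


def group_by_first_two(text):
    """Map lowercased (first, second) word tuples to the lines starting with them."""
    groups = defaultdict(list)
    for line in text:
        words = line.split()
        if len(words) < 2:
            # Lines with fewer than two words are skipped in this sketch.
            continue
        key = (words[0].lower(), words[1].lower())
        groups[key].append(line)
    return groups


text = [
    "one two three for five six",
    "three four five",
    "three nine seven eight",
    "three four five six seven",
    "one two nine eleven ",
    "v1.0 release notes",  # illustrative line, not from the original sample
]

groups = group_by_first_two(text)
for key, lines in groups.items():
    print("first two: %s" % (list(key),))
    for line in lines:
        print(line)
    print("         -----       ")
```

This version calls split() with no argument, so runs of spaces or tabs between words are treated as a single separator, unlike split(' ').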

Answer 3

This should do it.

# f is an open file object
lines = f.readlines()
# x, y are the first two words of the first line
x, y = lines[0].split(' ')[:2]

def chk_match(z, firstWord, secondWord):
    # True when line z starts with the two given words
    t = z.split(' ')
    return len(t) >= 2 and firstWord == t[0] and secondWord == t[1]

print [z for z in lines if chk_match(z, x, y)]
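
Note that this selects only the lines whose first two words match those of the first line; it does not build every group. A self-contained sketch of the same check in Python 3 syntax follows, using an in-memory list as a stand-in for the file; matches_first_two is a hypothetical name.

```python
def matches_first_two(line, first_word, second_word):
    """Return True when line starts with the two given words."""
    words = line.split()
    return words[:2] == [first_word, second_word]


lines = [
    "one two three for five six\n",
    "three four five\n",
    "one two nine eleven\n",
    "one\n",
]

# Take the reference words from the first line, as the answer does.
x, y = lines[0].split()[:2]
selected = [line for line in lines if matches_first_two(line, x, y)]
print(selected)
```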

Answer 4

In [91]: simple_text = ['one two three for five six',
    ...:  'three four five',
    ...:  'three nine seven eight',
    ...:  'three four five six seven',
    ...:  'one two nine eleven ']

In [92]: result = {}

In [93]: for line in simple_text:
    ...:     result.setdefault(tuple(line.split()[:2]), [])
    ...:     result[tuple(line.split()[:2])].append(line)
    ...:     

In [94]: for k in result:
    ...:     print k , result[k]
    ...:     
('three', 'nine') ['three nine seven eight']
('one', 'two') ['one two three for five six', 'one two nine eleven ']
('three', 'four') ['three four five', 'three four five six seven']

If you want to keep the keys in the order in which they first appear, use an OrderedDict instead:

In [95]: from collections import OrderedDict

In [96]: result = OrderedDict()

In [97]: for line in simple_text:
    ...:     result.setdefault(tuple(line.split()[:2]), [])
    ...:     result[tuple(line.split()[:2])].append(line)
    ...:     

In [98]: for k in result:
    ...:     print k , result[k]
    ...:     
('one', 'two') ['one two three for five six', 'one two nine eleven ']
('three', 'four') ['three four five', 'three four five six seven']
('three', 'nine') ['three nine seven eight']
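
setdefault returns the list stored under the key, so the lookup and the append can be combined into one statement. The sketch below uses Python 3 syntax and relies on plain dicts preserving insertion order, which is guaranteed from Python 3.7 on; on older versions OrderedDict is still needed for a stable order.

```python
simple_text = [
    "one two three for five six",
    "three four five",
    "three nine seven eight",
    "three four five six seven",
    "one two nine eleven ",
]

result = {}
for line in simple_text:
    key = tuple(line.split()[:2])
    # setdefault inserts an empty list the first time a key is seen
    # and returns the stored list either way.
    result.setdefault(key, []).append(line)

for key, lines in result.items():
    print(key, lines)
```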

Python - group lines by their first two words

4 answers: