Find the label preceding the longest line in a file

Time: 2011-10-21 06:59:32

Tags: python string

I have a file in the following format:

_line 1
this is a string on a line
_line 2
this is another string
_line 3
short line

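I am trying to write some Python code that gets the _line X label of the string underneath it that has the longest length. Can you help me fix my code? This is what I have so far.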

f = open('test.txt', 'r')
print f

read="null"
top_read_line_length="0"
topreadline="null"
for line in f:
    checkifread=line.find('line')
    if checkifread==1:
        print "Read label found"
        #means we are on a read line
        currentread=line
    else:
        #We are on a sequence line for currentread.
        currentlength=len(line)
        print currentlength
    print top_read_line_length

    if int(top_read_line_length) < int(currentlength):
        print topreadline
        topreadline=currentread#now topreadline label is the "_line" string
        topreadlinelength=int(currentlength)
        print topreadline

        #go to next line

print "Done"
print "Longest line is...."
print topreadline

12 answers:

Answer 0 (score: 8)

If all you want is the longest line in the file (as the question title says), then in modern Python this is a simple one-liner:

>>> max(open('test.txt'), key=len)

Answer 1 (score: 8)

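To get the label of the longest line, build a mapping from label to line length.

In your sample data set, it looks like labels start with "_line " and are immediately followed by the corresponding line: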

label2linelength = {}
for line in open('test.txt'):
    if line.startswith('_line '):
        # Remember the most recent label line.
        label = line
    else:
        # Record the length of the text line under the current label.
        label2linelength[label] = len(line)
print max(label2linelength.items(), key=lambda kv: kv[1])
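
As a rough Python 3 sketch of the same idea (the function name `longest_label` and the in-memory `io.StringIO` stand-in for `test.txt` are assumptions made for illustration), the mapping can be built and queried like this. Stripping the trailing newline keeps it out of both the label and the length:

```python
import io

def longest_label(lines):
    """Return (label, length) for the label whose following line is longest."""
    label_to_length = {}
    label = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("_line "):
            # Label line: remember it for the text line that follows.
            label = line
        elif label is not None:
            # Text line: record its length under the current label.
            label_to_length[label] = len(line)
    return max(label_to_length.items(), key=lambda kv: kv[1])

# Hypothetical stand-in for open('test.txt'), using the question's sample data.
sample = io.StringIO(
    "_line 1\n"
    "this is a string on a line\n"
    "_line 2\n"
    "this is another string\n"
    "_line 3\n"
    "short line\n"
)
result = longest_label(sample)
print(result)  # ('_line 1', 26)
```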

Answer 2 (score: 3)

This is easy to do:

data = open('test.txt').readlines()
max_line_pos = data.index(max(data, key=len))
prev_line = data[max_line_pos-1]
print prev_line

Answer 3 (score: 2)

I would do something like this:

label = None
maxlen = 0
maxstr = ''
maxlabel = None
with open('f.txt') as f:
  for line in f:
    line = line.rstrip()
    if line.startswith('_line'):
      label = line
    elif len(line) > maxlen:
      maxlen = len(line)
      maxstr = line
      maxlabel = label
print maxlabel, maxstr

It is more general than the problem statement, because it allows multiple lines of text per label.
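
To illustrate the multi-line case, here is a small Python 3 sketch of the same single-pass approach. The function name `longest_text_with_label` and the sample data are made up for this example, and `io.StringIO` stands in for the file:

```python
import io

def longest_text_with_label(lines):
    """Return (label, text) for the longest non-label line seen."""
    label = None
    best_label, best_text = None, ""
    for line in lines:
        line = line.rstrip()
        if line.startswith("_line"):
            # A new label applies to every text line until the next label.
            label = line
        elif len(line) > len(best_text):
            best_label, best_text = label, line
    return best_label, best_text

# Hypothetical input in which "_line 2" is followed by two text lines.
sample = io.StringIO(
    "_line 1\n"
    "short\n"
    "_line 2\n"
    "medium text\n"
    "the longest text of all\n"
    "_line 3\n"
    "x\n"
)
found = longest_text_with_label(sample)
print(found)  # ('_line 2', 'the longest text of all')
```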

Answer 4 (score: 2)

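I'm elaborating on Raymond's answer; if grouper() were available in the standard library, this answer would again be close to a one-liner. Unfortunately, grouper is only defined in the itertools examples.

I think you will prefer this version because it is functional. I haven't tested its performance, but at least I don't open the file and seek twice, nor do I hold the entire contents in memory.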

from itertools import izip_longest
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

max( grouper(2, open("test.txt")), key=lambda x:len(x[1]))[0]
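
The snippet above targets Python 2. A minimal Python 3 sketch of the same pairing idea follows, assuming `itertools.zip_longest` (the Python 3 name for `izip_longest`) and an in-memory `io.StringIO` standing in for `test.txt`. The empty-string fill value is my own choice, so that a trailing label with no text line still has a length:

```python
import io
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # The same iterator repeated n times, so each tuple takes n consecutive items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Hypothetical stand-in for open("test.txt").
sample = io.StringIO("_line 1\nabc\n_line 2\ndefg\n_line 3\nhij\n")

# Each pair is (label line, text line); pick the pair with the longest text.
pairs = grouper(2, sample, fillvalue="")
label = max(pairs, key=lambda pair: len(pair[1]))[0].strip()
print(label)  # _line 2
```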

Answer 5 (score: 2)

Another concise variant:

from itertools import imap, izip
from operator import itemgetter
with open("a.py") as f:
    res = max(izip(f, imap(len, f)), key=itemgetter(1))[0]

This treats every other line as a label.
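
Why this pairs labels with lengths: both arguments draw from the same file iterator, so each step takes one line as the label and the next line for `len`. A hedged Python 3 sketch, where the built-in `zip` and `map` are already lazy and `io.StringIO` is an assumed stand-in for the file:

```python
import io
from operator import itemgetter

# Hypothetical stand-in for the opened file.
f = io.StringIO("_line 1\nabc\n_line 2\ndefg\n_line 3\nhij\n")

# zip() takes one line from f (the label), then map() takes the next (the text).
pairs = list(zip(f, map(len, f)))
print(pairs)  # [('_line 1\n', 4), ('_line 2\n', 5), ('_line 3\n', 4)]

# Lengths include the trailing newline, which does not affect the ranking here.
res = max(pairs, key=itemgetter(1))[0]
print(repr(res))  # '_line 2\n'
```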

Answer 6 (score: 1)

Here's mine. It works where some of the other answers here would fail, for example with an input file like:

_line 1
abc
_line 2
defg
_line 3
hij

But it does rely on the file having the format you describe.

with open('test.txt') as f:
  spam = f.readlines()

labels = spam[0::2]
lines = spam[1::2]

d = dict(zip(labels, lines))

longest_lines_label = max(d, key=lambda x: len(d[x]))

print "Longest line is...."
print longest_lines_label, d[longest_lines_label]

Answer 7 (score: 1)

If you are sure the data is correct and you don't need any error handling, this should do the job:

lines = open('test.txt', 'r').readlines()
print max([(len(lines[i+1]), lines[i])
           for i in xrange(0, len(lines), 2)])[1].strip()

Answer 8 (score: 0)

Here is an awk program that will do this:

BEGIN { best=""; best_length=0; current=""; }
/^_/ { current=$0; }
/^[^_]/ { if(length($0) > best_length) { best=current; best_length=length($0); }}
END { print "Longest line: "best" with length: "best_length }

(I like it better than the following Python version, which more directly answers your question....)

best = ""
best_length = 0
current = ""
for line in f:  #( assumes f = open(...) from your code )
  # Strip once so the comparison and the stored length use the same value.
  line = line.strip()
  if line[:5] == '_line':
    # Label line: remember it for the text line that follows.
    current = line
  elif len(line) > best_length:
    best = current
    best_length = len(line)
print "Longest line is: %s with length: %d" % (best,best_length)

Answer 9 (score: 0)

This one is short, and it works even if you have multiple lines of text after each label:

content = list(open("test.txt"))
longest = content.index(max(content, key=len))
label = [ x for x in content[0:longest] if x.startswith("_line") ][-1]
print label.replace("_line ","")

Answer 10 (score: 0)

Here is another way:

import re, mmap

with open("test.txt", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, mmap.MAP_PRIVATE, mmap.PROT_READ)
    print max(re.finditer(r'_line (\d+)\n(.*)', mm),
              key=lambda m: len(m.group(2))).group(1)
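
The same regular expression can be tried without `mmap`. This sketch assumes Python 3 and a plain in-memory string holding the question's sample data; on a real memory-mapped file in Python 3 the pattern would need to be a bytes pattern instead.

```python
import re

# The question's sample data as a string (an assumption; the answer maps a file).
text = (
    "_line 1\n"
    "this is a string on a line\n"
    "_line 2\n"
    "this is another string\n"
    "_line 3\n"
    "short line\n"
)

# Group 1 is the label number, group 2 the text on the following line
# ('.' does not match a newline, so '.*' stops at the end of that line).
best = max(re.finditer(r"_line (\d+)\n(.*)", text),
           key=lambda m: len(m.group(2)))
label_number = best.group(1)
print(label_number)  # 1
```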

Answer 11 (score: 0)

Here is your code, fixed:

f = open('test.txt', 'r')
print f

read = None
top_read_line_length = 0
topreadline = None
currentlength = 0
label_line = True
for line in f:  
    if label_line:
        label_line = False
        print "label line", line
        #means we are on a read line
        currentread = line
    else:
        label_line = True
        #We are on a sequence line for currentread.
        currentlength = len(line)
        print 'cl', currentlength
    print top_read_line_length

    if top_read_line_length < currentlength:
        print 'trl', topreadline
        topreadline = currentread #now topreadline label is the "_line" string
        top_read_line_length = currentlength
        print 'trl', topreadline

        #go to next line

print "Done"
print "Longest line is...."
print topreadline

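I added the label_line boolean to toggle back and forth between label lines and data lines, but the important parts are:

  • put enough information in your print lines to see what is going on; and
  • be consistent with your variable names

The problem was in the last if suite: you were checking top_read_line_length but setting topreadlinelength (no underscores).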
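
To see why the name mismatch matters, here is a stripped-down sketch (the `lengths` list and the `updates` counter are illustrative, not part of the original code). Assigning to the misspelled name silently creates a second variable, so the value being compared never changes and every line counts as a new maximum:

```python
# Hypothetical line lengths, in file order.
lengths = [26, 22, 10]

top_read_line_length = 0
updates = 0
for currentlength in lengths:
    if top_read_line_length < currentlength:
        # Typo: this binds a brand-new name instead of updating the one compared above.
        topreadlinelength = currentlength
        updates += 1

print(top_read_line_length, updates)  # 0 3
```

Because the comparison succeeds for every line, the label recorded last wins regardless of length.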