I have a file in this format:
_line 1
this is a string on a line
_line 2
this is another string
_line 3
short line
I am trying to write some Python code that gets the _line X label whose following string is the longest. Can you help me fix my code? Here is what I have so far.
f = open('test.txt', 'r')
print f
read="null"
top_read_line_length="0"
topreadline="null"
for line in f:
    checkifread=line.find('line')
    if checkifread==1:
        print "Read label found"
        #means we are on a read line
        currentread=line
    else:
        #We are on a sequence line for currentread.
        currentlength=len(line)
        print currentlength
        print top_read_line_length
        if int(top_read_line_length) < int(currentlength):
            print topreadline
            topreadline=currentread#now topreadline label is the "_line" string
            topreadlinelength=int(currentlength)
            print topreadline
    #go to next line
print "Done"
print "Longest line is...."
print topreadline
Answer 0 (score: 8)
If all you want is the longest line in the file (as the question title says), that is a simple one-liner in modern Python:
>>> max(open('test.txt'), key=len)
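Answer 1 (score: 8)
To get the label of the longest line, build a mapping from each label to its line length.
In your sample dataset, it looks like each label starts with "_line" and the corresponding line follows immediately: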
label2linelength = {}
for line in open('test.txt'):
    if line.startswith('_line '):
        label = line
    else:
        label2linelength[label] = len(line)
        lastline = line
print max(label2linelength.items(), key=lambda kv: kv[1])
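For readers on Python 3, the same label-to-length mapping can be sketched as a function. This is a minimal sketch rather than the answer's original code: the name longest_label and the in-memory sample list are illustrative assumptions, and the function accepts any iterable of lines, so an open file object should work as well.

```python
def longest_label(lines):
    """Return (label, length) for the label whose following line is longest."""
    label2length = {}
    label = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("_line "):
            label = line
        elif label is not None:
            # Data line: record its length under the most recent label
            label2length[label] = len(line)
    # Compare the (label, length) pairs by length only
    return max(label2length.items(), key=lambda kv: kv[1])


# Hypothetical stand-in for the contents of test.txt
sample = [
    "_line 1\n",
    "this is a string on a line\n",
    "_line 2\n",
    "this is another string\n",
    "_line 3\n",
    "short line\n",
]
print(longest_label(sample))
```

Stripping the newline before measuring keeps the reported length equal to the visible text length.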
Answer 2 (score: 3)
This is easy to do:
data = open('test.txt').readlines()
max_line_pos = data.index(max(data, key=len))
prev_line = data[max_line_pos-1]
print prev_line
Answer 3 (score: 2)
I would do something like this:
label = None
maxlen = 0
maxstr = ''
maxlabel = None
with open('f.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('_line'):
            label = line
        elif len(line) > maxlen:
            maxlen = len(line)
            maxstr = line
            maxlabel = label
print maxlabel, maxstr
It is more general than the problem statement, because it allows several lines of text per label.
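Answer 4 (score: 2)
I am elaborating on Raymond's answer. If grouper() were available in the standard library, this answer would again be close to a one-liner; unfortunately, grouper is only defined in the itertools examples.
I think you may prefer this version because it is functional. I have not tested its performance, but at least I do not open the file and seek twice, nor do I hold the whole contents in memory.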
from itertools import izip_longest
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
max( grouper(2, open("test.txt")), key=lambda x:len(x[1]))[0]
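In Python 3, izip_longest is named zip_longest, so the grouper approach can be sketched as follows. The in-memory sample list is an assumption standing in for the file, and passing fillvalue="" is an added precaution (not part of the answer) so that a trailing label without a data line does not make len() fail on None.

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # n references to the same iterator, so zip_longest consumes n items per tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Hypothetical stand-in for the lines of test.txt
sample = [
    "_line 1\n", "this is a string on a line\n",
    "_line 2\n", "this is another string\n",
    "_line 3\n", "short line\n",
]

# Each group is (label, data line); pick the group with the longest data line
label = max(grouper(2, sample, fillvalue=""), key=lambda pair: len(pair[1]))[0]
print(label.strip())
```

Because the groups are produced lazily, only one pair of lines needs to be held at a time when the input is a file object.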
Answer 5 (score: 2)
Another concise variant:
from itertools import imap, izip
from operator import itemgetter
with open("a.py") as f:
    res = max(izip(f, imap(len, f)), key=itemgetter(1))[0]
This treats every other line as a label.
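The pairing works because both arguments consume the same file iterator: each step takes one line as the label, then the length-mapping iterator takes the next line and measures it. A Python 3 sketch of the same trick, assuming zip and map in place of izip and imap and an in-memory iterator in place of the file:

```python
from operator import itemgetter

# Hypothetical stand-in for the open file; it must be a single shared iterator
lines = iter([
    "_line 1\n", "this is a string on a line\n",
    "_line 2\n", "this is another string\n",
    "_line 3\n", "short line\n",
])

# zip takes a label from `lines`, then map(len, lines) consumes the next line
pairs = list(zip(lines, map(len, lines)))
label = max(pairs, key=itemgetter(1))[0]
print(pairs)
print(label.strip())
```

Note that the lengths here include the trailing newline, and that the trick would not work on a list, since each pass over a list starts again from the beginning.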
Answer 6 (score: 1)
Here is mine. It works in cases where some of the other answers here would fail, for example with an input file like this:
_line 1
abc
_line 2
defg
_line 3
hij
But it does rely on the file being in the format you described.
with open('test.txt') as f:
    spam = f.readlines()
labels = spam[0::2]
lines = spam[1::2]
d = dict(zip(labels, lines))
longest_lines_label = max(d, key=lambda x: len(d[x]))
print "Longest line is...."
print longest_lines_label, d[longest_lines_label]
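To see why that input is a problem: each label ("_line 1" plus its newline, 8 characters) is longer than any of the data lines, so an approach that looks for the longest line of the whole file picks a label instead. A small Python 3 sketch comparing the two approaches, with an in-memory list assumed in place of the file:

```python
# Hypothetical stand-in for the lines of the short-data file shown above
data = ["_line 1\n", "abc\n", "_line 2\n", "defg\n", "_line 3\n", "hij\n"]

# Longest line overall: a label wins, so "the line before it" is not a label
pos = data.index(max(data, key=len))
naive = data[pos - 1]

# Slicing into labels and data lines compares only the data lines
labels, lines = data[0::2], data[1::2]
d = dict(zip(labels, lines))
best = max(d, key=lambda k: len(d[k]))

print(repr(naive))
print(repr(best))
```

Here the first label sits at index 0, so the naive lookup wraps around to the last line of the file, while the slicing version returns the label of "defg".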
Answer 7 (score: 1)
If you are sure the data is well-formed and you do not need any error handling, this should do the job:
lines = open('test.txt', 'r').readlines()
print max([(len(lines[i+1]), lines[i])
           for i in xrange(0, len(lines), 2)])[1].strip()
Answer 8 (score: 0)
Here is an awk program that will do it:
BEGIN { best=""; best_length=0; current=""; }
/^_/ { current=$0; }
/^[^_]/ { if(length($0) > best_length) { best=current; best_length=length($0); }}
END { print "Longest line: "best" with length: "best_length }
(I like it better than the following Python version, which answers your question more closely....)
best = ""
best_length = 0
current = ""
for line in f: #( assumes f = open(...) from your code )
    if line[:5] == '_line':
        current = line.strip()
        continue
    else:
        # compare stripped lengths so the test matches the stored best_length
        if len(line.strip()) > best_length:
            best = current
            best_length = len(line.strip())
print "Longest line is: %s with length: %d" % (best,best_length)
Answer 9 (score: 0)
This one is short, and it works even if you have several lines of text after each label:
content = list(open("test.txt"))
longest = content.index(max(content, key=len))
label = [ x for x in content[0:longest] if x.startswith("_line") ][-1]
print label.replace("_line ","")
Answer 10 (score: 0)
Here is another way:
import re, mmap
with open("test.txt", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, mmap.MAP_PRIVATE, mmap.PROT_READ)
    print max(re.finditer(r'_line (\d+)\n(.*)', mm),
              key=lambda m: len(m.group(2))).group(1)
Answer 11 (score: 0)
Here is your code, fixed:
f = open('test.txt', 'r')
print f
read = None
top_read_line_length = 0
topreadline = None
currentlength = 0
label_line = True
for line in f:
    if label_line:
        label_line = False
        print "label line", line
        #means we are on a read line
        currentread = line
    else:
        label_line = True
        #We are on a sequence line for currentread.
        currentlength = len(line)
        print 'cl', currentlength
        print top_read_line_length
        if top_read_line_length < currentlength:
            print 'trl', topreadline
            topreadline = currentread #now topreadline label is the "_line" string
            top_read_line_length = currentlength
            print 'trl', topreadline
    #go to next line
print "Done"
print "Longest line is...."
print topreadline
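I added the label_line boolean to toggle back and forth between label lines and data lines, but the important part is this: the problem was in the last if suite, where you were checking top_read_line_length but setting topreadlinelength (without the underscores).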
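The effect of that misspelled variable can be reproduced in isolation. The Python 3 sketch below is an illustration rather than the code from the question: the function names buggy and fixed and the in-memory sample are assumptions, and the debugging prints are left out.

```python
# Hypothetical stand-in for the lines of test.txt
sample = [
    "_line 1\n", "this is a string on a line\n",
    "_line 2\n", "this is another string\n",
    "_line 3\n", "short line\n",
]


def buggy(lines):
    top_read_line_length = 0
    topreadline = currentread = None
    for line in lines:
        if line.startswith("_line"):
            currentread = line.strip()
        elif top_read_line_length < len(line):
            topreadline = currentread
            topreadlinelength = len(line)  # typo: creates a new, unused variable
    return topreadline


def fixed(lines):
    top_read_line_length = 0
    topreadline = currentread = None
    for line in lines:
        if line.startswith("_line"):
            currentread = line.strip()
        elif top_read_line_length < len(line):
            topreadline = currentread
            top_read_line_length = len(line)  # updates the compared variable
    return topreadline


print(buggy(sample))
print(fixed(sample))
```

In the buggy version the threshold stays at 0, so every data line passes the comparison and the last label in the file is reported regardless of line lengths.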