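I have a huge data file in which a particular string is repeated after a fixed number of lines.
I want to count the jump between the first two occurrences of 'Rank'. For example, the file looks like this: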
1 5 6 8 Rank line-start
2 4 8 5
7 5 8 6
5 4 6 4
1 5 7 4 Rank line-end
4 8 6 4
2 4 8 5
3 6 8 9
5 4 6 4 Rank
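You can see that the string Rank recurs with 3 lines in between, so for the example above a block is 4 lines long. My question is how to get that number of lines using Python's readline().
This is what I am currently doing: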
    data = open(filename).readlines()
    count = 0
    for j in range(len(data)):
        if(data[j].find('Rank') != -1):
            if count == 0: line1 = j
            count = count +1
        if(count == 2):
            no_of_lines = j - line1
            break
Any improvements or suggestions are welcome.
Answer 0 (score: 4)
Don't use .readlines() when a simple generator expression that counts the lines without Rank is enough:
    count = sum(1 for l in open(filename) if 'Rank' not in l)
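The test 'Rank' not in l is enough to check that the string 'Rank' is absent from a line. Looping over an open file loops over all of its lines. The sum() function adds up the 1s generated for each line that does not contain Rank, giving you the number of lines without Rank.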
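As a minimal sketch of how that generator expression behaves, the snippet below runs it over the question's sample data held in a list instead of a file (the line-start / line-end annotations are dropped). The names `count_lines_without` and `marker` are made up for illustration; an open file object could be passed in place of the list.

```python
# Sample data from the question, one string per line.
sample = [
    "1 5 6 8 Rank", "2 4 8 5", "7 5 8 6", "5 4 6 4",
    "1 5 7 4 Rank", "4 8 6 4", "2 4 8 5", "3 6 8 9",
    "5 4 6 4 Rank",
]

def count_lines_without(lines, marker="Rank"):
    """Count the lines that do not contain `marker` (hypothetical helper)."""
    # Each matching line contributes a 1; sum() adds them up lazily,
    # so the whole input never has to be held in memory at once.
    return sum(1 for line in lines if marker not in line)

without_rank = count_lines_without(sample)
print(without_rank)  # 6
```

For the sample this gives 6, the number of lines that lack Rank across the whole input, which is not yet the per-block length the question asks for.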
If you need to count the lines from one Rank to the next Rank, you need a little itertools.takewhile magic:
    import itertools

    with open(filename) as f:
        # skip until we reach `Rank`; takewhile is lazy, so it must be
        # consumed for the skipping to actually happen
        for _ in itertools.takewhile(lambda l: 'Rank' not in l, f):
            pass
        # takewhile will have read a line with `Rank` now
        # count the lines *without* `Rank` between them
        count = sum(1 for l in itertools.takewhile(lambda l: 'Rank' not in l, f))
        count += 1  # we skipped at least one `Rank` line.
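One possible way to package the same idea as a reusable function is sketched below. It replaces the first skipping step with itertools.dropwhile plus next(), which is a swapped-in alternative rather than what the answer above uses. The names `block_length` and `marker` are hypothetical, and the sketch assumes the input contains at least two marker lines for the result to be meaningful.

```python
import itertools

def block_length(lines, marker="Rank"):
    """Lines from the first marker line up to (not including) the second."""
    # dropwhile discards everything before the first marker line.
    rest = itertools.dropwhile(lambda l: marker not in l, iter(lines))
    # Consume the first marker line itself; None means no marker was found.
    if next(rest, None) is None:
        return None
    # Count the lines without the marker until the next marker line.
    # Caveat: with only one marker line this counts to the end of input.
    between = sum(1 for _ in itertools.takewhile(lambda l: marker not in l, rest))
    return between + 1  # include the first marker line in the block

sample = [
    "1 5 6 8 Rank", "2 4 8 5", "7 5 8 6", "5 4 6 4",
    "1 5 7 4 Rank", "4 8 6 4", "2 4 8 5", "3 6 8 9",
    "5 4 6 4 Rank",
]
print(block_length(sample))  # 4
```

Because the function only needs an iterable of lines, it should work the same way with an open file, reading no further than the second marker line.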
Answer 1 (score: 2)
Count the jump between the first two occurrences of 'Rank':
    def find_jumps(filename):
        first = True
        count = 0
        with open(filename) as f:
            for line in f:
                if 'Rank' in line:
                    if first:
                        count = 0
                        #set this to 1 if you want to include one of the 'Rank' lines.
                        first = False
                    else:
                        return count
                else:
                    count += 1
Answer 2 (score: 1)
7 lines of code:
    count = 0
    for line in open("yourfile.txt"):
        if "Rank" in line:
            if count > 0: break  # second 'Rank' line: the block is complete
            count = 1  # first 'Rank' line starts the block
        elif count > 0: count += 1
    print(count)
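Answer 3 (score: 1)
I assume you want to find the number of lines in a block, where each block starts with a line that contains 'Rank'. For example, the sample has 3 blocks: the 1st has 4 lines, the 2nd has 4 lines, and the 3rd has 1 line: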
    from itertools import groupby

    def block_start(line, start=[None]):
        if 'Rank' in line:
            start[0] = not start[0]
        return start[0]

    with open(filename) as file:
        block_sizes = [sum(1 for line in block) # find number of lines in a block
                       for _, block in groupby(file, key=block_start)] # group
        print(block_sizes)
    # -> [4, 4, 1]
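The same grouping can also be sketched without keeping state in a mutable default argument, by tagging each line with a running block number and grouping on that. This is only an illustrative variant; `block_sizes`, `numbered` and `marker` are hypothetical names, and any lines before the first marker line would show up as an extra leading group.

```python
from itertools import groupby

def block_sizes(lines, marker="Rank"):
    """Sizes of consecutive blocks, each starting at a marker line."""
    def numbered(lines):
        block = 0
        for line in lines:
            if marker in line:
                block += 1  # a marker line opens a new block
            yield block, line
    # groupby collects consecutive lines that share a block number.
    return [sum(1 for _ in group)
            for _, group in groupby(numbered(lines), key=lambda pair: pair[0])]

sample = [
    "1 5 6 8 Rank", "2 4 8 5", "7 5 8 6", "5 4 6 4",
    "1 5 7 4 Rank", "4 8 6 4", "2 4 8 5", "3 6 8 9",
    "5 4 6 4 Rank",
]
print(block_sizes(sample))  # [4, 4, 1]
```

The counter lives inside the function call, so each call starts from a clean state.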
If all blocks have the same number of lines, or you only want the number of lines in the first block that starts with 'Rank':
    count = None
    with open(filename) as file:
        for line in file:
            if 'Rank' in line:
                if count is None: # found the start of the 1st block
                    count = 1
                else: # found the start of the 2nd block
                    break
            elif count is not None: # inside the 1st block
                count += 1
    print(count) # -> 4
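Whether all blocks really have the same length can be checked rather than assumed. The rough sketch below records the position of every marker line and takes the differences between neighbours; `marker_gaps` and `marker` are hypothetical names, and unlike the loop above it reads the input to the end.

```python
def marker_gaps(lines, marker="Rank"):
    """Distances, in lines, between consecutive marker lines."""
    # Only the positions of the marker lines are stored, not the lines.
    positions = [i for i, line in enumerate(lines) if marker in line]
    # Pair each position with the next one and subtract.
    return [b - a for a, b in zip(positions, positions[1:])]

sample = [
    "1 5 6 8 Rank", "2 4 8 5", "7 5 8 6", "5 4 6 4",
    "1 5 7 4 Rank", "4 8 6 4", "2 4 8 5", "3 6 8 9",
    "5 4 6 4 Rank",
]
gaps = marker_gaps(sample)
print(gaps)                 # [4, 4]
print(len(set(gaps)) == 1)  # True: every block has the same length
```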