I have a .txt file in which two words repeat, one word per line.
Here is an example (the real file is about 80,000 lines):
ANS
ANS
ANS
AUT
AUT
AUT
AUT
ANS
ANS
ANS
ANS
ANS
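I'm trying to write some Python code that counts consecutive identical lines and returns how many times each run repeats. For this example, I'd like to write [3, 4, 5] to another .txt file.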
word = "100011010"
count = 1
length = ""
for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length += word[i-1] + " repeats " + str(count) + ", "
        count = 1
length += ("and " + word[i] + " repeats " + str(count))
print(length)
The idea is similar to the string code above. Is there a way to do this with a list?
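One list-based approach, swapped in here rather than taken from any answer below, is `itertools.groupby`, which groups consecutive equal items of any iterable. A minimal sketch, assuming the lines have already been read into a list and stripped of their newlines (`run_lengths` is a hypothetical helper name):

```python
from itertools import groupby

def run_lengths(items):
    """Return the length of each run of consecutive equal items."""
    # groupby yields one (key, group) pair per maximal run of equal neighbours
    return [sum(1 for _ in group) for _, group in groupby(items)]

lines = ["ANS"] * 3 + ["AUT"] * 4 + ["ANS"] * 5
print(run_lengths(lines))  # [3, 4, 5]
```

Because it works on any iterable, the same helper also handles the string from the question: run_lengths("100011010") gives [1, 3, 2, 1, 1, 1].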
Answer 0 (score: 2)
You can read the whole file like this:
content = []
with open('/path/to/file.txt', 'r') as file:
    content = file.readlines()
    # Maybe you want to strip the lines (removes the trailing '\n'):
    # content = [line.strip() for line in file.readlines()]
Now you have a list containing all the lines of the file.
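Stripping matters here because readlines() keeps the trailing "\n" on every line except possibly the last, so a final line without a newline will not compare equal to an identical line above it. A quick illustration, using `io.StringIO` as a stand-in for the real file:

```python
import io

raw = "AUT\nANS\nANS"  # no newline after the last line
print(io.StringIO(raw).readlines())          # ['AUT\n', 'ANS\n', 'ANS']
print(io.StringIO(raw).read().splitlines())  # ['AUT', 'ANS', 'ANS']
```

With readlines() the two "ANS" lines differ ('ANS\n' vs 'ANS'); read().splitlines() or line.strip() avoids that.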
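def count_consecutive_lines(lines):
    counter = 1
    output = ''
    for index in range(1, len(lines)):
        if lines[index] != lines[index-1]:
            # The run of lines[index-1] just ended; report it
            output += '{} repeats {} times.\n'.format(lines[index-1], counter)
            counter = 1
        else:
            counter += 1
    # Report the final run, which the loop never closes
    if lines:
        output += '{} repeats {} times.\n'.format(lines[-1], counter)
    return output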
and call it like this:
print(count_consecutive_lines(content))
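If you want the result as a list, as the question asks, rather than a formatted string, the same kind of loop can collect (line, count) pairs instead. A sketch assuming `lines` has already been stripped; `consecutive_runs` is a hypothetical name:

```python
def consecutive_runs(lines):
    """Return a list of (line, run_length) pairs for consecutive equal lines."""
    runs = []
    for index, line in enumerate(lines):
        if index > 0 and line == lines[index - 1]:
            # Same as the previous line: extend the current run
            word, count = runs[-1]
            runs[-1] = (word, count + 1)
        else:
            # First line, or different from the previous one: start a new run
            runs.append((line, 1))
    return runs

lines = ["ANS"] * 3 + ["AUT"] * 4 + ["ANS"] * 5
runs = consecutive_runs(lines)
print(runs)                          # [('ANS', 3), ('AUT', 4), ('ANS', 5)]
print([count for _, count in runs])  # [3, 4, 5]
```

Keeping the word alongside the count makes it easy to tell which run is which when the file alternates between the two words many times.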
Answer 1 (score: 1)
An answer that doesn't load the whole file into memory:
last = None
count = 0
result = []
with open('sample.txt', 'r') as f:
    for line in f:  # reads one line at a time
        line = line.strip()
        if line == last:
            count = count + 1
        else:
            if count > 0:
                result.append(count)
            count = 1
            last = line
if count > 0:
    result.append(count)  # the final run
print(result)
Result:
[3, 4, 5]
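The same streaming idea can be packaged as a generator, so only the current run is held in memory. A sketch assuming any iterable of text lines (an open file object, or `io.StringIO` for testing); `stream_run_lengths` is a hypothetical name:

```python
import io

def stream_run_lengths(lines):
    """Yield the length of each run of consecutive equal (stripped) lines."""
    last = None
    count = 0
    for line in lines:
        line = line.strip()
        if count and line == last:
            count += 1
        else:
            if count:
                yield count  # the previous run has ended
            last = line
            count = 1
    if count:
        yield count  # flush the final run (nothing is yielded for empty input)

sample = io.StringIO("ANS\nANS\nANS\nAUT\nAUT\nAUT\nAUT\nANS\nANS\nANS\nANS\nANS\n")
print(list(stream_run_lengths(sample)))  # [3, 4, 5]
```

With a real file you would pass the file object directly, for example inside "with open('sample.txt') as f:".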
Update:
The list contains integers, and join only accepts strings, so you have to convert them first:
with open('output.txt', 'w') as outFile:  # 'output.txt' is a placeholder name
    outFile.write('\n'.join(str(n) for n in result))
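To round off the original task of writing the counts to another .txt file, a small reusable sketch might look like this. The helper name `write_counts` is hypothetical, and the path is a temporary file purely for illustration; substitute your own filename:

```python
import os
import tempfile

def write_counts(counts, path):
    """Write one count per line; join() needs strings, so convert each int first."""
    with open(path, 'w') as out_file:
        out_file.write('\n'.join(str(n) for n in counts))
        out_file.write('\n')  # end the file with a newline

# Illustration only: write to a temporary file and read it back
path = os.path.join(tempfile.mkdtemp(), 'counts.txt')
write_counts([3, 4, 5], path)
with open(path) as f:
    print(f.read().split())  # ['3', '4', '5']
```

If you would rather have the literal text [3, 4, 5] in the file, out_file.write(str(counts)) writes the list's printed form instead.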
Answer 2 (score: 0)
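You can try converting the file data into a list and doing the following. Note that this counts each word's total occurrences in the file, not the lengths of consecutive runs: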
with open("./sample.txt", 'r') as fl:
    fl_list = [line.strip() for line in fl]  # strip the trailing '\n'
unique_data = sorted(set(fl_list))
for unique in unique_data:
    print("%s - count: %s" % (unique, fl_list.count(unique)))
#output:
ANS - count: 8
AUT - count: 4
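If totals like these are what you need, `collections.Counter` computes them in a single pass, instead of calling list.count once per distinct word (each call scans the whole list). A minimal sketch, assuming the lines are already stripped:

```python
from collections import Counter

lines = ["ANS"] * 3 + ["AUT"] * 4 + ["ANS"] * 5
totals = Counter(lines)  # counts every occurrence, regardless of position
for word, count in sorted(totals.items()):
    print("%s - count: %s" % (word, count))
# ANS - count: 8
# AUT - count: 4
```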
Answer 3 (score: 0)
Open your file and read it to do the counting:
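l = []
last = ''
with open('data.txt', 'r') as f:
    data = f.readlines()
for line in data:
    words = line.split()
    if not words:
        continue  # skip blank lines
    if words[0] == last:
        l[-1] = l[-1] + 1  # same word as before: extend the current run
    else:
        l.append(1)  # new word: start a new run
        last = words[0]
print(l)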
Answer 4 (score: 0)
This gives the output you expect :)
with open("./sample.txt", 'r') as fl:
    word = [line.strip() for line in fl]  # strip '\n' so the last line compares equal
count = 1
length = []
for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length.append(count)
        count = 1
length.append(count)
print(length)
# output, as you expect:
[3, 4, 5]