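I am new to Python. I have a text file in which I need to avoid redundancy, not by deleting duplicate lines, but by incrementing the number in a line whenever an identical line is found.
Please help! Any answer would be appreciated. For example, a sample text file: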
hello ram1
hello ram1
hello gate1
hello gate1
Expected output:
hello ram1
hello ram2
hello gate1
hello gate2
Answer 0 (score: 2)
Using a regular expression and collections.defaultdict:
from collections import defaultdict
import re

numbers = defaultdict(int)
with open('/path/to/textfile.txt') as f:
    for line in f:
        line = re.sub(r'\d+', '', line.rstrip())  # Remove numbers.
        numbers[line] += 1  # Increment number for the same line
        print('{}{}'.format(line, numbers[line]))
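To check the idea against the sample in the question without touching a file, the same counting logic can be wrapped in a function that takes the lines directly. This is only a sketch: the function name renumber is hypothetical, and stripping just the trailing number (r'\d+$') rather than every digit in the line is my assumption about what the counter is.

```python
from collections import defaultdict
import re


def renumber(lines):
    """Return lines with the trailing number replaced by a running count."""
    counts = defaultdict(int)  # stem -> how many times it has been seen
    result = []
    for line in lines:
        # Assumption: only the number at the end of the line is a counter.
        stem = re.sub(r'\d+$', '', line.rstrip())
        counts[stem] += 1
        result.append('{}{}'.format(stem, counts[stem]))
    return result


sample = ['hello ram1', 'hello ram1', 'hello gate1', 'hello gate1']
print('\n'.join(renumber(sample)))
```

Digits elsewhere in a line are left alone here, so a line such as "port8080 ram1" would keep its "8080".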
UPDATE: using slice notation and a dictionary.
import re

numbers = {}
with open('1.txt') as f:
    for line in f:
        row = re.split(r'(\d+)', line.strip())
        words = tuple(row[::2])  # Extract non-number parts to use as the key.
        if words not in numbers:
            # First occurrence: keep the numbers as they appear in the line.
            numbers[words] = [int(n) for n in row[1::2]]
        else:
            # Repeated line: increase the numbers.
            numbers[words] = [n + 1 for n in numbers[words]]
        row[1::2] = map(str, numbers[words])  # Assign the numbers back.
        print(''.join(row))
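The slices work because re.split with a capturing group keeps the matched numbers in the result, so text parts and number parts alternate in the list. A small illustration (the sample string 'a1b22c' is made up for this purpose):

```python
import re

# The capturing group (\d+) makes re.split keep the separators it matched.
row = re.split(r'(\d+)', 'a1b22c')
print(row)        # ['a', '1', 'b', '22', 'c']
print(row[::2])   # even positions: the text parts
print(row[1::2])  # odd positions: the number parts
```

Text parts always sit at even indexes, even when a line ends with a number, because re.split then appends an empty string as the last element.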
Answer 1 (score: 0)
import re

seen = {}
# Open the file; the with block closes it when done.
with open('1.txt') as f:
    # Read through the file line by line.
    for line in f:
        # Regex matching, for example, "(hello[space])(ram or gate)(number)".
        matched = re.match(r'(.*\s)(.*?)(\d+)$', line.strip())
        # Skip blank lines and lines that do not fit the pattern.
        if not matched:
            continue
        words = matched.group(1)     # "hello" plus the space
        key = matched.group(2)       # the text before the number
        num = int(matched.group(3))  # the number itself
        if key in seen:
            # "ram" or "gate" was seen before: add 1.
            seen[key] += 1
        else:
            # First occurrence: store the initial number.
            seen[key] = num
        print('{}{}{}'.format(words, key, seen[key]))