Removed lines

Time: 2017-07-29 08:44:07

Tags: python

How do I remove duplicate lines from a string and then print the number of lines that were removed?

This is what I have so far:

import os


sentence = """Sentence1
Sentence1
Sentence2
Sentence3
Sentence4
Sentence4"""


spaces = sentence.replace(" ", "\n") #Makes one word per line
lines = os.linesep.join([s for s in spaces.splitlines() if s]) #Removes empty lines
duplicate = "\n".join(set(lines.split('\n'))) #Removes duplicate lines


numberlines = len(duplicate.split('\n')) #Counts lines



print(duplicate)
print'Lines:', numberlines

With this, the output is:

Sentence4
Sentence1
Sentence2
Sentence3
Lines: 4

How can I get this output instead:

Sentence4
Sentence1
Sentence2
Sentence3
Lines: 4
Removed Lines: 2

Thanks :D

3 answers:

Answer 0 (score: 1)

You can use set to count the unique lines and subtract that from the total:

Removed_lines = len(lines.split("\n")) - len(set(lines.split("\n")))
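
As a minimal sketch of how that one-liner slots into the question's code, assuming Python 3 (the snippets in this thread use the Python 2 print statement). The names `all_lines` and `unique_count` are introduced here for illustration and do not appear in the original code. The lines are joined with "\n" rather than os.linesep so that the later split("\n") gives back exactly the same lines on every platform.

```python
sentence = """Sentence1
Sentence1
Sentence2
Sentence3
Sentence4
Sentence4"""

# Same preprocessing as in the question: one word per line, empty lines dropped.
spaces = sentence.replace(" ", "\n")
lines = "\n".join(s for s in spaces.splitlines() if s)

all_lines = lines.split("\n")                   # every line, duplicates included
unique_count = len(set(all_lines))              # lines left after deduplication
removed_lines = len(all_lines) - unique_count   # duplicates that were dropped

print("Lines:", unique_count)
print("Removed Lines:", removed_lines)
```

For the sample input this reports 4 remaining lines and 2 removed lines.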

Answer 1 (score: 1)

Let's go through your code line by line:

spaces = sentence.replace(" ", "\n") #Makes one word per line

So far, so good.

lines = os.linesep.join([s for s in spaces.splitlines() if s]) #Removes empty lines

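OK, so you remove the empty lines, but it would be better to keep the result as a list instead of gluing it back together into one string, because...: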

duplicate = "\n".join(set(lines.split('\n'))) #Removes duplicate lines

...here you split it again, and then join the result into a string again...

numberlines = len(duplicate.split('\n')) #Counts lines

...only to split it one more time. A better version:

spaces = sentence.split()                 # Makes one word per line
lines = [s for s in spaces if s]          # Removes empty lines
duplicate = set(lines)                    # Removes duplicate lines
numberlines = len(duplicate)              # Counts lines
removed_lines = len(lines) - numberlines
print '\n'.join(duplicate)
print 'Lines:', numberlines
print 'Removed:', removed_lines
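
A set does not remember insertion order, which is why the printed lines can come out shuffled (Sentence4 first in the question's output). If the original order matters, one possible variation is to deduplicate with dict.fromkeys instead. This is only a sketch, not part of the answer above: it assumes Python 3.7 or later, where dicts preserve insertion order, and the helper name `dedupe_lines` is hypothetical.

```python
def dedupe_lines(text):
    """Return (unique_lines, removed_count), keeping first occurrences in order."""
    lines = text.split()                  # split on any whitespace; no empty strings
    unique = list(dict.fromkeys(lines))   # dict keys are unique and keep insertion order
    return unique, len(lines) - len(unique)


sentence = """Sentence1
Sentence1
Sentence2
Sentence3
Sentence4
Sentence4"""

unique, removed = dedupe_lines(sentence)
print("\n".join(unique))
print("Lines:", len(unique))
print("Removed Lines:", removed)
```

With the sample input the lines print in their original order, followed by the same two counts as before.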

Answer 2 (score: 0)

import os



sentence = """Sentence1
Sentence1
Sentence2
Sentence3
Sentence4
Sentence4"""



spaces = sentence.replace(" ", "\n") 
lines = os.linesep.join([s for s in spaces.splitlines() if s]) 
duplicate = "\n".join(set(lines.split('\n'))) 

numberlinesprev = len(lines.split('\n'))   # Line count before removing duplicates
numberlines = len(duplicate.split('\n'))   # Line count after removing duplicates

removed = numberlinesprev - numberlines    # Number of duplicate lines removed



print(duplicate)
print 'Lines Removed:', removed
print 'Lines:', numberlines

Output:

Sentence4
Sentence1
Sentence2
Sentence3
Lines Removed: 2
Lines: 4
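
If it is also useful to know which lines were dropped, and not only how many, collections.Counter from the standard library can tally each line. This goes slightly beyond what the question asks and is offered only as a sketch, assuming Python 3; the name `removed_per_line` is made up for this example.

```python
from collections import Counter

sentence = """Sentence1
Sentence1
Sentence2
Sentence3
Sentence4
Sentence4"""

counts = Counter(sentence.split())   # maps each line to its number of occurrences

# Every occurrence beyond the first one counts as a removed line.
removed_per_line = {line: n - 1 for line, n in counts.items() if n > 1}
removed_total = sum(removed_per_line.values())

print("Lines:", len(counts))
print("Removed Lines:", removed_total)
for line, n in removed_per_line.items():
    print(line, "removed", n, "time(s)")
```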