Question

I have two files that contain lists of websites. The master file is named A.txt and the processed file is named B.txt.

Contents of A.txt

www.cnn.com
www.google.com
www.gmail.com
www.iamvishal.com

Contents of B.txt

www.cnn.com
www.google.com

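I want to create a Python script that compares the two files and creates a new.txt containing only the sites from A.txt that do not yet appear in B.txt.

I am a newbie. I have done a lot of reading on this site and found some great examples. I have managed to get something working, but I fear my logic is wrong. Please see the code below: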

processedfile = open("b.txt")
masterfile = open("a.txt")
f=open("new.txt","w")

for line in processedfile.readlines():
  line = line.strip()
  print line;
  print "We are printing the processed part"
  for linetwo in masterfile.readlines():
     linetwo= linetwo.strip()
     print linetwo
     print "we are printing the master part"
     if linetwo != line:
            f.write(linetwo+"\n")

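As a result, new.txt contains every site from A.txt that differs from the first entry of B.txt, rather than the sites that are missing from B.txt as a whole. I also worry that there are other logic problems: I kept B.txt in the same sequence as A.txt, so the code could easily break if the sites are not in the same order.

Contents of new.txt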

www.google.com
www.gmail.com
www.iamvishal.com

Please advise how I should fix this.

Answer 1

Read the files into two sets and use set difference. For example:

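# Read each file into a set of stripped lines
with open('a.txt') as fa, open('b.txt') as fb:
    a = set(line.strip() for line in fa)
    b = set(line.strip() for line in fb)

# a - b is the set of sites in a.txt that are missing from b.txt
with open('new.txt', 'w') as new:
    new.write('\n'.join(a - b))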
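Two things may be worth spelling out here. In the question's nested loop, masterfile.readlines() consumes the whole file during the first pass of the outer loop, so later passes see nothing, and the != test writes every master line that differs from that one processed line. Also, a set difference does not keep the order of a.txt. If order matters, a minimal sketch along these lines (Python 3; the function name unprocessed_sites is made up for illustration) keeps only the processed list in a set and walks the master list once:

```python
def unprocessed_sites(master_lines, processed_lines):
    """Yield sites from master_lines that are absent from processed_lines.

    Both arguments are iterables of text lines (open file objects work).
    Sites come out in master order, and repeated or blank lines are skipped.
    """
    processed = {line.strip() for line in processed_lines}
    seen = set()
    for line in master_lines:
        site = line.strip()
        if site and site not in processed and site not in seen:
            seen.add(site)
            yield site


master = ["www.cnn.com\n", "www.google.com\n", "www.gmail.com\n", "www.iamvishal.com\n"]
processed = ["www.google.com\n", "www.cnn.com\n"]  # deliberately in a different order

print(list(unprocessed_sites(master, processed)))
# ['www.gmail.com', 'www.iamvishal.com']
```

With real files, the two lists would be replaced by open("a.txt") and open("b.txt"), and each yielded site written to new.txt followed by "\n".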

Answer 2

If the files are small, you can use sets to simplify the code:

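with open('a.txt') as fa, open('b.txt') as fb:
    master = set(line.strip() for line in fa)
    processed = set(line.strip() for line in fb)
# Sites in the master list that have not been processed yet (order is arbitrary)
for name in master - processed:
    print(name)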

Answer 3

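# Lines keep their trailing "\n", so "".join() writes one site per line.
# Caveat: a final line without a newline will not match the same site followed by one.
with open("a.txt") as fa, open("b.txt") as fb:
    a = set(fa)
    b = set(fb)
with open("new.txt", "w") as new:
    new.write("".join(a - b))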
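One caveat with comparing raw lines: the last line of a file often has no trailing newline, so the same site can look different in the two sets. The small sketch below (Python 3, with in-memory lists standing in for the two files) illustrates why the other answers call strip() first:

```python
# Lists standing in for the lines that iterating over each file would yield.
a_lines = ["www.cnn.com\n", "www.gmail.com"]    # last line has no newline
b_lines = ["www.gmail.com\n", "www.cnn.com\n"]  # same sites, all newline-terminated

raw_diff = set(a_lines) - set(b_lines)
stripped_diff = {s.strip() for s in a_lines} - {s.strip() for s in b_lines}

print(raw_diff)       # {'www.gmail.com'}  (reported as new although B contains it)
print(stripped_diff)  # set()
```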

Answer 4

Python has a library called difflib that can do much of this work for you. Here is an example of using it:

# find the difference between two texts
# tested with Python24   vegaseat  6/2/2005

import difflib

text1 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Spell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Fashion
Ralph Nader's List of Pleasures
"""

text2 = """The World's Shortest Books:
Human Rights Advances in China
"My Plan to Find the Real Killers" by OJ Simpson
"Strom Thurmond:  Intelligent Quotes"
America's Most Popular Lawyers
Career Opportunities for History Majors
Different Ways to Sell "Bob"
Dr. Kevorkian's Collection of Motivational Speeches
Spotted Owl Recipes by the EPA
The Engineer's Guide to Passion
Ralph Nader's List of Pleasures
"""

# create a list of lines in text1
text1Lines = text1.splitlines(1)
print "Lines of text1:"
for line in text1Lines:
  print line,

print

# ditto for text2
text2Lines = text2.splitlines(1)
print "Lines of text2:"
for line in text2Lines:
  print line,

print  

diffInstance = difflib.Differ()
diffList = list(diffInstance.compare(text1Lines, text2Lines))

print '-'*50
print "Lines different in text1 from text2:"
for line in diffList:
  if line[0] == '-':
    print line,

Source: http://www.daniweb.com/software-development/python/threads/96638
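Applied to the site lists from the question, the same difflib approach might look like the sketch below (Python 3 syntax; the helper name lines_only_in_first is made up for illustration). Note that Differ compares the two sequences in order, so unlike the set-based answers it can mark a site as removed merely because it sits at a different position in the second list.

```python
import difflib


def lines_only_in_first(first, second):
    """Return the entries that Differ marks with '- ' (present only in `first`)."""
    diff = difflib.Differ().compare(first, second)
    # Each diff line starts with a two-character code: '- ', '+ ', '  ' or '? '.
    return [line[2:] for line in diff if line.startswith("- ")]


master = ["www.cnn.com", "www.google.com", "www.gmail.com", "www.iamvishal.com"]
processed = ["www.cnn.com", "www.google.com"]

print(lines_only_in_first(master, processed))
# ['www.gmail.com', 'www.iamvishal.com']
```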

Searching and deleting data in Python
