Comparing two files in Python

Time: 2018-05-07 09:17:17

Tags: python

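I have two text files containing data like this:

textfile1 --> 1 goerge hyder
              2 ganesh singapore

textfile2 --> 1 goergy hydel
              2 ganest singapore

I would like this to be done in Hadoop. Can anyone suggest a way?

The files have to be compared column by column and character by character, and the result of the comparison should be written out as a report like this: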

column_name source destiny mismatch
      xxx    george georgy y
             ganesh ganest h
             hyder  hydel  r

Please help me.

3 answers:

Answer 0: (score: 0)

# Read both files, stripping trailing whitespace from every line
with open('textfile1.txt', 'r') as f:
    text1 = [line.rstrip() for line in f]
with open('textfile2.txt', 'r') as f:
    text2 = [line.rstrip() for line in f]

# Compare the two files as a whole and write the result to a report
with open('report.txt', 'w') as report:
    if text1 == text2:
        print("It is the same thing")
        report.write('Text 1 and text 2 are the same\n')
    else:
        print("It is not the same thing")
        report.write('Text 1 and text 2 are not the same\n')
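
If a plain yes/no answer on whether the two files are identical is all that is needed, the standard library's filecmp module may already be enough. A minimal sketch; the helper name `files_identical` is made up for illustration and is not part of the answer above:

```python
import filecmp


def files_identical(path1, path2):
    """Return True if both files have exactly the same content."""
    # shallow=False forces a comparison of the file contents instead of
    # trusting the os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path1, path2, shallow=False)


# Usage with the file names from the question (they must exist):
# print(files_identical("textfile1.txt", "textfile2.txt"))
```

This only says whether the files differ, not where, so it does not replace a word-by-word comparison.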

Answer 1: (score: 0)

textfile1 = "textfile1.txt"
textfile2 = "textfile2.txt"

with open(textfile1, "r") as f1, open(textfile2, "r") as f2:
    # split() with no argument splits on any whitespace, including newlines
    words1 = f1.read().split()
    words2 = f2.read().split()

# assumes f1 and f2 have the same number of words
for w1, w2 in zip(words1, words2):
    if w1 != w2:
        # zip stops at the shorter word, so no index can go out of range
        for c1, c2 in zip(w1, w2):
            if c1 != c2:
                print(w1, w2, c2)
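
To get closer to the report layout from the question, the same word-by-word loop can collect rows instead of printing them straight away. A rough sketch, assuming whitespace-separated words and assuming that surplus characters in the longer word should count as mismatches; the function name `mismatch_rows` is hypothetical, and it takes strings rather than file names so it can be tried without any files:

```python
from itertools import zip_longest


def mismatch_rows(text1, text2):
    """Return (source, destination, mismatched characters) per differing word."""
    rows = []
    # split() with no argument splits on any whitespace, newlines included
    for w1, w2 in zip(text1.split(), text2.split()):
        if w1 == w2:
            continue
        # zip_longest pads the shorter word with "", so trailing characters
        # of the longer word are reported too; the source-side character
        # is used whenever there is one
        diff = [c1 or c2
                for c1, c2 in zip_longest(w1, w2, fillvalue="")
                if c1 != c2]
        rows.append((w1, w2, "".join(diff)))
    return rows


# Sample data from the question
rows = mismatch_rows("1 goerge hyder\n2 ganesh singapore",
                     "1 goergy hydel\n2 ganest singapore")
for source, destination, chars in rows:
    print(source, destination, chars)
```

Words that are equal (the row numbers and singapore here) produce no row, so the report only lists actual mismatches.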

Answer 2: (score: 0)

As Seer.The mentioned above, you can use difflib:

import difflib

# Read the files: one list of words per line
with open('textfile1.txt', 'r') as f:
    list1 = [line.split() for line in f]

with open('textfile2.txt', 'r') as f:
    list2 = [line.split() for line in f]

# Get the output
for row1, row2 in zip(list1, list2):
    for word1, word2 in zip(row1, row2):
        # ndiff prefixes characters found only in the first word with "- "
        output_list = [li[-1]
                       for li in difflib.ndiff(word1, word2)
                       if li.startswith("- ")]
        if not output_list:
            output_list = ["no difference"]
        print("{} {} {}".format(word1, word2, output_list[0]))

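For input files that contain only the words, without the leading row numbers, the output should look like this: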

goerge goergy e
hyder hydel r
ganesh ganest h
singapore singapore no difference
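
For anyone wondering where the third column comes from: when difflib.ndiff is given two strings, it compares them character by character and yields one short string per character, prefixed with "- " (only in the first word), "+ " (only in the second word) or two spaces (in both). A small sketch to illustrate this; the helper name `removed_chars` is made up here:

```python
import difflib


def removed_chars(word1, word2):
    """Return the characters that ndiff reports as present only in word1."""
    # Each item looks like "- e" (only in word1), "+ y" (only in word2)
    # or "  g" (in both); the actual character is always the last one.
    return [item[-1]
            for item in difflib.ndiff(word1, word2)
            if item.startswith("- ")]


# Raw ndiff items for one word pair from the question
print(list(difflib.ndiff("hyder", "hydel")))

# Only the characters missing from the second word
print(removed_chars("hyder", "hydel"))
print(removed_chars("singapore", "singapore"))
```

Checking the prefix with startswith("- ") rather than testing for "-" anywhere in the item avoids false hits when a word itself contains a hyphen.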