I have two text files that contain data like this. I would like this to be done in Hadoop; can anyone suggest an approach?
textfile1 -->
1 goerge hyder
2 ganesh singapore
textfile2 -->
1 goergy hydel
2 ganest singapore
The files have to be compared column by column and character by character, and the result should be a report like this:
column_name  source  destination  mismatch
xxx          george  georgy       y
             ganesh  ganest       h
             hyder   hydel        r
Please help.
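None of the answers below covers the Hadoop part of the question. One possible route, sketched here under assumptions, is Hadoop Streaming, which runs any program that reads stdin and writes stdout as the mapper and the reducer. The mapper keys every line by its record id (assumed to be the first field) and tags it with the file it came from; after Hadoop's sort, the reducer sees both versions of a record together and compares them column by column. The source file name `textfile1.txt`, the `src`/`dst` tags and the `map`/`reduce` command-line switch are hypothetical choices, and the input-file environment variable follows the usual Streaming naming convention but is worth checking against your Hadoop version.

```python
import os
import sys
from itertools import groupby

# Hypothetical: records from an input path ending with this name are the
# "source" side; every other input file is treated as the "destination" side.
SOURCE_FILE = "textfile1.txt"


def mapper(lines, input_file):
    """Emit 'id<TAB>tag<TAB>fields' so both versions of a record share a key."""
    tag = "src" if input_file.endswith(SOURCE_FILE) else "dst"
    for line in lines:
        parts = line.split()
        if parts:
            yield "\t".join([parts[0], tag, " ".join(parts[1:])])


def first_mismatch(a, b):
    """Return the first differing character (taken from `a` where possible), or None if equal."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            return ch_a
    if len(a) != len(b):
        # One word is a prefix of the other: report the first extra character.
        longer = a if len(a) > len(b) else b
        return longer[min(len(a), len(b))]
    return None


def reducer(sorted_lines):
    """Compare the two versions of each record, column by column.

    Relies on Hadoop having sorted the mapper output by key (the record id),
    so all lines for one id arrive next to each other.
    """
    rows = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for rec_id, group in groupby(rows, key=lambda row: row[0]):
        versions = {tag: fields.split() for _, tag, fields in group}
        src, dst = versions.get("src", []), versions.get("dst", [])
        for col, (a, b) in enumerate(zip(src, dst), start=1):
            ch = first_mismatch(a, b)
            if ch is not None:
                yield "\t".join([rec_id, "col%d" % col, a, b, ch])


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    if sys.argv[1] == "map":
        # Streaming exposes job settings as environment variables with dots
        # replaced by underscores; older Hadoop releases used map_input_file.
        path = os.environ.get("mapreduce_map_input_file",
                              os.environ.get("map_input_file", ""))
        results = mapper(sys.stdin, path)
    else:
        results = reducer(sys.stdin)
    for result in results:
        print(result)
```

On a cluster the script would be handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (for example `python3 compare.py map` and `python3 compare.py reduce`, where `compare.py` is a hypothetical name). In this sketch a record id that appears in only one of the files produces no output.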
Answer 0 (score: 0)
# 'r' mode: both files are only read here
with open('textfile1.txt', 'r') as f1, open('textfile2.txt', 'r') as f2:
    lines1 = [line.rstrip() for line in f1]
    lines2 = [line.rstrip() for line in f2]

# Open the report once in write mode and compare the files line by line
with open('report.txt', 'w') as report:
    for text1, text2 in zip(lines1, lines2):
        if text1 == text2:
            print("It is the same thing")
            report.write('It is the same thing with text 1 and 2\n')
        else:
            print("It is not the same thing")
            report.write('It is not the same thing with text 1 and 2\n')
Answer 1 (score: 0)
with open("textfile1.txt", "r") as f1, open("textfile2.txt", "r") as f2:
    # split() with no argument splits on any whitespace, including newlines
    words1 = f1.read().split()
    words2 = f2.read().split()

# considering f1 and f2 have the same number of words
for w1, w2 in zip(words1, words2):
    if w1 != w2:
        # compare character by character; zip stops at the shorter word
        for c1, c2 in zip(w1, w2):
            if c1 != c2:
                print(w1, w2, c2)
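The loop above assumes, as its comment says, that both files contain the same number of words, and that paired words line up character for character. A minimal sketch that drops that assumption pads the shorter side with `itertools.zip_longest`, so a missing word or character also counts as a mismatch. The function name `compare_words` is hypothetical, and unlike the loop above it reports only the first differing character of each word pair, which matches the single-character mismatch column in the question's report.

```python
from itertools import zip_longest


def compare_words(words1, words2):
    """Yield (word1, word2, mismatched_char) without assuming equal counts or lengths.

    Missing words or characters are padded with "" so they show up as mismatches.
    """
    for w1, w2 in zip_longest(words1, words2, fillvalue=""):
        if w1 == w2:
            continue
        for c1, c2 in zip_longest(w1, w2, fillvalue=""):
            if c1 != c2:
                # Report the second file's character, as the answer above does;
                # fall back to the first file's character when the second has none.
                yield w1, w2, c2 or c1
                break


for row in compare_words(["goerge", "hyder"], ["goergy", "hydel", "extra"]):
    print(row)
```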
Answer 2 (score: 0)
As Seer.The mentioned above, you can use difflib.
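For reference, this is what `difflib.ndiff` produces in this setting: given two plain strings it treats each character as a separate "line", and every output line starts with a two-character code. A small helper (`char_diff` is a hypothetical name) that collects the characters found only in the first word or only in the second:

```python
import difflib


def char_diff(source, target):
    """Return (removed, added): characters only in source / only in target.

    ndiff output lines start with "- " (only in source), "+ " (only in target),
    "  " (in both) or "? " (hint lines, which are skipped here).
    """
    diff = list(difflib.ndiff(source, target))
    removed = [line[2:] for line in diff if line.startswith("- ")]
    added = [line[2:] for line in diff if line.startswith("+ ")]
    return removed, added


print(char_diff("ganesh", "ganest"))  # (['h'], ['t'])
```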
import difflib

# Read the files: one list of whitespace-separated fields per line
with open('textfile1.txt', 'r') as f:
    list1 = [line.split() for line in f]
with open('textfile2.txt', 'r') as f:
    list2 = [line.split() for line in f]

# Get the output: compare each pair of fields character by character
for row1, row2 in zip(list1, list2):
    for word1, word2 in zip(row1, row2):
        # ndiff marks characters that occur only in the first word with "- "
        output_list = [li[-1]
                       for li in difflib.ndiff(word1, word2)
                       if li.startswith("- ")]
        if output_list == []:
            output_list = ["no difference"]
        print("{} {} {}".format(word1, word2, output_list[0]))
The output should look like this:
goerge goergy e
hyder hydel r
ganesh ganest h
singapore singapore no difference
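The script above prints only the first differing character of each word pair. If every differing span is wanted, laid out like the report in the question, `difflib.SequenceMatcher.get_opcodes()` can be used instead. A minimal sketch, assuming the records are paired by line position and using the hypothetical column names `id`, `name` and `city`, since the files have no header row:

```python
import difflib

# Hypothetical column names: the question's files have no header row.
COLUMNS = ["id", "name", "city"]


def mismatch_report(rows1, rows2, columns=COLUMNS):
    """Return report rows (column_name, source, destination, mismatch).

    rows1 and rows2 are lists of whitespace-split records, paired by position.
    The mismatch field joins every differing span with commas: the source
    characters for a replace or delete, the destination characters for an insert.
    """
    report = [("column_name", "source", "destination", "mismatch")]
    for rec1, rec2 in zip(rows1, rows2):
        for name, a, b in zip(columns, rec1, rec2):
            if a == b:
                continue
            matcher = difflib.SequenceMatcher(None, a, b)
            spans = [a[i1:i2] or b[j1:j2]
                     for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                     if tag != "equal"]
            report.append((name, a, b, ",".join(spans)))
    return report


# Sample data from the question; with real files use [line.split() for line in f].
rows1 = [line.split() for line in ["1 goerge hyder", "2 ganesh singapore"]]
rows2 = [line.split() for line in ["1 goergy hydel", "2 ganest singapore"]]
for row in mismatch_report(rows1, rows2):
    print("{:<12} {:<10} {:<12} {}".format(*row))
```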