我不是Python的专业知识,我试着尽力找到答案,但无法找到答案。请原谅,如果这是一个重复的问题并指出我正确的方向,如果可以的话。
我正在尝试使用Python Difflib比较两个CSV文件,并将Diff输出生成为HTML页面。当前的difflib模块将内置选项作为-m,通过突出显示差异来并排生成两个csv文件的HTML输出。
但是,difflib使用difflib.SequenceMatcher
查找差异并使用difflib.HtmlDiff.make_file
创建HTML文件。但是,它产生的输出并不是我想要的。
我目前从difflib获得的输出是: The Default Python DIFFLIB HTML output is Here.
但是,我想要的输出是:我正在寻找单词级别高亮,而不是在字符级别或序列高亮处突出显示的更改。如果在旧文件和新文件之间发生任何更改,我希望突出显示 WHOLE WORD 。
我要强调的更改是: A word Level highlight of the text.
请在这方面帮助我,是否可以使用difflib,或者我是否必须使用任何其他工具/模块。我尝试使用vimdiff和其他插件,但我没有任何东西。我对这里的任何事情持开放态度。
我使用的代码来自PythonDiffLib
文档页面。
import sys, os, time, difflib, optparse
def main():
..
..
..
n = options.lines //I used this n = ZERO.
fromfile, tofile = args # as specified in the usage string
# we're passing these as arguments to the diff function
fromdate = time.ctime(os.stat(fromfile).st_mtime)
todate = time.ctime(os.stat(tofile).st_mtime)
fromlines = open(fromfile, 'U').readlines()
tolines = open(tofile, 'U').readlines()
diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
tofile, context=TRUE,
numlines=0)
# we're using writelines because diff is a generator
sys.stdout.writelines(diff)
` 的 Old.csv
refno,title,author,year,price
1001,CPP,MILTON,2008,456
1002,JAVA,Gilson,2002,456
1003,Adobe Flex,2010,566
1004,General Knowledge,Sinson,2007,465
1005,Actionscript,Gilto,2008,480
new.csv
refno,title,author,year,price
1001,CPP,MILTON,2010,456,2008
1002,JAVA,Gilson,2002
1003,Adobe Flexi,Johnson,2010,566
1004,General Knowledge,Simpson,2007,465
105,Action script,Gilto,2008,480
2000,Drama,DayoNe,,2020,560
我还在下面添加默认HTML DIFF输出和预期的HTML DIFF输出。
默认 DIFFLIB的HTML DIFF输出:
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to0__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<thead><tr><th class="diff_next"><br /></th><th colspan="2" class="diff_header">old.csv</th><th class="diff_next"><br /></th><th colspan="2" class="diff_header">new.csv</th></tr></thead>
<tbody>
<tr><td class="diff_next" id="difflib_chg_to0__0"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="from0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,200<span class="diff_sub">8</span>,456</td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,20<span class="diff_add">1</span>0,456<span class="diff_add">,2008</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002<span class="diff_sub">,456</span></td><td class="diff_next"></td><td class="diff_header" id="to0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_4">4</td><td nowrap="nowrap">1003,Adobe Flex,2010,566</td><td class="diff_next"></td><td class="diff_header" id="to0_4">4</td><td nowrap="nowrap">1003,Adobe Flex<span class="diff_add">i,Johnson</span>,2010,566</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_5">5</td><td nowrap="nowrap">1004,General Knowledge,Si<span class="diff_chg">n</span>son,2007,465</td><td class="diff_next"></td><td class="diff_header" id="to0_5">5</td><td nowrap="nowrap">1004,General Knowledge,Si<span class="diff_chg">mp</span>son,2007,465</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_6">6</td><td nowrap="nowrap">1<span class="diff_sub">0</span>05,Actionscript,Gilto,2008,480</td><td class="diff_next"></td><td class="diff_header" id="to0_6">6</td><td nowrap="nowrap">105,Action<span class="diff_add"> </span>script,Gilto,2008,480</td></tr>
<tr><td class="diff_next"></td><td class="diff_header"></td><td nowrap="nowrap"></td><td class="diff_next"></td><td class="diff_header" id="to0_7">7</td><td nowrap="nowrap"><span class="diff_add">2000,Drama,DayoNe,,2020,560</span></td></tr>
</tbody>
</table>
</body>
</html>
预期 DIFFLIB的HTML DIFF输出:
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>
<body>
<table class="diff" id="difflib_chg_to0__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<thead><tr><th class="diff_next"><br /></th><th colspan="2" class="diff_header">old.csv</th><th class="diff_next"><br /></th><th colspan="2" class="diff_header">new.csv</th></tr></thead>
<tbody>
<tr><td class="diff_next" id="difflib_chg_to0__0"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="from0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,<span class="diff_sub">2008</span>,456</td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,<span class="diff_add">2010</span>,456<span class="diff_add">,2008</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002<span class="diff_sub">,456</span></td><td class="diff_next"></td><td class="diff_header" id="to0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_4">4</td><td nowrap="nowrap">1003,<span class="diff_sub">Adobe Flex</span>,2010,566</td><td class="diff_next"></td><td class="diff_header" id="to0_4">4</td><td nowrap="nowrap">1003,<span class="diff_add">Adobe Flexi</span>,<span class="diff_add">Johnson</span>,2010,566</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_5">5</td><td nowrap="nowrap">1004,General Knowledge,<span class="diff_sub">Sinson</span>,2007,465</td><td class="diff_next"></td><td class="diff_header" id="to0_5">5</td><td nowrap="nowrap">1004,General Knowledge,<span class="diff_add">Simpson</span>,2007,465</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_6">6</td><td nowrap="nowrap"><span class="diff_sub">1005</span>,<span class="diff_sub">Actionscript</span>,Gilto,2008,480</td><td class="diff_next"></td><td class="diff_header" id="to0_6">6</td><td nowrap="nowrap"><span class="diff_add">105</span>,<span class="diff_add">Action script</span>,Gilto,2008,480</td></tr>
<tr><td class="diff_next"></td><td class="diff_header"></td><td nowrap="nowrap"></td><td class="diff_next"></td><td class="diff_header" id="to0_7">7</td><td nowrap="nowrap"><span class="diff_add">2000,Drama,DayoNe,,2020,560</span></td></tr>
</tbody>
</table>
</body>
</html>
答案 0 :(得分:0)
问题:我正在寻找单词级别的重点
实施class Comma_HtmlDiff
,将突出显示扩展为逗号边界:
您必须重载difflib.ndiff
。
注意:只展开第一个突出显示的部分已实施 如果
difflib.ndiff
在逗号中突出显示,则不会更正。
class Comma_HtmlDiff(difflib.HtmlDiff):
def __init__(self, tabsize=8, wrapcolumn=None, linejunk=None,
charjunk=difflib.IS_CHARACTER_JUNK):
setattr(difflib, '_ndiff', difflib.ndiff)
setattr(difflib, 'ndiff', self.ndiff)
super().__init__(tabsize, wrapcolumn, linejunk, charjunk)
def ndiff(self, a, b, linejunk=None, charjunk=difflib.IS_CHARACTER_JUNK):
_line = ''
for line in difflib._ndiff(a, b, linejunk, charjunk):
if line.startswith('-'):
_d = '-'
_line = line
elif line.startswith('+'):
_d = '+'
_line = line
if line.startswith('?'):
dp = line.find(_d)
if dp == -1:
_d = '+'
dp = line.find('^')
dpl = _line.rfind(',', 0, dp)
if dpl == -1:
dpl = 2
else:
dpl += 1
dpr = _line.find(',', dp)
if dpr == dp:
_d = ' '
dpl = dp
dpr = dp+1
dpw = dpr - dpl
line = line[:dpl] + _d*dpw + line[dpr:]
yield line
# Usage
diff = Comma_HtmlDiff().make_file(fromlines, tolines, fromfile,
tofile, context=True,
numlines=0)
使用Python测试:3.4.2