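Preface
I am using git as the version control system for a paper my lab is writing in LaTeX. Several people are collaborating.
I'm running into git being stubborn about how it merges. Say two people have each made a single-word change on the same line and then try to merge. Although git diff --word-diff seems able to show the differences between branches word by word, git merge doesn't seem able to perform the merge word by word, and requires a manual merge instead.
With a LaTeX document this is particularly annoying, because the common habit when writing LaTeX is to put a full paragraph on one line and let the text editor handle word wrapping for display. For now we are working around this by adding a newline after each sentence, so that git can at least merge changes to different sentences within a paragraph. But it still gets confused by multiple changes within one sentence, and of course the text no longer wraps nicely.
Question
Is there a way to make git merge two files "word by word" rather than "line by line"?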
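Answer 0: (score: 14)
Here is a solution in the same vein as sehe's, with a few changes that hopefully address your comments:
As in sehe's solution, create (or append to) a .gitattributes file: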
*.tex filter=sentencebreak
Now implement the clean and smudge filters:
git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""
I created a test file with the following contents; note the one-line paragraphs.
\chapter{Tumbling Tumbleweeds. Intro}
A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski. At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. This Lebowski, he called himself the Dude. Now, Dude, that's a name no one would self-apply where I come from. But then, there was a lot about the Dude that didn't make a whole lot of sense to me. And a lot about where he lived, like- wise. But then again, maybe that's why I found the place s'durned innarestin'.
This line has two sentences. But it also ends with a comment. % here
After we commit it to the local repo, we can see the raw contents.
$ git show HEAD:test.tex
\chapter{Tumbling Tumbleweeds. Intro}
A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski. %NL%
At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. %NL%
This Lebowski, he called himself the Dude. %NL%
Now, Dude, that's a name no one would self-apply where I come from. %NL%
But then, there was a lot about the Dude that didn't make a whole lot of sense to me. %NL%
And a lot about where he lived, like- wise. %NL%
But then again, maybe that's why I found the place s'durned innarestin'.
This line has two sentences. But it also ends with a comment. % here
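So the rule of the clean filter is: whenever it finds a string of text that ends with . or ? or ! or '' (that is how LaTeX writes a closing double quote) followed by a space, it adds %NL% and a newline character. But it ignores lines that start with \ (LaTeX commands) or that contain a comment anywhere (so that comments cannot become part of the main text). As written, the pattern also skips lines that begin with a space or with *.
The smudge filter removes %NL% and the newline.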
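The two rules can be mirrored in a few lines of Python, which may be easier to experiment with than the quoted Perl one-liners. This is only a sketch of the same logic, not a replacement for the filters git actually runs; the names clean_sentences and smudge_sentences are made up for this illustration.

```python
import re

MARKER = "%NL%"
# A sentence end: ?, !, . or '' (LaTeX closing quote), followed by one space.
SENTENCE_END = re.compile(r"(\?|!|\.|'') ")


def clean_sentences(text: str) -> str:
    """Insert MARKER plus a newline after every sentence end (clean direction)."""
    out = []
    for line in text.splitlines(keepends=True):
        # Skip lines with a comment and lines starting with space, '*' or '\'.
        if "%" in line or line.startswith((" ", "*", "\\")):
            out.append(line)
        else:
            out.append(SENTENCE_END.sub(lambda m: m.group(0) + MARKER + "\n", line))
    return "".join(out)


def smudge_sentences(text: str) -> str:
    """Remove every MARKER plus newline again (smudge direction)."""
    return text.replace(MARKER + "\n", "")


paragraph = "One. Two? Three!\n"
print(repr(clean_sentences(paragraph)))
print(repr(smudge_sentences(clean_sentences(paragraph))))
```

Because a cleaned line already contains a % character, running the clean step a second time leaves it untouched, so the transformation is stable under repeated application.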
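Diffing and merging are done on the 'clean' files, so changes to a paragraph are merged sentence by sentence. This is the desired behavior.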
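To see why the split helps, here is a toy line-aligned three-way merge in Python. It is a deliberately simplified stand-in and not git's actual merge algorithm: it assumes that no edit adds or removes lines, and the function name merge_lines and the sample sentences are invented for the illustration.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge that compares line i of each version."""
    merged, conflicts = [], 0
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)  # both sides agree, or only "ours" changed
        elif o == b:
            merged.append(t)  # only "theirs" changed
        else:
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
            conflicts += 1    # both sides changed the same line differently
    return merged, conflicts


# One sentence per line, as the clean filter would store the paragraph.
base = ["Alpha is fast. ", "Beta is slow. ", "Gamma is new."]
ours = ["Alpha is quick. ", base[1], base[2]]    # edits the first sentence
theirs = [base[0], base[1], "Gamma is old."]     # edits the third sentence

by_sentence = merge_lines(base, ours, theirs)

# The same paragraph stored as a single line, as in the working tree.
by_paragraph = merge_lines(["".join(base)], ["".join(ours)], ["".join(theirs)])

print(by_sentence)
print(by_paragraph[1], "conflict(s) when the paragraph is one line")
```

With one sentence per line the two edits touch different lines and combine without a conflict, while the single-line paragraph is changed by both sides and conflicts. Real git may still report a conflict when the two edits fall on neighbouring lines, so the sample edits here are kept one sentence apart.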
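The nice thing is that the LaTeX file should compile in either the clean or the smudged state, so there is some hope that collaborators won't need to do anything. Finally, you could put the git config commands in a shell script that is part of the repo, so that a collaborator only has to run it in the root of the repo to get configured.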
#!/bin/bash
# Register the clean (sentence-splitting) and smudge (re-joining) filters.
git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""
# Smudge the already checked-out .tex files by hand; quoting keeps paths with spaces intact.
find . -iname "*.tex" -print0 | while IFS= read -r -d '' f
do
  perl -pe "s/%NL%\n//gm" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
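That last bit is a hack: when this script is first run, the branch is already checked out (in the clean form) and does not get smudged automatically.
You can add this script and the .gitattributes file to the repo; new users then only need to clone and run the script in the root of the repo.
I think this script should even work with git on Windows if it is run in git bash.
Drawbacks: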
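Answer 1: (score: 8)
You could try this:
Rather than swapping out the merge engine (hard), you can do some kind of "normalization" (canonicalization, if you will). I don't speak LaTeX, but let me illustrate as follows:
Say you have the input test.raw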
curve ball well received {misfit} whatever
proprietary format extinction {benefit}.
You want it to diff/merge word by word. Add the following .gitattributes file
*.raw filter=wordbyword
Then
git config --global filter.wordbyword.clean /home/username/bin/wordbyword.clean
git config --global filter.wordbyword.smudge /home/username/bin/wordbyword.smudge
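A minimalistic implementation of the filters, first wordbyword.clean and then wordbyword.smudge:
#!/usr/bin/perl
# wordbyword.clean: put every word on its own line and mark the end of each original line.
use strict;
use warnings;
while (<>)
{
    # Each match is a run of text up to and including the whitespace that follows it.
    print "$_\n" foreach (m/(.*?\s+)/go);
    print '#@#DELIM#@#' . "\n";
}
#!/usr/bin/perl
# wordbyword.smudge: join the words again, turning each delimiter back into a line break.
use strict;
use warnings;
while (<>)
{
    chomp; '#@#DELIM#@#' eq $_ and print "\n" or print;
}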
After committing the file, inspect the raw contents of the committed blob with `git show HEAD:test.raw`:
curve
ball
well
received
{misfit}
whatever
#@#DELIM#@#
proprietary
format
extinction
{benefit}.
#@#DELIM#@#
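The same round trip can be sketched in Python for experimentation. This is an illustrative re-implementation under stated assumptions, not the Perl filters themselves: the function names clean_words and smudge_words are invented, each word keeps its trailing whitespace so the original line can be rebuilt exactly, and the text is assumed never to contain the delimiter as a word of its own.

```python
import re

DELIM = "#@#DELIM#@#"
# A token is a word plus the whitespace after it, or a run of leading whitespace.
TOKEN = re.compile(r"\S+\s*|\s+")


def clean_words(text: str) -> str:
    """Write one token per line and a delimiter line after each original line."""
    out = []
    for line in text.splitlines():
        out.extend(TOKEN.findall(line))
        out.append(DELIM)
    return "\n".join(out) + "\n"


def smudge_words(text: str) -> str:
    """Concatenate the tokens again; each delimiter becomes a line break."""
    return "".join("\n" if line == DELIM else line for line in text.splitlines())


raw = (
    "curve ball well received {misfit} whatever\n"
    "proprietary format extinction {benefit}.\n"
)
stored = clean_words(raw)
print(stored)
print(smudge_words(stored) == raw)
```

Keeping the whitespace inside each token is a design choice that makes the stored form reversible even when words are separated by more than one space.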
Change the contents of test.raw to
curve ball welled repreived {misfit} whatever
proprietary extinction format {benefit}.
The output of git diff --patch-with-stat is probably what you want:
test.raw | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/test.raw b/test.raw
index b0b0b88..ed8c393 100644
--- a/test.raw
+++ b/test.raw
@@ -1,14 +1,14 @@
curve
ball
-well
-received
+welled
+repreived
{misfit}
whatever
#@#DELIM#@#
proprietary
-format
extinction
+format
{benefit}.
#@#DELIM#@#
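You can see how this would magically work for merges, resulting in word-by-word diffing and merging. Q.E.D.
(I hope you like my creative use of .gitattributes. If not, I enjoyed doing this little exercise anyway.)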
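Answer 2: (score: 3)
I believe the git merge algorithm is quite simple (even if you can make it work harder with the "patience" merge strategy).
The unit it works on will remain the line.
But the general idea is to delegate any fine-grained detection and resolution mechanism to a third-party tool, which you can set up with git config mergetool.
If some words within a long line differ, that external tool (KDiff3, DiffMerge, ...) will be able to pick up the change and present it to you.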