Splitting a diff file with regular expressions in Python

Date: 2012-05-06 16:35:16

Tags: python regex

I'm trying to split a diff (unified format) into its per-file sections using Python's re module. The diff looks like this:

diff --git a/src/core.js b/src/core.js
index 9c8314c..4242903 100644
--- a/src/core.js
+++ b/src/core.js
@@ -801,7 +801,7 @@ jQuery.extend({
        return proxy;
    },

-   // Mutifunctional method to get and set values to a collection
+   // Multifunctional method to get and set values of a collection
    // The value/s can optionally be executed if it's a function
    access: function( elems, fn, key, value, chainable, emptyGet, pass ) {
        var exec,
diff --git a/src/sizzle b/src/sizzle
index fe2f618..feebbd7 160000
--- a/src/sizzle
+++ b/src/sizzle
@@ -1 +1 @@
-Subproject commit fe2f618106bb76857b229113d6d11653707d0b22
+Subproject commit feebbd7e053bff426444c7b348c776c99c7490ee
diff --git a/test/unit/manipulation.js b/test/unit/manipulation.js
index 18e1b8d..ff31c4d 100644
--- a/test/unit/manipulation.js
+++ b/test/unit/manipulation.js
@@ -7,7 +7,7 @@ var bareObj = function(value) { return value; };
 var functionReturningObj = function(value) { return (function() { return value; }); };

 test("text()", function() {
-   expect(4);
+   expect(5);
    var expected = "This link has class=\"blog\": Simon Willison's Weblog";
    equal( jQuery("#sap").text(), expected, "Check for merged text of more then one element." );

@@ -20,6 +20,10 @@ test("text()", function() {
        frag.appendChild( document.createTextNode("foo") );

    equal( jQuery( frag ).text(), "foo", "Document Fragment Text node was retreived from .text().");
+
+   var $newLineTest = jQuery("<div>test<br/>testy</div>").appendTo("#moretests");
+   $newLineTest.find("br").replaceWith("\n");
+   equal( $newLineTest.text(), "test\ntesty", "text() does not remove new lines (#11153)" );
 });

 test("text(undefined)", function() {
diff --git a/version.txt b/version.txt
index 0a182f2..0330b0e 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-1.7.2
\ No newline at end of file
+1.7.3pre
\ No newline at end of file

I have tried various pattern combinations but can't get it quite right. This is the closest I've come so far:

re.compile(r'(diff.*?[^\rdiff])', flags=re.S|re.M)

But this yields:

['diff ', 'diff ', 'diff ', 'diff ']

How can I match all of the sections in this diff?

4 answers:

Answer 0 (score: 1)

You don't need a regular expression; just split the file:

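# Read the whole diff, then split on the per-file header
with open('diff.txt', 'r') as diff_file:
    diff_str = diff_file.read()
# str.split drops the separator, so prepend it to every non-empty piece
diff_split = ['diff --git%s' % x for x in diff_str.split('diff --git')
              if x.strip()]
print(diff_split)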
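To see what this split produces without a file on disk, here is a minimal sketch on a made-up two-file sample. The helper name `split_diff` and the sample text are assumptions for illustration, not part of the original answer.

```python
# A small, made-up diff with two file sections
sample = (
    "diff --git a/a.txt b/a.txt\n"
    "--- a/a.txt\n"
    "+++ b/a.txt\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/b.txt b/b.txt\n"
    "--- a/b.txt\n"
    "+++ b/b.txt\n"
    "@@ -1 +1 @@\n"
    "-1.7.2\n"
    "+1.7.3pre\n"
)


def split_diff(diff_str, marker="diff --git"):
    """Split on the marker and put it back in front of each non-empty piece."""
    return [marker + part for part in diff_str.split(marker) if part.strip()]


sections = split_diff(sample)
for section in sections:
    # The first line of each section is its "diff --git" header
    print(section.splitlines()[0])
```

One thing to keep in mind: a plain substring split also cuts wherever the text "diff --git" appears in the middle of a line, for example inside a hunk that itself contains diff output.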

Answer 1 (score: 1)

This does it:

import re

# s holds the full diff text
r = re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)
for m in r.findall(s):
    print('====')
    print(m)
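Some context on why the pattern in the question returned only 'diff ': `[^\rdiff]` is a character class meaning "one character that is not \r, d, i or f", not "anything up to the next diff". The lazy `.*?` therefore matches nothing and the class consumes the single space after "diff". The sketch below runs both patterns on a short made-up sample; the variable names are illustrative only.

```python
import re

# A small, made-up diff with two file sections
sample = (
    "diff --git a/a.txt b/a.txt\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/b.txt b/b.txt\n"
    "@@ -1 +1 @@\n"
    "-1.7.2\n"
    "+1.7.3pre\n"
)

# Pattern from the question: stops right after the first non-[\r d i f] character
broken = re.compile(r'(diff.*?[^\rdiff])', flags=re.S | re.M)
broken_matches = broken.findall(sample)

# Lazy match from a line starting with "diff" up to (but not including)
# the next such line, or the end of the string
section_re = re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)
sections = section_re.findall(sample)

print(broken_matches)
print(len(sections))
```

Because the lookahead does not consume anything, each section keeps its trailing newline and the sections join back into the original text.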

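Answer 2 (score: 0)

Why use a regular expression at all? How about iterating over the lines and starting a new section whenever a line starts with diff?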

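list_of_diffs = []
temp_diff = []
for line in patch:  # patch: any iterable of lines, e.g. an open file
    if line.startswith('diff') and temp_diff:
        # A new section begins: store the one collected so far
        list_of_diffs.append(''.join(temp_diff))
        temp_diff = []
    temp_diff.append(line)
if temp_diff:
    # Store the final section
    list_of_diffs.append(''.join(temp_diff))

Disclaimer: treat the code above as an illustrative sketch rather than a tested implementation.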
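A runnable sketch of the line-by-line idea, written as a generator so a large patch does not have to be held in a list. The function name `iter_diff_sections`, the `marker` parameter, and the in-memory sample are assumptions made for illustration.

```python
import io


def iter_diff_sections(lines, marker="diff --git "):
    """Yield one string per file section from an iterable of lines."""
    current = []
    for line in lines:
        # A header line closes the section collected so far
        if line.startswith(marker) and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        # Emit the last section, which has no header after it
        yield "".join(current)


# io.StringIO stands in for an open patch file here
patch = io.StringIO(
    "diff --git a/a.txt b/a.txt\n"
    "-old\n"
    "+new\n"
    "diff --git a/b.txt b/b.txt\n"
    "-1.7.2\n"
    "+1.7.3pre\n"
)
sections = list(iter_diff_sections(patch))
print(len(sections))
```

Checking for the full "diff --git " prefix rather than just "diff" is a deliberate choice here: it makes it less likely that an unrelated line is mistaken for a section header.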

Regular expressions are a hammer, but your problem isn't a nail.

Answer 3 (score: 0)

Just split on any newline that is followed by the word diff:

result = re.split(r"\n(?=diff\b)", subject)

To be safe, though, you should probably also match \r and \r\n line endings:

result = re.split(r"(?:\r\n|[\r\n])(?=diff\b)", subject)
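A quick check of both patterns on input with Windows-style line endings, using a made-up sample. Note that `re.split` discards the line terminator it splits on, so every section except the last loses its final newline.

```python
import re

# Made-up sample: the first section uses \r\n, the second uses \n
subject = (
    "diff --git a/a.txt b/a.txt\r\n"
    "-old\r\n"
    "+new\r\n"
    "diff --git a/b.txt b/b.txt\n"
    "-1.7.2\n"
    "+1.7.3pre"
)

# Splitting on \n only leaves a stray \r at the end of the first section
simple = re.split(r"\n(?=diff\b)", subject)

# Matching \r\n as a unit (or a lone \r or \n) removes the whole terminator
result = re.split(r"(?:\r\n|[\r\n])(?=diff\b)", subject)

print(repr(simple[0][-5:]))
print(repr(result[0][-5:]))
```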