Question

I am trying to split the following string:

Change 709131 on 2014/06/05 by person1

    - some description

Change 709081 on 2014/06/05 by person2

    more description

Change 708930 on 2014/06/04 by person3

    description xyz


Change 708906 on 2014/06/04 by person4

    description of change

I want to split it at each Change \d+ header (that is, at Change 709131, Change 709081, and so on).

I am trying to use

set abc [regexp -inline -all {Change \d+\son.*Change \d+\son} $oIfs]

but I do not get the desired output.

Edit: one approach I found is

set abc [regexp -inline -all {Change.*?(?=Change)} $oIfs]

but it does not return the last record, because no further "Change" follows it for the lookahead to match.

Answer 1

You can try this construct:

Change \d+(?:(?!\mChange\M).)+

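(?:(?!\mChange\M).)+ matches one character at a time, but only while that position is not the start of the whole word "Change". The match therefore runs from one "Change <number>" header up to just before the next one, or to the end of the string.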
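As a rough sketch of how this pattern might be applied with regexp -all -inline (the variable names text and records and the shortened sample are my own assumptions, not from the question), relying on Tcl's default behaviour where . also matches newlines:

```tcl
# Shortened sample input (assumed for illustration only).
set text "Change 709131 on 2014/06/05 by person1\n\n    - some description\n\nChange 709081 on 2014/06/05 by person2\n\n    more description\n"

# Each match starts at "Change <digits>" and extends until just before
# the next whole word "Change", or to the end of the string.
set records [regexp -all -inline {Change \d+(?:(?!\mChange\M).)+} $text]

foreach r $records {
    puts "----"
    puts [string trim $r]
}
```

Because the group is non-capturing, the result list should hold one element per record. Note that a description containing the word "Change" anywhere, not only at the start of a line, would cut a record short with this pattern.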


Answer 2

Tcllib to the rescue: http://tcllib.sourceforge.net/doc/textutil_split.html

package require textutil::split

set s {Change 709131 on 2014/06/05 by person1

    - some description

Change 709081 on 2014/06/05 by person2

    more description

Change 708930 on 2014/06/04 by person3

    description xyz


Change 708906 on 2014/06/04 by person4

    description of change}

foreach {chg desc} [lrange [textutil::split::splitx $s {(Change \d+)}] 1 end] {lappend changes "$chg$desc"}

set i 0
foreach chg $changes {puts "[incr i]> $chg"}

1> Change 709131 on 2014/06/05 by person1

    - some description


2> Change 709081 on 2014/06/05 by person2

    more description

3> Change 708930 on 2014/06/04 by person3

    description xyz



4> Change 708906 on 2014/06/04 by person4

    description of change

Answer 3

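One way to solve this is to process the data line by line and build up "records". When you encounter the start of a record, do something with the previous record, then reset (that is, start building a new record). Here is some suggested code: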

set data {Change 709131 on 2014/06/05 by person1

    - some description

Change 708906 on 2014/06/04 by person4

    description of change
}

proc do_something {record} {
    # Process a record, in this case, just print it out with separators
    if {[llength $record] == 0} { return }

    puts "----------------"
    foreach line $record {
        puts $line
    }
}

set record [list]
foreach line [split $data \n] {
    if {[regexp {^Change \d+} $line]} {
        # Encounter the start of a record, process the previous record
        # and start a new record
        do_something $record
        set record [list]
    }
    lappend record "$line"
}

# Process the last record
if {[llength $record] != 0} { do_something $record }
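If you would rather collect the records than print them as you go, the same loop can be wrapped in a proc that returns a list. This is only a sketch: split_records and inRecord are names I made up, and it assumes that any text before the first "Change <number>" line should be discarded.

```tcl
# Hypothetical helper: group lines into records, one per "Change <number>" header.
proc split_records {data} {
    set records [list]
    set current [list]
    set inRecord 0
    foreach line [split $data \n] {
        if {[regexp {^Change \d+} $line]} {
            # A new header: store the record built so far, then start over.
            if {$inRecord} {
                lappend records [string trimright [join $current \n]]
            }
            set current [list]
            set inRecord 1
        }
        # Lines before the first header are ignored (an assumption of this sketch).
        if {$inRecord} {
            lappend current $line
        }
    }
    # Store the last record, which has no following header to trigger it.
    if {$inRecord} {
        lappend records [string trimright [join $current \n]]
    }
    return $records
}

set data "Change 709131 on 2014/06/05 by person1\n\n    - some description\n\nChange 708906 on 2014/06/04 by person4\n\n    description of change\n"
set records [split_records $data]
puts "found [llength $records] records"
```

Each element keeps its header line and description, with trailing whitespace trimmed, so the caller can process the list in whatever way is needed.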

Answer 4

This is a tricky regular expression, but it works with your sample data:

regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change))} $sampleData

Looking at the pieces of the RE itself:

(?w)             # "Weird" mode; ^ and $ are line anchored but . matches newlines
^Change          # "Change" at the start of a line...
.*?              # and as few extra characters as possible, until...
(?:              #   (start non-capturing group)
  \Z             # ... the end of the whole string...
|                # or...
  \n             # ... newline, followed by...
  (?=Change)     # ... "Change" (as zero-width lookahead)
)                #   (end non-capturing group)

With your sample data:

% regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change))} $sampleData
{Change 709131 on 2014/06/05 by person1

    - some description

} {Change 709081 on 2014/06/05 by person2

    more description

} {Change 708930 on 2014/06/04 by person3

    description xyz


} {Change 708906 on 2014/06/04 by person4

    description of change}

Looks OK to me. This assumes that nobody puts "Change" right at the start of a line inside a description.
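If that assumption worries you, one possible tweak is to make the lookahead require something that looks like a full header. The sketch below is untested against real data; the sample string and the names loose and strict are my own, and it assumes every header has the form "Change <digits> on ...".

```tcl
# Assumed sample in which one description line starts with "Change".
set sampleData "Change 1 on 2014/06/05 by a\n\n    first\nChange of plans, see above\n\nChange 2 on 2014/06/04 by b\n\n    second"

# Original pattern: splits before every line that starts with "Change".
set loose [regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change))} $sampleData]

# Variant: only split before a line that looks like a full record header.
set strict [regexp -all -inline {(?w)^Change.*?(?:\Z|\n(?=Change \d+ on ))} $sampleData]

puts "loose: [llength $loose] pieces, strict: [llength $strict] pieces"
```

I only tightened the lookahead and left the start of the RE alone on purpose. As far as I understand Tcl's matching rules, the first quantifier in the RE decides whether the whole match prefers the shortest or the longest string, so putting a greedy \d+ ahead of .*? would likely make the match run to the end of the data.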

Problem matching a regular expression in Tcl
