Question

I have a text file containing the following text:

 andal-4.1.0.jar
 besc_2.1.0-beta
 prov-3.0.jar
 add4lib-1.0.jar
 com_lab_2.0.jar
 astrix
 lis-2_0_1.jar

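Is there a way to split the name and the version using a regular expression? I want to use the result to create two columns, "Name" and "Version", in Excel. So I would like the result of the regex to look like this: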

andal          4.1.0.jar
besc           2.1.0-beta
prov           3.0.jar
add4lib        1.0.jar
com_lab        2.0.jar
astrix
lis            2_0_1.jar

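So far I have used ^(?:.*-(?=\d)|\D+) to get the version and -\d.*$ to get the name, each on its own. The problem is that when I do this on a large text file, the results of the two regexes come out in a different order. So is there a way to get the result in the form shown above?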

Answer 1

Ctrl + H
Find what: ^(.+?)[-_](\d.*)$
Replace with: $1\t$2
Check "Wrap around"
Check "Regular expression"
Uncheck ". matches newline"
Replace all

Explanation:

^           # beginning of line
    (.+?)   # group 1, 1 or more any character but newline, not greedy
    [-_]    # a dash or underscore
    (\d.*)  # group 2, a digit then 0 or more any character but newline
$           # end of line

Replacement:

$1          # content of group 1
\t          # a tabulation, you may replace with what you want
$2          # content of group 2

Result for the given example:

 andal  4.1.0.jar
 besc   2.1.0-beta
 prov   3.0.jar
 add4lib    1.0.jar
 com_lab    2.0.jar
 astrix
 lis    2_0_1.jar
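
For anyone who prefers a script to an editor dialog, the same find-and-replace can be sketched in Python. This is only an illustration, not part of the original answer: the names `SAMPLE`, `PATTERN` and `split_with_tab` are made up here, and the sample lines are assumed to have no leading spaces.

```python
import re

# Sample input from the question (leading spaces dropped for this sketch).
SAMPLE = """andal-4.1.0.jar
besc_2.1.0-beta
prov-3.0.jar
add4lib-1.0.jar
com_lab_2.0.jar
astrix
lis-2_0_1.jar"""

# Same pattern as in the answer. re.MULTILINE makes ^ and $ match at every
# line, and "." does not match a newline by default, which corresponds to
# leaving ". matches newline" unchecked.
PATTERN = re.compile(r"^(.+?)[-_](\d.*)$", re.MULTILINE)


def split_with_tab(text):
    """Replace the separator before the version with a tab on each line."""
    # Python writes group references as \1 and \2 instead of $1 and $2.
    return PATTERN.sub(r"\1\t\2", text)


print(split_with_tab(SAMPLE))
```

Lines without a dash or underscore followed by a digit, such as astrix, are left unchanged, so every name stays on its own row next to its version.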

Answer 2

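I'm not quite sure what the large-file problem means, and I believe the two regexes you showed do the opposite of what you said: the first should give you the name and the second should give you the version.

Anyway, here are the assumptions I had to guess at, which I hope make sense for you:

The "name" can be followed by - or _, and then by the version string.
The "version" string is preceded by - or _ and consists of some digits, followed by a dot or underscore, then some digits, then any string.

If these assumptions make sense, you can use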

^(.+?)(?:[-_](\d+[._]\d+.*))?$

as your regex. Group 1 is the name, group 2 is the version.

Demo on regex101: https://regex101.com/r/RnwMaw/3

Explanation of the regex

^                                   start of line
 (.+?)                              "Name" part, using reluctant match of 
                                      at least 1 character
      (?:                   )?   Optional group of "Version String", which
                                      consists of:
         [-_]                       - or _
             (             )         Followed by the "Version" , which is 
              \d+                      at least 1 digit, 
                 [._]                  then 1 dot or underscore, 
                     \d+               then at least 1 digit,
                        .*             then any string
                              $   end of line
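
To get from the two groups to something Excel can open, one possible approach is to run the regex once per line and write both groups to a tab-separated row. The sketch below is an assumption-laden illustration: `split_name_version` and `LINES` are hypothetical names, and printing to standard output stands in for writing a real file.

```python
import csv
import re
import sys

# The pattern from this answer: group 1 is the name, group 2 the version.
NAME_VERSION = re.compile(r"^(.+?)(?:[-_](\d+[._]\d+.*))?$")


def split_name_version(line):
    """Return (name, version); version is "" when the line has none."""
    m = NAME_VERSION.match(line.strip())
    if m is None:  # only an empty line fails to match
        return "", ""
    return m.group(1), m.group(2) or ""


# Sample lines from the question, including the leading space.
LINES = [
    " andal-4.1.0.jar", " besc_2.1.0-beta", " prov-3.0.jar",
    " add4lib-1.0.jar", " com_lab_2.0.jar", " astrix", " lis-2_0_1.jar",
]

# Name and version are produced in one pass per line, so the two columns
# cannot drift out of order the way two separate regex runs can.
writer = csv.writer(sys.stdout, delimiter="\t")
writer.writerow(["Name", "Version"])
for line in LINES:
    writer.writerow(split_name_version(line))
```

Because the pattern requires two numeric components joined by a dot or underscore, a line such as tool-7 would stay unsplit and end up entirely in the name column.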
