Question

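I have a large dataset that I need to clean up with regular expressions in the Sublime Text editor.

I'm trying to delete anything after the colon (:) that is fewer than 5 characters long, including spaces. I'm also trying to delete anything that is longer than 20 characters.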

Example:

jshfdgl:JSS
oiadfgopiafdg:
ofdijgdf:2)
ogijdfogis:_ge
iognhif:gojdf sdofig peoji-009
ogijdfs:_ge 2

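All of these should be matched by the regex...

In other words, I'm trying to match on the characters after the colon: values shorter than 5 or longer than 20 characters.

I've tried lots of things, but the spaces never seem to be handled...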

Answer 1

Try this regex:

(?<=:)(?:.{0,5}|.{20,})$

Replace the matches with an empty string.

Explanation:

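(?<=:) - positive lookbehind: matches the position immediately after a :
(?:.{0,5}|.{20,}) - non-capturing group that matches either:
- .{0,5} - 0 to 5 occurrences of any character except a newline
- | - or
- .{20,} - 20 or more occurrences of any character except a newline
$ - asserts the end of the line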
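To check the pattern outside Sublime, here is a minimal sketch using Python's re module. Python is only an illustration and is not part of the original answer; the sketch assumes the data uses an ASCII colon and passes re.MULTILINE so that $ matches at the end of every line, as it does in Sublime's search. The last sample line, keepme:abcdefgh, is a hypothetical extra line that should survive. Note that both bounds are inclusive: a value of exactly 5 characters is removed, and so is one of exactly 20.

```python
import re

# Same pattern as above: right after a colon, match either 0-5 characters
# or 20+ characters, provided the match runs to the end of the line.
PATTERN = re.compile(r"(?<=:)(?:.{0,5}|.{20,})$", re.MULTILINE)


def clean(text: str) -> str:
    """Replace every match with an empty string, line by line."""
    return PATTERN.sub("", text)


sample = """jshfdgl:JSS
oiadfgopiafdg:
ofdijgdf:2)
ogijdfogis:_ge
iognhif:gojdf sdofig peoji-009
ogijdfs:_ge 2
keepme:abcdefgh"""

# Every value except the hypothetical 8-character "abcdefgh" is blanked.
print(clean(sample))
```

The space in "_ge 2" counts as a character like any other, so that value is 5 characters long and is removed.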

Answer 2

Following the advice of @Andy G (which I agree with), I prepared a solution that, instead of a regex, uses the following Perl one-liner (run from the command prompt):

perl -lan -F: -e "$len = length($F[1]); printf(qq(%s:%s\n), $F[0], ($len > 5 && $len <= 20)?$F[1]:'')" inp.txt >out.txt

Explanation:

-lan - Perl options: -l chomps the input line terminator, -a enables auto-split mode, -n runs the script in a loop over the input lines.
-F: - another Perl option, which sets the auto-split separator to ":". Each input line is split on ":" and the pieces are stored in the predefined array @F.
-e "..." - The program (one-liner script) to execute.
inp.txt - Input file name.
>out.txt - Output redirection.
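As a rough mental model (a sketch, not how perl is actually implemented), the -l, -a, -n and -F: switches together behave like the Python generator below; autosplit_lines is a hypothetical name. One detail it reproduces: like Perl's split, it discards trailing empty fields, so a line such as oiadfgopiafdg: yields only one field.

```python
import io
from typing import Iterator, List, TextIO


def autosplit_lines(stream: TextIO, sep: str = ":") -> Iterator[List[str]]:
    """Approximate what `perl -lan -F:` does before running the -e script."""
    for line in stream:                    # -n: loop over every input line
        line = line.rstrip("\n")           # -l: chomp the line terminator
        fields = line.split(sep)           # -a with -F: split the line into @F
        while fields and fields[-1] == "":
            fields.pop()                   # Perl's split drops trailing empty fields
        yield fields


sample = io.StringIO("jshfdgl:JSS\noiadfgopiafdg:\n")
for fields in autosplit_lines(sample):
    print(fields)  # ['jshfdgl', 'JSS'] then ['oiadfgopiafdg']
```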

Now for the script itself:

$len = length($F[1]); - saves the length of the second input segment (the part after ":").
printf( ... ) - prints the output line using a format string; the arguments are described below.
qq(%s:%s\n) - the format string. The qq operator works like double quotes (so \n is interpreted as a newline) without needing literal double quotes, which would clash with the double quotes surrounding the whole script on the command line.
$F[0] - the first string to print: the first input segment (before ":").
($len > 5 && $len <= 20)?$F[1]:'' - the second string to print. This ternary operator decides what to print: if the saved length is within the allowed limits (more than 5 and at most 20 characters), it prints the second input segment (after ":"); otherwise it prints an empty string.

Because of the -n option, this program runs once for each input line.

Of course, Perl must be installed on your computer.

For further explanation, read about Perl one-liners and about Perl itself.
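The per-line logic of the script can be sketched in Python as follows; format_line is a hypothetical name and the input is assumed to be already split on ":". It mirrors the one-liner's bounds, which differ slightly from the regex in Answer 1: a value is kept only when its length is greater than 5 and at most 20, so exactly 5 characters is blanked while exactly 20 is kept. Also as in the one-liner, only the text between the first and second colon is considered, and anything after a second colon is dropped from the output.

```python
from typing import List


def format_line(fields: List[str]) -> str:
    """Mirror of the one-liner's body for one already-split line."""
    first = fields[0] if fields else ""             # $F[0]
    second = fields[1] if len(fields) > 1 else ""   # undefined $F[1] prints as empty
    length = len(second)                            # $len = length($F[1])
    # ($len > 5 && $len <= 20) ? $F[1] : ''
    kept = second if 5 < length <= 20 else ""
    return f"{first}:{kept}"                        # printf(qq(%s:%s\n), ...)


# "_ge 2" is exactly 5 characters, so the value is blanked.
print(format_line(["ogijdfs", "_ge 2"]))  # ogijdfs:
```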

Regex to find strings after ":" that are longer than 20 or shorter than 5 characters