Question

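I have a PDF file that I converted to .txt with an online tool. Now I want to parse the data in it and split it with a regular expression. I'm almost done but stuck on one point.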

Here is a sample of the data:

00 41 53 Bid Form – Design/Build (Single-Prime Contract)

27 05 13.23 T1 Services

I want to split it into two items: 00 41 53 Bid Form – Design/Build (Single-Prime Contract) and 27 05 13.23 T1 Services

The regular expression I'm using is [0-9](\d|\ |\.)*(\D)*

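Each item starts with a number that can contain spaces and/or dots, followed by text that can contain letters, dots, commas, parentheses, hyphens and digits.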

If the text part contains a digit, as "T1 Services" above does, I can't match the whole item.
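The pattern stops early because (\D)* only consumes non-digit characters, so the match ends just before the "1" in "T1". A minimal sketch, assuming every item begins with a section number made of three two-digit groups separated by spaces, optionally followed by a dot and more digits (as in both samples), is to match the title lazily up to the next such number or the end of the input instead of excluding digits. The pattern and the names SectionParser and ExtractEntries are illustrative, not taken from the original posts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SectionParser
{
    // Assumed item shape: "NN NN NN" or "NN NN NN.NN", a space, then a title.
    // The title is matched lazily and stops right before the next section number
    // (or the end of the input), so digits inside the title such as "T1" are allowed.
    // Singleline lets '.' cross line breaks, so this also works if everything is on one line.
    private static readonly Regex EntryPattern = new Regex(
        @"\d{2} \d{2} \d{2}(?:\.\d+)? .+?(?=\s+\d{2} \d{2} \d{2}(?:\.\d+)? |\s*$)",
        RegexOptions.Singleline);

    public static List<string> ExtractEntries(string text)
    {
        var entries = new List<string>();
        foreach (Match m in EntryPattern.Matches(text))
        {
            entries.Add(m.Value.Trim());
        }
        return entries;
    }

    public static void Main()
    {
        string input = "00 41 53 Bid Form – Design/Build (Single-Prime Contract)\n\n27 05 13.23 T1 Services";
        foreach (string entry in ExtractEntries(input))
        {
            Console.WriteLine(entry);
        }
    }
}
```

The trade-off is that a title which itself contains something shaped like "NN NN NN " would be cut there; if the real data has such titles, the lookahead needs a stricter anchor (for example, requiring a line break before the next number).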

Answer 1

If I understand correctly, you are trying to split on line breaks. Here it is in C#:

using System.Text.RegularExpressions;
string[] Result = Regex.Split(inputText, "[\r\n]+"); // split on any run of CR and/or LF characters
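For reference, here is a self-contained version of this one-liner run on the question's sample. One detail worth knowing: Regex.Split keeps an empty string when the input starts or ends with a line break, so this sketch filters those out. The Where filter and the names LineSplitter and SplitLines are additions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineSplitter
{
    // Split on any run of CR/LF characters, as above, then drop the empty
    // strings Regex.Split produces for leading or trailing line breaks.
    public static string[] SplitLines(string inputText)
    {
        return Regex.Split(inputText, "[\r\n]+")
                    .Where(line => line.Length > 0)
                    .ToArray();
    }

    public static void Main()
    {
        string inputText = "00 41 53 Bid Form – Design/Build (Single-Prime Contract)\r\n\r\n27 05 13.23 T1 Services\r\n";
        foreach (string line in SplitLines(inputText))
        {
            Console.WriteLine(line);
        }
    }
}
```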

Answer 2

You can also do it without a regular expression, like this:

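using System;

string phrase = ".......\n,,,,.ll..\r\n....";
string[] words;

// Split on each CR and LF; RemoveEmptyEntries drops the empty string between "\r" and "\n".
words = phrase.Split(new string[] { "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);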
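Splitting on "\n" and "\r" separately turns every Windows "\r\n" line break into two adjacent separators, so without StringSplitOptions.RemoveEmptyEntries the result would contain an empty string between them. The small comparison below makes that visible; the class and method names are illustrative.

```csharp
using System;

public static class SplitComparison
{
    private static readonly string[] Separators = { "\n", "\r" };

    // Splits on each CR and LF individually, keeping empty entries.
    public static string[] SplitKeepEmpty(string text) =>
        text.Split(Separators, StringSplitOptions.None);

    // Same separators, but empty entries (e.g. between '\r' and '\n') are dropped.
    public static string[] SplitRemoveEmpty(string text) =>
        text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    public static void Main()
    {
        string text = "00 41 53 Bid Form\r\n27 05 13.23 T1 Services";
        Console.WriteLine(SplitKeepEmpty(text).Length);   // 3
        Console.WriteLine(SplitRemoveEmpty(text).Length); // 2
    }
}
```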

If you want to use only a regular expression, use @mhasan's solution.

Regex to split text from a PDF file
