Say I have the following blocks of text:
ONE asd blah| 1| 123| 222| -0.03| -62333| -2253| -121.26| -1120.12| XCT
TWO Three
Nine Twelve
Twenty
DDD
ONE ads blah| 42| 555| 5423| -345| -5422| -399815| -345| -345| XCT
TWO Three
Six Seven
Twenty
DDD
Now I want to find the block of text that contains all of the following:
ONE, TWO, Three, Nine, Twelve, Twenty
This should match the first block but not the second.
Then, similarly:
ONE, TWO, Three, Six, Seven, Twenty
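should match the second block but not the first.
How can I do this?
I tried using the following to capture all text from ONE up to, but not including, the next ONE: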
ONE((.|\n)*)(?=^ONE)
as a start, but even that doesn't work!
Answer 0 (score: 1)
Since you say the terms must occur in order, this is simple:
ONE(?:(?!ONE).)*?TWO(?:(?!ONE).)*?Three(?:(?!ONE).)*?Nine(?:(?!ONE).)*?Twelve(?:(?!ONE).)*?Twenty(?:(?!ONE).)*
This matches the first block but not the second. Test it live on regex101.com.
Explanation
(?:(?!ONE).)*?
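matches any number of characters, as long as none of them is the start of the string ONE. This ensures that the match does not cross over into a different block.
Make sure to compile the regex with RegexOptions.Singleline so that the dot matches newlines.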
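For what it's worth, here is a minimal sketch of how that pattern could be generated and run from C#. The names OrderedBlockMatcher, Build and FindBlock are made up for illustration, and the sketch assumes that the first term (ONE here) is also the marker that starts every block.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class OrderedBlockMatcher
{
    // Builds ONE(?:(?!ONE).)*?TWO(?:(?!ONE).)*?...Twenty(?:(?!ONE).)*
    // Assumption: terms[0] is also the marker that starts a block.
    public static Regex Build(IReadOnlyList<string> terms)
    {
        string start = Regex.Escape(terms[0]);
        // Any run of characters that never begins a new block marker.
        string gap = "(?:(?!" + start + ").)*";
        // Lazy gaps between the terms, one greedy gap after the last term.
        string pattern = string.Join(gap + "?", terms.Select(t => Regex.Escape(t))) + gap;
        // Singleline lets the dot match newlines.
        return new Regex(pattern, RegexOptions.Singleline);
    }

    // Returns the first block containing the terms in order, or null if none does.
    public static string FindBlock(string text, params string[] terms)
    {
        Match m = Build(terms).Match(text);
        return m.Success ? m.Value : null;
    }
}
```

Calling OrderedBlockMatcher.FindBlock(text, "ONE", "TWO", "Three", "Six", "Seven", "Twenty") on the sample input should return the second block, because the tempered dot cannot step past the second ONE when starting from the first one.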
Answer 1 (score: 0)
(?=.*?\bONE\b)(?=.*?\bTWO\b)(?=.*?\bThree\b)(?=.*?\bNine\b)(?=.*?\bTwelve\b)(?=.*?\bTwenty\b).*\n\n
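matches your first block. Apply it in single-line mode (the s modifier in most regex literal notations, or a flag when constructing a regex object).
This is a list of conditions (in any order) that must all be met before the final .*\n\n matches the block. Each condition is a positive lookahead that searches for a single word.
See: https://regex101.com/r/sC4tR1/1
This isn't "perfect", because there is no block boundary detection. If the block boundaries are regular in the string, the expression can be expanded to incorporate them.
Another strategy is to pre-split the string into individual blocks and then run the expression on those blocks instead of on the whole string.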
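The pre-splitting strategy could look roughly like the sketch below. It assumes every block begins on a line that starts with ONE, which is how the sample data looks; BlockSplitter and its methods are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class BlockSplitter
{
    // Splits the text in front of every line that starts with the word ONE.
    // Assumption: that line is the block boundary.
    public static List<string> Split(string text)
    {
        return Regex.Split(text, @"(?m)^(?=ONE\b)")
                    .Where(b => b.Trim().Length > 0) // drop the empty leading piece
                    .ToList();
    }

    // One positive lookahead per word, so the words may appear in any order.
    public static Regex BuildAllWords(IEnumerable<string> words)
    {
        string lookaheads = string.Concat(
            words.Select(w => @"(?=.*?\b" + Regex.Escape(w) + @"\b)"));
        return new Regex(@"\A" + lookaheads, RegexOptions.Singleline);
    }

    // Returns every block that contains all of the given words.
    public static List<string> FindBlocks(string text, params string[] words)
    {
        Regex all = BuildAllWords(words);
        return Split(text).Where(b => all.IsMatch(b)).ToList();
    }
}
```

Because each block is tested on its own, the lookaheads can no longer pick up a word from a neighbouring block, which was the weakness of running the expression over the whole string.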
Answer 2 (score: 0)
I have been parsing text like this for 40 years. Usually you can't use regex for it. Try the following code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication3
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.txt";
        static void Main(string[] args)
        {
            // Dispose the reader once the file has been read.
            using (StreamReader reader = new StreamReader(FILENAME))
            {
                string inputLine;
                Block block = null;
                while ((inputLine = reader.ReadLine()) != null)
                {
                    inputLine = inputLine.Trim();
                    if (inputLine.Length > 0)
                    {
                        // A line starting with "ONE" opens a new block.
                        // Also open one if data precedes the first "ONE",
                        // so that block is never null below.
                        if (inputLine.StartsWith("ONE") || block == null)
                        {
                            block = new Block();
                            Block.blocks.Add(block);
                        }
                        block.lines.Add(inputLine);
                    }
                }
            }
        }
    }
    public class Block
    {
        // All blocks read so far.
        public static List<Block> blocks = new List<Block>();
        // The non-empty, trimmed lines of this block.
        public List<string> lines { get; set; }
        public Block()
        {
            lines = new List<string>();
        }
    }
}
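Once the lines are grouped into blocks, picking the block that has all the required words takes only a few more lines. The sketch below is one possible way to do it, not part of the code above: BlockFilter, Group and ContainsAll are invented names, and it assumes words are separated by spaces or the | character and must match as whole tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class BlockFilter
{
    // Groups trimmed, non-empty lines into blocks; a line starting with
    // "ONE" opens a new block (same rule as the reader loop above).
    public static List<List<string>> Group(IEnumerable<string> lines)
    {
        var blocks = new List<List<string>>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;
            if (line.StartsWith("ONE", StringComparison.Ordinal) || blocks.Count == 0)
                blocks.Add(new List<string>());
            blocks[blocks.Count - 1].Add(line);
        }
        return blocks;
    }

    // True if every required word occurs as a whole token in the block.
    // Assumption: tokens are separated by spaces or '|'.
    public static bool ContainsAll(List<string> block, IEnumerable<string> required)
    {
        var tokens = new HashSet<string>(
            block.SelectMany(l => l.Split(new[] { ' ', '|' },
                                          StringSplitOptions.RemoveEmptyEntries)));
        return required.All(w => tokens.Contains(w));
    }
}
```

With the sample data, BlockFilter.Group(lines).Where(b => BlockFilter.ContainsAll(b, words)) would keep only the first block for the Nine/Twelve word list and only the second block for the Six/Seven list.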
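Answer 3 (score: 0)
Are you trying to extract specific words from a particular text structure (regex test/match), or are you trying to see whether a given text contains specific words (since you seem to know which words to look for)?
If it's the latter, how about Aho-Corasick?
I have used this in the past. It is a very, very fast algorithm for searching a text for a specific set of strings.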
// Copyright (c) 2013 Pēteris Ņikiforovs
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
using System.Collections;
using System.Collections.Generic;

/// <summary>
/// Trie that will find and return strings found in a text.
/// </summary>
public class Trie : Trie<string>
{
    public Trie() { }

    public Trie(IEnumerable<string> source)
    {
        Add(source);
        Build();
    }

    /// <summary>
    /// Adds a string.
    /// </summary>
    /// <param name="s">The string to add.</param>
    public void Add(string s)
    {
        Add(s, s);
    }

    /// <summary>
    /// Adds multiple strings.
    /// </summary>
    /// <param name="strings">The strings to add.</param>
    public void Add(IEnumerable<string> strings)
    {
        foreach (string s in strings)
        {
            Add(s);
        }
    }
}

/// <summary>
/// Trie that will find strings in a text and return values of type <typeparamref name="TValue"/>
/// for each string found.
/// </summary>
/// <typeparam name="TValue">Value type.</typeparam>
public class Trie<TValue> : Trie<char, TValue>
{
}

/// <summary>
/// Trie that will find strings or phrases and return values of type <typeparamref name="TValue"/>
/// for each string or phrase found.
/// </summary>
/// <remarks>
/// <typeparamref name="T"/> will typically be a char for finding strings
/// or a string for finding phrases or whole words.
/// </remarks>
/// <typeparam name="T">The type of a letter in a word.</typeparam>
/// <typeparam name="TValue">The type of the value that will be returned when the word is found.</typeparam>
public class Trie<T, TValue>
{
    /// <summary>
    /// Root of the trie. It has no value and no parent.
    /// </summary>
    private readonly Node<T, TValue> root = new Node<T, TValue>();

    /// <summary>
    /// Adds a word to the tree.
    /// </summary>
    /// <remarks>
    /// A word consists of letters. A node is built for each letter.
    /// If the letter type is char, then the word will be a string, since it consists of letters.
    /// But a letter could also be a string which means that a node will be added
    /// for each word and so the word is actually a phrase.
    /// </remarks>
    /// <param name="word">The word that will be searched.</param>
    /// <param name="value">The value that will be returned when the word is found.</param>
    public void Add(IEnumerable<T> word, TValue value)
    {
        // start at the root
        var node = root;

        // build a branch for the word, one letter at a time
        // if a letter node doesn't exist, add it
        foreach (T c in word)
        {
            var child = node[c];
            if (child == null)
                child = node[c] = new Node<T, TValue>(c, node);
            node = child;
        }

        // mark the end of the branch
        // by adding a value that will be returned when this word is found in a text
        node.Values.Add(value);
    }

    /// <summary>
    /// Constructs fail or fall links.
    /// </summary>
    public void Build()
    {
        // construction is done using breadth-first-search
        var queue = new Queue<Node<T, TValue>>();
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            var node = queue.Dequeue();

            // visit children
            foreach (var child in node)
                queue.Enqueue(child);

            // fail link of root is root
            if (node == root)
            {
                root.Fail = root;
                continue;
            }

            var fail = node.Parent.Fail;
            while (fail[node.Word] == null && fail != root)
                fail = fail.Fail;

            node.Fail = fail[node.Word] ?? root;
            if (node.Fail == node)
                node.Fail = root;
        }
    }

    /// <summary>
    /// Finds all added words in a text.
    /// </summary>
    /// <param name="text">The text to search in.</param>
    /// <returns>The values that were added for the found words.</returns>
    public IEnumerable<TValue> Find(IEnumerable<T> text)
    {
        var node = root;

        foreach (T c in text)
        {
            while (node[c] == null && node != root)
                node = node.Fail;

            node = node[c] ?? root;

            for (var t = node; t != root; t = t.Fail)
            {
                foreach (TValue value in t.Values)
                    yield return value;
            }
        }
    }

    /// <summary>
    /// Node in a trie.
    /// </summary>
    /// <typeparam name="TNode">The same as the parent type.</typeparam>
    /// <typeparam name="TNodeValue">The same as the parent value type.</typeparam>
    private class Node<TNode, TNodeValue> : IEnumerable<Node<TNode, TNodeValue>>
    {
        private readonly TNode word;
        private readonly Node<TNode, TNodeValue> parent;
        private readonly Dictionary<TNode, Node<TNode, TNodeValue>> children = new Dictionary<TNode, Node<TNode, TNodeValue>>();
        private readonly List<TNodeValue> values = new List<TNodeValue>();

        /// <summary>
        /// Constructor for the root node.
        /// </summary>
        public Node()
        {
        }

        /// <summary>
        /// Constructor for a node with a word
        /// </summary>
        /// <param name="word"></param>
        /// <param name="parent"></param>
        public Node(TNode word, Node<TNode, TNodeValue> parent)
        {
            this.word = word;
            this.parent = parent;
        }

        /// <summary>
        /// Word (or letter) for this node.
        /// </summary>
        public TNode Word
        {
            get { return word; }
        }

        /// <summary>
        /// Parent node.
        /// </summary>
        public Node<TNode, TNodeValue> Parent
        {
            get { return parent; }
        }

        /// <summary>
        /// Fail or fall node.
        /// </summary>
        public Node<TNode, TNodeValue> Fail
        {
            get;
            set;
        }

        /// <summary>
        /// Children for this node.
        /// </summary>
        /// <param name="c">Child word.</param>
        /// <returns>Child node.</returns>
        public Node<TNode, TNodeValue> this[TNode c]
        {
            get { return children.ContainsKey(c) ? children[c] : null; }
            set { children[c] = value; }
        }

        /// <summary>
        /// Values for words that end at this node.
        /// </summary>
        public List<TNodeValue> Values
        {
            get { return values; }
        }

        /// <inherit/>
        public IEnumerator<Node<TNode, TNodeValue>> GetEnumerator()
        {
            return children.Values.GetEnumerator();
        }

        /// <inherit/>
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        /// <inherit/>
        public override string ToString()
        {
            return Word.ToString();
        }
    }
}