Question

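I'm stuck on my project and can't get past this problem, so I'm hoping someone can help me solve it:

I have a string that contains some token text, and I want to pull the tokens out manually and put them into an array list of strings. The final result would be two array lists, one with the normal text and one with the token text. Below is a sample string containing some tokens surrounded by the open tag "[[" and the close tag "]]".

The first step, where the wort is prepared by mixing the starch source with hot water, is known as [[Textarea]]. Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around [[CheckBox]], during which the starches are converted to sugars, and then the sweet wort is drained off the grains. The grains are now washed in a process known as [[Radio]]. This washing allows the brewer to gather [[DropDownList]] the fermentable liquid from the grains as possible.

After manipulating the string, I should get two array lists: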

Result:

Normal Text ArrayList { "The first step, where the wort is prepared by mixing the starch source with hot water, is known as ", ". Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around ", ", during which the starches are converted to sugars, and then the sweet wort is drained off the grains. The grains are now washed in a process known as ", ". This washing allows the brewer to gather ", " the fermentable liquid from the grains as possible." }

Token Text ArrayList { "[[Textarea]]", "[[CheckBox]]", "[[Radio]]", "[[DropDownList]]" }

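That is two array lists: the normal-text array list has 5 elements, which are the pieces of text before or after the tokens, and the token-text array list has 4 elements, which are the tokens found in the string.

This could be done with some cutting and substring technique, but that is too hard for long text, it is error-prone, and sometimes it doesn't give me what I want. If you can help with this problem, please post in C#, because that is what I'm using for this task.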

Answer 1

This seems to do the job (though note that, at present, my tokens array contains the plain tokens rather than having them wrapped in [[ and ]]):

var inp = @"The first step, where the wort is prepared by mixing the starch source with hot water, is known as [[Textarea]]. Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around [[CheckBox]], during which the starches are converted to sugars, and then the sweet wort is drained off the grains. The grains are now washed in a process known as [[Radio]]. This washing allows the brewer to gather [[DropDownList]] the fermentable liquid from the grains as possible.";

// Requires "using System;" and "using System.Linq;" at the top of the file.
var step1 = inp.Split(new string[] { "[[" }, StringSplitOptions.None);
// step1 now contains one string that is destined for normal, followed by n strings that need to be split further.
var step2 = step1.Skip(1).Select(a => a.Split(new string[] { "]]" }, StringSplitOptions.None)).ToArray();
// step2 now contains pairs of strings: the first of each pair is a token, the second is normal text.

var normal = step1.Take(1).Concat(step2.Select(a => a[1])).ToArray();
var tokens = step2.Select(a => a[0]).ToArray();

This also assumes that there are no unbalanced [[ and ]] sequences in the input.
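If you would rather not rely on that assumption, and you want the tokens to keep their [[ and ]] wrappers as in the question, one option is to scan the string with IndexOf instead of splitting it. The following is only a sketch: TokenSplitter and its Split method are hypothetical names invented for this example, and it assumes C# 7 or later for the tuple return type. It throws when a [[ has no matching ]]; a stray ]] with no opener is simply left in the normal text, and a nested [[ inside a token is not treated specially.

```csharp
using System;
using System.Collections.Generic;

public static class TokenSplitter
{
    // Hypothetical helper: returns the normal text pieces and the tokens
    // (delimiters included), in the order they appear in the input.
    public static (List<string> Normal, List<string> Tokens) Split(string input)
    {
        const string Open = "[[";
        const string Close = "]]";
        var normal = new List<string>();
        var tokens = new List<string>();
        int pos = 0;

        while (true)
        {
            int start = input.IndexOf(Open, pos, StringComparison.Ordinal);
            if (start < 0)
                break; // no more tokens

            int end = input.IndexOf(Close, start + Open.Length, StringComparison.Ordinal);
            if (end < 0)
                throw new FormatException($"Unclosed \"{Open}\" at index {start}.");

            // Text between the previous token (or the start) and this token.
            normal.Add(input.Substring(pos, start - pos));
            // The token itself, still wrapped in [[ and ]].
            tokens.Add(input.Substring(start, end + Close.Length - start));
            pos = end + Close.Length;
        }

        // Whatever follows the last token (possibly an empty string).
        normal.Add(input.Substring(pos));
        return (normal, tokens);
    }
}
```

Called on the sample string from the question, this should give 5 normal pieces and 4 wrapped tokens, which matches the shape of the result the question asks for.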

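The observation that went into this solution: if you first split the string around every [[ pair in the original text, the first output string is already a finished piece of normal text. Furthermore, every string after the first consists of a token, the ]] pair, and a piece of normal text. For example, the second result in step1 is: "Textarea]]. Hot water is mixed with crushed malt or malts in a mash tun. The mashing process takes around "

So if you then split each of those other results around the ]] pair, the first result is a token and the second is a normal string.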
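To make the intermediate values concrete, here is a small self-contained sketch of the same two-step split run on a shortened input. The shortened string and the SplitDemo and SplitPairs names are assumptions made for this illustration, not part of the code above.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: returns one { token, normal } pair per "[[" found.
    public static string[][] SplitPairs(string inp)
    {
        // Step 1: split on the opening delimiter.
        var step1 = inp.Split(new[] { "[[" }, StringSplitOptions.None);

        // Step 2: split every piece after the first on the closing delimiter.
        return step1.Skip(1)
                    .Select(s => s.Split(new[] { "]]" }, StringSplitOptions.None))
                    .ToArray();
    }

    public static void Main()
    {
        // Shortened sample input (an assumption, for brevity).
        var inp = "Known as [[Textarea]]. Takes around [[CheckBox]], then done.";

        // After step 1 the pieces are:
        //   "Known as "
        //   "Textarea]]. Takes around "
        //   "CheckBox]], then done."
        foreach (var pair in SplitPairs(inp))
            Console.WriteLine($"token = \"{pair[0]}\", normal = \"{pair[1]}\"");
    }
}
```

Each pair holds the token at index 0 and the normal text that follows it at index 1, which is what the normal and tokens arrays in the answer are built from.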
