Question

I have the following HTML:

<h1>Text Text</h1>      <h2>Text Text</h2>

I am still trying to get a handle on regular expressions, and I am trying to write one that removes the whitespace between the tags.

I would like the end result to be:

<h1>Text Text</h1><h2>Text Text</h2>

Any help is greatly appreciated!

Update

I want to remove all spaces, tabs, and newlines between the tags. If I have:

<div>    <h1>Text Text</h1>      <h2>Text Text</h2>     </div>

I would like it to end up as:

<div><h1>Text Text</h1><h2>Text Text</h2></div>

Answer 1

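If it is only this specific case, here is a regular expression that finds the whitespace between the closing h1 and the opening h2: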

Regex regexForBreaks = new Regex(@"h1>[\s]*<h2", RegexOptions.Compiled);
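
The pattern above only finds the gap; to remove it, the match has to be replaced with the same text minus the whitespace. A minimal sketch, assuming a hypothetical helper class `HeadingGap` (the class and method names are not from the question):

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingGap
{
    // Same pattern as above: "h1>", any amount of whitespace, then "<h2".
    private static readonly Regex RegexForBreaks =
        new Regex(@"h1>[\s]*<h2", RegexOptions.Compiled);

    // Replaces each match with the two tag fragments and no whitespace between them.
    public static string Close(string html)
    {
        return RegexForBreaks.Replace(html, "h1><h2");
    }

    public static void Main()
    {
        Console.WriteLine(Close("<h1>Text Text</h1>      <h2>Text Text</h2>"));
        // prints: <h1>Text Text</h1><h2>Text Text</h2>
    }
}
```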

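However, if this is a more general case, I think regular expressions are the wrong approach. For example, tags can be nested inside other tags, and in that case your question needs more detail before it can be answered properly. As Jamie Zawinski put it: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."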

Answer 2

An alternative to using a regular expression or string replacement is the Html Agility Pack.
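
As an illustration of that route, here is a rough sketch that parses the markup and drops every text node consisting only of whitespace. It assumes the `HtmlAgilityPack` NuGet package is referenced; the class and method names (`HapWhitespace`, `RemoveBlankTextNodes`) are made up for this example.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class HapWhitespace
{
    public static string RemoveBlankTextNodes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect whitespace-only text nodes first (ToList), so that removing
        // them does not modify the tree while it is being enumerated.
        var blanks = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Text
                        && string.IsNullOrWhiteSpace(n.InnerText))
            .ToList();

        foreach (var node in blanks)
        {
            node.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(RemoveBlankTextNodes(
            "<div>    <h1>Text Text</h1>      <h2>Text Text</h2>     </div>"));
    }
}
```

Because this works on parsed nodes rather than raw text, nesting is handled by the parser, and text such as "Text Text" inside a tag is left alone.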

Here is a rough guess at a regular expression:

/// <summary>
///  Regular expression built for C# on: Tue, Sep 1, 2009, 03:56:27 PM
///  Using Expresso Version: 3.0.2766, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  <h1>
///      <h1>
///  [1]: A numbered capture group. [.+?]
///      Any character, one or more repetitions, as few as possible
///  </h1>
///      </h1>
///  Match expression but don't capture it. [\s*]
///      Whitespace, any number of repetitions
///  <h2>
///      <h2>
///  [2]: A numbered capture group. [.+?]
///      Any character, one or more repetitions, as few as possible
///  </h2>
///      </h2>
///  
///
/// </summary>
public static Regex regex = new Regex(
      // Lazy (.+?) so that each capture stops at the first closing tag;
      // a greedy (.+) with Singleline would swallow several heading pairs at once.
      "<h1>(.+?)</h1>(?:\\s*)<h2>(.+?)</h2>",
    RegexOptions.Singleline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );


// This is the replacement string
public static string regexReplace = 
      "<h1>$1</h1><h2>$2</h2>";
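
To show how the pattern and the replacement string fit together, here is a small self-contained sketch. It uses its own copy of the pattern with lazy quantifiers (`.+?`), so each capture stops at the first closing tag; the class name `HeadingPair` and the method `Join` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeadingPair
{
    // An <h1> element, optional whitespace, then an <h2> element.
    private static readonly Regex Pattern = new Regex(
        @"<h1>(.+?)</h1>(?:\s*)<h2>(.+?)</h2>",
        RegexOptions.Singleline | RegexOptions.CultureInvariant);

    // $1 and $2 put the captured heading text back; the whitespace is dropped.
    private const string Replacement = "<h1>$1</h1><h2>$2</h2>";

    public static string Join(string html)
    {
        return Pattern.Replace(html, Replacement);
    }

    public static void Main()
    {
        Console.WriteLine(Join("<h1>Text Text</h1>      <h2>Text Text</h2>"));
        // prints: <h1>Text Text</h1><h2>Text Text</h2>
    }
}
```

Note that this only closes the gap between an h1 and the h2 that follows it; the whitespace after `<div>` and before `</div>` in the updated question is left untouched.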

Answer 3

How about: Regex.Replace(str, @">\s+<", "><")
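
For reference, here is that one-liner wrapped in a small runnable sketch and applied to the `<div>` example from the update. The class name `TagWhitespace` and the method `Strip` are placeholders chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagWhitespace
{
    // Replaces ">" + one or more whitespace characters + "<" with "><".
    // \s covers spaces, tabs, and newlines.
    public static string Strip(string html)
    {
        return Regex.Replace(html, @">\s+<", "><");
    }

    public static void Main()
    {
        Console.WriteLine(Strip(
            "<div>    <h1>Text Text</h1>      <h2>Text Text</h2>     </div>"));
        // prints: <div><h1>Text Text</h1><h2>Text Text</h2></div>
    }
}
```

Whitespace inside the text, such as the space in "Text Text", is kept because it is not directly between a `>` and a `<`. One caveat: the pattern does not know about elements where whitespace matters, so markup inside `<pre>` or whitespace separating inline elements would be collapsed as well.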

How to remove characters between HTML tags
