Question

I have this code:

    string firstTag = "Forums2008/forumPage.aspx?forumId=";
    string endTag = "</a>";
    index = forums.IndexOf(firstTag, index1);

    if (index == -1)
       continue;

    var secondIndex = forums.IndexOf(endTag, index);

    result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));

The string I want to extract from looks like this, for example:

<a href="/Forums2008/forumPage.aspx?forumId=317" title="הנקה">הנקה</a>

What I want to get is the word after the title: הנקה. The second problem is that when I extract it, the Hebrew shows up as characters like this: ��

Answer 1

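An effective way to do this is to use regular expressions instead of trying to find the start position and taking a substring. Try this code and you will see that it extracts the title of the anchor tag: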

    var input = "<a href=\"/Forums2008/forumPage.aspx?forumId=317\" title=\"הנקה\">הנקה</a>";

    var expression = new System.Text.RegularExpressions.Regex(@"title=\""([^\""]+)\""");

    var match = expression.Match(input);

    if (match.Success) {
        Console.WriteLine(match.Groups[1]);
    }
    else {
        Console.WriteLine("not found");
    }

For the curious, here is a version in JavaScript:

    var input = '<a href="/Forums2008/forumPage.aspx?forumId=317" title="הנקה">הנקה</a>';

    var expression = new RegExp('title=\"([^\"]+)\"');

    var results = expression.exec(input);

    if (results) {
        document.write(results[1]);
    }
    else {
        document.write("not found");
    }
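Back in C#, if you would rather keep IndexOf and Substring as in the question, the following is a minimal sketch rather than a definitive fix. It assumes the link text starts right after the first '>' that follows firstTag and ends at the next "</a>"; the class and method names (LinkTextExtractor, ExtractLinkText) are hypothetical and not part of the original code. The key points are that the start is searched for instead of guessed with a fixed offset such as + 12, and that the second argument of Substring is a length, not an end index.

```csharp
using System;

public static class LinkTextExtractor
{
    // Hypothetical helper: returns the text between the '>' that closes the
    // opening <a ...> tag and the next "</a>", or null if anything is missing.
    public static string ExtractLinkText(string html, string firstTag, int searchFrom = 0)
    {
        const string endTag = "</a>";

        int index = html.IndexOf(firstTag, searchFrom, StringComparison.Ordinal);
        if (index == -1) return null;

        // Search for the end of the opening tag instead of adding a fixed offset.
        // Assumption: no '>' appears inside the href or title attribute values.
        int textStart = html.IndexOf('>', index + firstTag.Length);
        if (textStart == -1) return null;
        textStart++; // step past '>'

        int textEnd = html.IndexOf(endTag, textStart, StringComparison.Ordinal);
        if (textEnd == -1) return null;

        // Substring takes (startIndex, length), so the length is end minus start.
        return html.Substring(textStart, textEnd - textStart);
    }

    public static void Main()
    {
        string html = "<a href=\"/Forums2008/forumPage.aspx?forumId=317\" title=\"הנקה\">הנקה</a>";
        string text = ExtractLinkText(html, "Forums2008/forumPage.aspx?forumId=");
        Console.WriteLine(text ?? "not found");
    }
}
```

In a loop over many links, the searchFrom parameter can be advanced past the previous match, which plays the role of index1 in the question.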

Answer 2

OK, here is a solution using String.Substring(), String.Split() and String.IndexOf():

    String str = "<a href=\"/Forums2008/forumPage.aspx?forumId=317\" title=\"הנקה\">הנקה</a>"; // Assume this is the string being passed in; the quotes are escaped only because it is a literal here

    int splitStart = str.IndexOf("title=");  // Where to start extracting
    int splitEnd = str.LastIndexOf("</a>");  // Where to stop

    /* What we extract is this:  title="הנקה">הנקה
     * (shown without escape sequences)
     */

    String extracted = str.Substring(splitStart, splitEnd - splitStart); // Extract the required portion

    String[] splitted = extracted.Split('"'); // Split on the double quote; index 1 holds the title value

    Console.WriteLine(splitted[1]);  // May print ???? in the console; set a breakpoint here and inspect the array to see the real values

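Now for the catch: as you can see, I had to escape the quotes in the string literal. You can ignore that, since in your case you are just passing in the scanned string.

This does work, but you cannot see the result with the provided Console.WriteLine(splitted[1]); call.

However, if you set a breakpoint and inspect the split array, you can see the extracted text. The following screenshot confirms it:

(Screenshot: debugging for extracted text)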
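If the debugger shows the correct text but the console prints ????, one likely cause is the console's output encoding rather than the extraction itself. The following is a minimal sketch under that assumption: it sets Console.OutputEncoding to UTF-8 before writing and also prints the code points, which gives a font-independent way to check the value. The class and method names (HebrewConsoleDemo, ToCodePoints) are hypothetical. Whether the Hebrew letters actually render still depends on the console font, and if the replacement characters are already present in the string while debugging, the cause is more likely the encoding used to read the page, which this sketch does not cover.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HebrewConsoleDemo
{
    // Hypothetical helper: formats each UTF-16 char of the string as U+XXXX.
    public static string ToCodePoints(string s)
    {
        return string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));
    }

    public static void Main()
    {
        // Assumption: the string is correct in memory and only the display is wrong.
        Console.OutputEncoding = Encoding.UTF8;

        string title = "הנקה";
        Console.WriteLine(title);

        // Even if the font cannot draw Hebrew, the code points show what was extracted.
        Console.WriteLine(ToCodePoints(title));
    }
}
```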

When using IndexOf and Substring, how do I work out the correct start and end indexes? How do I encode Hebrew characters?
