Question

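I need to find the longest fragment of a text in which each word begins with the letter that the previous word ends with (for example: 1. my 2. years). I need to print that fragment and the number of the line it is on. My code: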

public static string Longestfragment(string[] lines,char[] isolations ,ref int index)
{
    string longestSentense = "";
    int longestCount = 0;
    int start = 0;
    int end = 0;

    foreach (string sentense in lines)
    {
        string[] words = sentense.Split(isolations); // split the words
        int count = 0;
        int line = 0;
        line++;
        for (int i = 0; i < words.Length - 1; i++)
        {
            // checking if the first word ends equals to the second word start
            if (words[i].Equals("") || words[i + 1].Equals("")) continue; // checking if one of the words not empty.
            if (words[i][words[i].Length - 1].Equals(words[i + 1][0]))
            {
                if (count == 0) //to find the start of fragment
                {
                    start = sentense.IndexOf(words[i][0]);
                    end = sentense.IndexOf(words[i + 1][words[i + 1].Length - 1]);
                }// to find the end of the fragment if the fragment if longer than 2 words.
                if (count >= 1)
                {
                    end = sentense.IndexOf(words[i + 1][words[i + 1].Length - 1]);
                }
                count++;

            }

        }
        // if there is the longest fragment we save it.
        if (count > longestCount)
        {
            longestCount = count;
            longestSentense = sentense.Substring(start,end-1);
            index = line; // to find the line index
        }
    }
    return longestSentense; //returning the value of longestfragment
}

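If my text file is:

My name is Sam. My years so good.

I get index 1 (I think it should be 0) and the longest sentence (is Sam. My years so good.), which is correct. But if my text file contains two or more lines, such as:

equals sequence enter rope eight the.

My name is Sam. My years so good.

my program crashes or prints the wrong sentence. Please help.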

Answer 1

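I don't know whether this is an option for you, but searching text for a pattern can be done with a regular expression instead of a loop.

I quickly put together a pattern for you that finds all such word pairs in a text:

\w+(\w)\s\g{-1}\w+

You can export all matches to, say, a list and then search that list for the longest one.

Be aware, though, that regular expressions can be quite tricky and sometimes unpredictable. Mine will most likely fail on something like aword, danotherword, because it does not account for punctuation and the like. But it should point you in a good direction.

Edit: .NET supports regexes directly. They live in the namespace: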

System.Text.RegularExpressions
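
As a rough illustration of that idea in C#, here is a minimal sketch. It assumes the relative backreference `\g{-1}` is rewritten as `\1`, since to my knowledge the .NET engine does not accept the `\g{...}` form. The class and method names (`RegexSketch`, `LongestPair`) are made up for this example, and `RegexOptions.IgnoreCase` is my addition so that `s` and `S` count as the same letter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Same idea as the pattern above, using the numbered backreference \1.
    private static readonly Regex Pair =
        new Regex(@"\w+(\w)\s\1\w+", RegexOptions.IgnoreCase);

    // Returns the longest matching word pair, or "" if there is none.
    public static string LongestPair(string text)
    {
        return Pair.Matches(text)
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .OrderByDescending(v => v.Length)
                   .FirstOrDefault() ?? "";
    }

    public static void Main()
    {
        Console.WriteLine(LongestPair("My name is Sam. My years so good."));
    }
}
```

Note that matches do not overlap, so this pattern finds separate pairs rather than chains of three or more words, and both words of a pair need at least two characters.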

Answer 2

Your main problem is this line:

end = sentense.IndexOf(words[i + 1][words[i + 1].Length - 1]);

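I believe it searches for the index of the last letter of the second word that meets your condition.

In this sentence:

equals sequence enter rope eight the.

when i == 4 you reach eight and the. If you now search for the index of the last letter of the, which is e, IndexOf() returns:

the index of the first occurrence of the value in the array

So you get 0, because your sentence starts with e, and this line throws an out-of-range exception because end - 1 evaluates to -1: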

longestSentense = sentense.Substring(start, end - 1);

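Solution:

I suggest using the whole word and the method LastIndexOf() to compute the end index. For the word the it returns 33, because the word starts at that position. You only need to add the length of the word to get the end:

end = sentense.LastIndexOf(words[i + 1]) + words[i + 1].Length;

When you take a Substring() of the sentence, the second parameter is the length, not the end index:

public string Substring(int startIndex, int length)

So you need to subtract the start index: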

longestSentense = sentense.Substring(start, end-start);
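
To make the index arithmetic concrete, here is a small sketch using a sample sentence. The helper name `EndOfLastOccurrence` and the use of `StringComparison.Ordinal` are my own choices for the illustration, not part of the original code.

```csharp
using System;

public static class SubstringDemo
{
    // Index just past the last occurrence of `word` in `sentence`.
    // Assumes the word is present; LastIndexOf returns -1 otherwise.
    public static int EndOfLastOccurrence(string sentence, string word)
    {
        return sentence.LastIndexOf(word, StringComparison.Ordinal) + word.Length;
    }

    public static void Main()
    {
        string sentence = "equals sequence enter rope eight the.";
        int start = 0;
        int end = EndOfLastOccurrence(sentence, "the");

        Console.WriteLine(sentence.IndexOf('e'));  // 0: the first 'e' in the whole sentence
        Console.WriteLine(end);                    // 36: "the" starts at 33 and is 3 long
        // Substring takes (startIndex, length), so the length is end - start.
        Console.WriteLine(sentence.Substring(start, end - start));
    }
}
```

The last line prints the sentence without its trailing period, because the fragment ends right after the final word.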

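The start index suffers from the same problem as the end index: first occurrence! I would also suggest searching for the word rather than the letter. Take this sentence as an example:

My name is Joe. My years so good.

Your fragment would start at the my after Joe., but IndexOf(String s) returns the first my. You should maintain an offset that keeps growing as you walk through the words of the sentence: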

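if (Char.ToLower(words[i].Last()) == char.ToLower(words[i + 1].First()))
{
    if (count == 0) //to find the start of fragment
    {
        // search from the running offset so an earlier identical word is skipped
        start = sentense.IndexOf(words[i], offset);
    }
    end = sentense.LastIndexOf(words[i + 1]) + words[i + 1].Length;
    count++;
}
// advance the offset for every word, matching or not (offset starts at 0 before the loop)
offset += words[i].Length;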

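Also, the second if condition in your code does not account for upper and lower case, so s == S returns false. You can force both letters to lowercase to avoid this, as the Char.ToLower calls in the condition above do.

The first two if conditions can be written more readably: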

// requires: using System.Linq; (for Last() and First())
// do only if neither word is `null`, empty, or whitespace
if (!String.IsNullOrWhiteSpace(words[i]) && !String.IsNullOrWhiteSpace(words[i + 1]))
{   // access the last and first elements using methods with such names
    if (Char.ToLower(words[i].Last()) == char.ToLower(words[i + 1].First()))
    {

    }
}

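Increment line at the very end of the foreach loop, after the last if condition, and declare it before the loop, since it is currently reset to 0 on every iteration. That will give you the correct line.
You should put the Substring call in a try/catch block, or check whether end could be negative, to avoid the exception:

if (count > longestCount && end >= 0)
{
    longestCount = count;
    longestSentense = sentense.Substring(start, end - start);
    index = line; // to find the line index
}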
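
Putting these points together, one possible shape for the whole method is sketched below. This is not the original code: it assumes that "longest" means the most consecutive chained words, that line numbers are 1-based, and that letters are compared case-insensitively. The names `FragmentFinder`, `LongestFragment`, and the `out` parameter are hypothetical. It scans the characters directly and records each word's position, so no IndexOf or LastIndexOf lookup is needed.

```csharp
using System;

public static class FragmentFinder
{
    // Returns the longest run of words in which each word starts with the
    // last letter of the previous word; lineNumber is 1-based (0 if none).
    public static string LongestFragment(string[] lines, char[] separators, out int lineNumber)
    {
        string best = "";
        int bestWords = 0;
        lineNumber = 0;

        for (int line = 0; line < lines.Length; line++)
        {
            string sentence = lines[line];
            int pos = 0;          // current scan position
            int chainStart = 0;   // index where the current chain begins
            int chainWords = 0;   // number of words in the current chain
            int prevEnd = 0;      // index just past the previous word

            while (pos < sentence.Length)
            {
                // Skip separator characters between words.
                if (Array.IndexOf(separators, sentence[pos]) >= 0) { pos++; continue; }

                int wordStart = pos;
                while (pos < sentence.Length && Array.IndexOf(separators, sentence[pos]) < 0) pos++;
                int wordEnd = pos; // exclusive

                bool continues = chainWords > 0 &&
                    char.ToLowerInvariant(sentence[prevEnd - 1]) ==
                    char.ToLowerInvariant(sentence[wordStart]);

                if (continues) chainWords++;
                else { chainStart = wordStart; chainWords = 1; }
                prevEnd = wordEnd;

                // A fragment needs at least two words; keep the longest seen so far.
                if (chainWords >= 2 && chainWords > bestWords)
                {
                    bestWords = chainWords;
                    best = sentence.Substring(chainStart, wordEnd - chainStart);
                    lineNumber = line + 1;
                }
            }
        }
        return best;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] lines = { "equals sequence enter rope eight the.", "My name is Sam. My years so good." };
        string fragment = FragmentFinder.LongestFragment(lines, new[] { ' ', '.', ',' }, out int line);
        Console.WriteLine($"{fragment} (line {line})"); // equals sequence enter rope eight the (line 1)
    }
}
```

Because the chain is reset whenever a word does not fit, the result is a contiguous fragment rather than a count of all matching pairs in the line.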

OK, that was a lot of cleanup. Have fun, I hope it helps.

C# can't find the right fragment in text
