Question

如果我有搜索条件：She likes to watch tv

包含一些句子的输入文件text.txt，例如：

I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.

我想在文本文件中搜索字符串，并返回包含字符串的句子，以及它之前和之后的句子。

输出应如下所示：

She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.

因此，它输出匹配搜索词之前的句子，包含搜索词的句子和搜索词之后的句子。

Answer 1

这样的事情怎么样：

    string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
    string phrase = @"She likes to watch tv";


    int startIndex = @in.IndexOf(phrase);
    int endIndex = startIndex + phrase.Length;
    int tmpIndex;

    tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
    if (tmpIndex > -1)
    {
        startIndex = tmpIndex + 1;
        tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
        if (tmpIndex > -1)
        {
            startIndex = tmpIndex + 1;
            tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ");
            if (tmpIndex > -1)
            {
                startIndex = tmpIndex;
            }
        }
    }

    tmpIndex = @in.IndexOf(".", endIndex);
    if (tmpIndex > -1)
    {
        endIndex = tmpIndex + 1;
        tmpIndex = @in.IndexOf(".", endIndex);
        if (tmpIndex > -1)
        {
            endIndex = tmpIndex + 1;
        }
    }

    Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());

我假设您要查找的短语由'。'分隔。此代码的工作原理是查找短语的索引并查看前一个短语的匹配，并查看后面句子的短语。

Answer 2

这里介绍一种方式：

string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string input = @"She likes to watch tv";
string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;

char[] delim = new char[] { '.' };
string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);

for(int i=0; i<phrases.Length; i++){
    if(phrases[i].IndexOf(input) != -1){
        curPhrase = phrases[i];
        prevPhrase = phrases[i - 1];
        if (phrases[i + 1] != null)
            nextPhrase = phrases[i + 1];

        break;
    }
}

它首先在句号.处拆分整个文本，将它们存储在一个数组中，然后在搜索数组中的输入字符串后取出当前，上一个和下一个短语。

Answer 3

使用String.IndexOf()（docs），它将返回文件中第一次出现的字符串。使用此值，您可以删除包含的短语或句子：

int index = paragraph.IndexOf("She likes to watch tv")

然后你会使用index来设置边界和分割（可能在regular expression中使用大写字母和句号），以拉出任何一边的句子。

Answer 4

您可以使用Regex抓取文字：

string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

string target = "She likes to watch tv";

string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + target + "[^.]*?\\.)(?:.*)", "$1");

//result = "She likes to watch tv but really don't know what to say."

参考：http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.90).aspx

搜索文本文件中的字符串以及上一句和下一句

4 个答案: