Search a text file for a string and return the sentences before and after it

Posted: 2012-06-10 17:07:43

Tags: c# string file phrase

Suppose my search term is: She likes to watch tv

and the input file text.txt contains sentences such as:

I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.

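I want to search the text file for the string and return the sentence that contains it, together with the sentence before it and the sentence after it.

The output should look like this:

She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.

In other words, it outputs the sentence before the match, the sentence containing the search term, and the sentence after the match.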

4 answers:

Answer 0 (score: 3)

How about something like this:

    using System;

    string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
    string phrase = @"She likes to watch tv";

    int startIndex = @in.IndexOf(phrase, StringComparison.Ordinal);
    if (startIndex < 0)
    {
        // Without this check, Substring(0, -1) below would throw.
        Console.WriteLine("Phrase not found.");
        return;
    }
    int endIndex = startIndex + phrase.Length;
    int tmpIndex;

    // Look back for the ". " that ends the previous sentence, then for the ". " before that.
    // The previous sentence starts just after the second one, or at 0 if there is none.
    tmpIndex = @in.Substring(0, startIndex).LastIndexOf(". ", StringComparison.Ordinal);
    if (tmpIndex > -1)
    {
        tmpIndex = @in.Substring(0, tmpIndex).LastIndexOf(". ", StringComparison.Ordinal);
        startIndex = tmpIndex > -1 ? tmpIndex + 1 : 0;
    }
    else
    {
        startIndex = 0; // the match is in the first sentence
    }

    // Look forward for the '.' that ends the matching sentence, then for the one after it.
    tmpIndex = @in.IndexOf('.', endIndex);
    if (tmpIndex > -1)
    {
        endIndex = tmpIndex + 1;
        tmpIndex = @in.IndexOf('.', endIndex);
        if (tmpIndex > -1)
        {
            endIndex = tmpIndex + 1;
        }
    }

    Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());

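I'm assuming the sentences you're looking for are separated by '.'. The code finds the index of the phrase, then searches backward for the start of the preceding sentence and forward for the end of the following sentence.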

Answer 1 (score: 3)

Here is one way to do it:

    using System;

    string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

    string input = @"She likes to watch tv";
    string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;

    char[] delim = new char[] { '.' };
    string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);

    for (int i = 0; i < phrases.Length; i++)
    {
        if (phrases[i].IndexOf(input, StringComparison.Ordinal) != -1)
        {
            // Split keeps the space that followed each period, so trim it off.
            curPhrase = phrases[i].Trim();
            // Bounds checks: the match may be in the first or the last sentence.
            if (i > 0)
                prevPhrase = phrases[i - 1].Trim();
            if (i + 1 < phrases.Length)
                nextPhrase = phrases[i + 1].Trim();

            break;
        }
    }

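It first splits the whole text at each period '.' and stores the pieces in an array, then searches the array for the input string and takes the current, previous and next phrases. Note that Split removes the periods themselves, so they have to be added back if the sentences are printed.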
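A minimal sketch of the same split-based idea, assuming every sentence ends with '.' followed by whitespace: splitting on the whitespace after each period (the `(?<=\.)\s+` lookbehind) keeps the periods, and clamping the window avoids out-of-range access at either end of the array. The names `SentenceSplitter` and `FindWithNeighbours` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the sentence containing `input` plus one
// sentence on each side, or null if `input` does not occur.
public static class SentenceSplitter
{
    public static string FindWithNeighbours(string content, string input)
    {
        // Split on whitespace that directly follows a '.', so each sentence keeps
        // its own period (String.Split('.') would throw the periods away).
        string[] sentences = Regex.Split(content.Trim(), @"(?<=\.)\s+");

        for (int i = 0; i < sentences.Length; i++)
        {
            if (sentences[i].IndexOf(input, StringComparison.Ordinal) < 0)
                continue;

            // Clamp the window so a hit in the first or last sentence
            // never indexes outside the array.
            int first = Math.Max(i - 1, 0);
            int last = Math.Min(i + 1, sentences.Length - 1);
            return string.Join(" ", sentences, first, last - first + 1);
        }

        return null; // not found
    }
}
```

To search the question's file, pass `File.ReadAllText("text.txt")` (from `System.IO`) as `content`. Joining with a single space normalizes the whitespace between the returned sentences.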

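Answer 2 (score: 2)

Use String.IndexOf() (see its documentation), which returns the index of the first occurrence of the string in the text. Using this value, you can extract the phrase or sentence that contains it:

    int index = paragraph.IndexOf("She likes to watch tv"); // paragraph holds the text to search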

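You would then use `index` to set boundaries and split the text (perhaps with a regular expression that looks for capital letters and periods) to pull out the sentences on either side.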
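One way to fill in those steps, sketched under the assumption that a sentence is a run of text ending in '.' (abbreviations such as "e.g." will break it, and capital letters are not used): take the index from `IndexOf`, let `Regex.Matches` find every sentence span, then pick the first sentence that ends after the index together with its neighbours. `IndexThenRegex` and `FindWithNeighbours` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndexThenRegex
{
    // Returns the sentence in which `phrase` occurs plus its neighbours,
    // or null if `phrase` is not found.
    public static string FindWithNeighbours(string paragraph, string phrase)
    {
        int index = paragraph.IndexOf(phrase, StringComparison.Ordinal);
        if (index < 0)
            return null;

        // One match per sentence: it starts at a non-space, non-period character
        // and runs up to and including the next '.' (or to the end of the text).
        MatchCollection sentences = Regex.Matches(paragraph, @"[^.\s][^.]*\.?");

        for (int i = 0; i < sentences.Count; i++)
        {
            Match s = sentences[i];
            // The first sentence that ends after `index` is the one containing the hit.
            if (index < s.Index + s.Length)
            {
                Match first = sentences[Math.Max(i - 1, 0)];
                Match last = sentences[Math.Min(i + 1, sentences.Count - 1)];
                int end = last.Index + last.Length;
                // Cut from the original string so the spacing between sentences is kept.
                return paragraph.Substring(first.Index, end - first.Index).Trim();
            }
        }
        return null;
    }
}
```

Because the result is cut out of the original string with `Substring`, whatever whitespace separated the three sentences in the file is preserved.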

Answer 3 (score: 2)

You can use a Regex to grab the text:

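    using System.Text.RegularExpressions;

    string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";

    string target = "She likes to watch tv";

    // Regex.Escape makes characters such as '?' or '(' in the phrase match literally.
    // Singleline lets '.' also match line breaks when the text comes from a file.
    // If target does not occur, Replace returns the text unchanged.
    string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + Regex.Escape(target) + "[^.]*?\\.)(?:.*)", "$1", RegexOptions.Singleline);

    // result = "She likes to watch tv but really don't know what to say."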

Reference: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.90).aspx
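As the result comment shows, this pattern returns only the matching sentence. A variation that also captures one sentence on each side could look like the sketch below, again assuming sentences end with '.'. It uses `Regex.Match` instead of `Replace` and escapes the phrase with `Regex.Escape`; `RegexWindow` and `FindWithNeighbours` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class RegexWindow
{
    // Returns the sentence containing `target` plus the sentence before and
    // after it (when they exist), or null if `target` does not occur.
    public static string FindWithNeighbours(string text, string target)
    {
        string pattern =
            @"(?:[^.]*\.\s*)??"                             // optional previous sentence (lazy)
            + @"[^.]*" + Regex.Escape(target) + @"[^.]*\.?" // sentence containing the target
            + @"(?:\s*[^.]*\.)?";                           // optional next sentence
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Value.Trim() : null;
    }
}
```

The previous-sentence group is lazy (`??`) so that, when the phrase is in the first sentence, the engine tries the match without a preceding sentence first instead of pairing the first sentence with a later occurrence. Because `[^.]` also matches line breaks, no `Singleline` option is needed here; on long input, though, this pattern can backtrack heavily.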