Suppose I have the search term: She likes to watch tv
and an input file text.txt containing some sentences, for example:
I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.
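I want to search the text file for the string and return the sentence that contains it, along with the sentences immediately before and after it.
The output should look like this: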
She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault.
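That is, it outputs the sentence before the one that matches the search term, the sentence containing the search term, and the sentence after it.
Answer 0 (score: 3)
How about something like this: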
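string @in = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
string phrase = @"She likes to watch tv";

int matchIndex = @in.IndexOf(phrase, StringComparison.Ordinal);
if (matchIndex < 0)
{
    Console.WriteLine("Phrase not found.");
    return;
}

// Start of the sentence containing the phrase: just after the last ". " before it
// (or 0 if the phrase is in the first sentence).
int tmpIndex = @in.Substring(0, matchIndex).LastIndexOf(". ", StringComparison.Ordinal);
int startIndex = tmpIndex > -1 ? tmpIndex + 2 : 0;

// Start of the previous sentence, if there is one. Search strictly before the
// ". " found above so the same boundary is not found twice.
if (tmpIndex > -1)
{
    tmpIndex = @in.Substring(0, tmpIndex).LastIndexOf(". ", StringComparison.Ordinal);
    startIndex = tmpIndex > -1 ? tmpIndex + 2 : 0;
}

// End of the sentence containing the phrase, then end of the next sentence.
int endIndex = matchIndex + phrase.Length;
tmpIndex = @in.IndexOf('.', endIndex);
if (tmpIndex > -1)
{
    endIndex = tmpIndex + 1;
    tmpIndex = @in.IndexOf('.', endIndex);
    if (tmpIndex > -1)
    {
        endIndex = tmpIndex + 1;
    }
}
else
{
    endIndex = @in.Length; // no closing period: take the rest of the text
}

Console.WriteLine(@in.Substring(startIndex, endIndex - startIndex).Trim());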
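I'm assuming the sentences you're working with are delimited by '.' followed by a space. The code finds the index of the phrase, then searches backward for the sentence boundaries that mark the start of the matching sentence and of the sentence before it, and searches forward for the end of the matching sentence and of the sentence after it.
Answer 1 (score: 3)
Here is one way: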
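The code above handles exactly one sentence on each side. A minimal sketch of the same backward/forward scan written as loops, so the number of neighboring sentences is a parameter. It assumes sentences end in '.' and are separated by ". ", and that the phrase itself contains no '.'; `SentenceWindow` and its `before`/`after` parameters are hypothetical names, not part of any library. It uses C# top-level statements (C# 9 or later).

```csharp
using System;

string sample = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
Console.WriteLine(SentenceWindow(sample, "She likes to watch tv", 1, 1));

// Hypothetical helper: the sentence containing `phrase`, plus up to `before`
// sentences before it and `after` sentences after it.
// Assumes sentences end with '.' and are separated by ". ", and `phrase` has no '.'.
static string? SentenceWindow(string text, string phrase, int before, int after)
{
    int matchIndex = text.IndexOf(phrase, StringComparison.Ordinal);
    if (matchIndex < 0)
        return null; // phrase not found

    // Walk backward one ". " boundary at a time. The first step finds the start of
    // the matching sentence; each further step adds one preceding sentence.
    int start = matchIndex;
    int searchEnd = matchIndex;
    for (int i = 0; i <= before; i++)
    {
        int boundary = text.Substring(0, searchEnd).LastIndexOf(". ", StringComparison.Ordinal);
        if (boundary < 0)
        {
            start = 0; // reached the first sentence
            break;
        }
        start = boundary + 2;  // skip the ". "
        searchEnd = boundary;  // next search looks strictly before this boundary
    }

    // Walk forward one '.' at a time. The first step ends the matching sentence;
    // each further step adds one following sentence.
    int end = matchIndex + phrase.Length;
    for (int i = 0; i <= after; i++)
    {
        int period = text.IndexOf('.', end);
        if (period < 0)
        {
            end = text.Length; // no more periods: take the rest of the text
            break;
        }
        end = period + 1;
    }

    return text.Substring(start, end - start).Trim();
}
```

With `before = 1` and `after = 1` this prints the same three sentences as the expected output in the question; passing `0, 0` returns only the matching sentence.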
string content = @"I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
string input = @"She likes to watch tv";
string curPhrase = string.Empty, prevPhrase = string.Empty, nextPhrase = string.Empty;
char[] delim = new char[] { '.' };
string[] phrases = content.Split(delim, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < phrases.Length; i++)
{
    if (phrases[i].IndexOf(input, StringComparison.Ordinal) != -1)
    {
        curPhrase = phrases[i].Trim();
        // Check the array bounds: the match may be in the first or the last sentence.
        if (i > 0)
            prevPhrase = phrases[i - 1].Trim();
        if (i < phrases.Length - 1)
            nextPhrase = phrases[i + 1].Trim();
        break;
    }
}
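It first splits the whole text at each period (.) and stores the pieces in an array, then searches the array for the input string and takes the current, previous, and next phrases. Note that Split removes the periods themselves, and every piece after the first keeps its leading space.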
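Because `Split` drops the periods, and a match in the first or last sentence has no neighbor on one side, a sketch of the same split-and-search idea may help. It splits with `Regex.Split` on whitespace that follows a period (a lookbehind), so each piece keeps its '.', and it clamps the neighbor indices to the array. It assumes '.' only ever ends a sentence; `SplitSentences` and `FindWithNeighbors` are hypothetical helper names. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string content = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
Console.WriteLine(FindWithNeighbors(content, "She likes to watch tv"));

// Split after every period that is followed by whitespace, so each piece keeps its '.'.
// Assumption: '.' only ends sentences (no abbreviations such as "e.g." or numbers like "3.5").
static string[] SplitSentences(string text) =>
    Regex.Split(text.Trim(), @"(?<=\.)\s+");

// Hypothetical helper: the first sentence containing `input`, joined with the
// sentences immediately before and after it (when they exist).
static string? FindWithNeighbors(string text, string input)
{
    string[] sentences = SplitSentences(text);
    for (int i = 0; i < sentences.Length; i++)
    {
        if (sentences[i].IndexOf(input, StringComparison.Ordinal) >= 0)
        {
            // Clamp to the array bounds so a match in the first or last sentence is safe.
            int first = Math.Max(0, i - 1);
            int last = Math.Min(sentences.Length - 1, i + 1);
            return string.Join(" ", sentences, first, last - first + 1);
        }
    }
    return null; // input not found
}
```

Rejoining with a single space normalizes whatever whitespace originally separated the sentences, which is usually what you want for display.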
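Answer 2 (score: 2)
Use String.IndexOf() (see its documentation), which returns the position of the first occurrence of the string in the file. Using this value, you can extract the phrase or sentence that contains it: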
int index = paragraph.IndexOf("She likes to watch tv");
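Then you would use index to set the boundaries and split (perhaps with a regular expression based on capital letters and periods) to pull out the sentences on either side.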
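A sketch of the approach described above, under the assumption that a sentence starts with a capital letter and ends with a period followed by whitespace and another capital letter (or by the end of the text). `IndexOf` locates the phrase, `Regex.Matches` finds each sentence together with its `Index` and `Length`, and the sentence whose span contains the phrase is returned with its neighbors. `SurroundingSentences` is a hypothetical name. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string sample = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
Console.WriteLine(SurroundingSentences(sample, "She likes to watch tv"));

// Hypothetical helper: locate the phrase with IndexOf, then use a regex for sentence boundaries.
// Assumption: a sentence starts with a capital letter and ends with a period that is followed
// by whitespace and another capital letter, or by the end of the text.
static string? SurroundingSentences(string paragraph, string phrase)
{
    int index = paragraph.IndexOf(phrase, StringComparison.Ordinal);
    if (index < 0)
        return null; // phrase not found

    // Each match is one sentence; Match.Index and Match.Length give its span in the text.
    // Singleline lets '.' in the pattern cross line breaks in text read from a file.
    MatchCollection sentences = Regex.Matches(
        paragraph, @"[A-Z].*?\.(?=\s+[A-Z]|\s*$)", RegexOptions.Singleline);

    for (int i = 0; i < sentences.Count; i++)
    {
        Match s = sentences[i];
        if (index >= s.Index && index < s.Index + s.Length)
        {
            // Collect the previous, current, and next sentences that exist.
            var parts = new List<string>();
            for (int j = Math.Max(0, i - 1); j <= Math.Min(sentences.Count - 1, i + 1); j++)
                parts.Add(sentences[j].Value);
            return string.Join(" ", parts);
        }
    }
    return null; // phrase is not inside any recognized sentence
}
```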
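Answer 3 (score: 2)
You can use Regex to grab the text (note that this pattern returns only the sentence containing the target, not the sentences around it):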
using System.Text.RegularExpressions;

string text = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
string target = "She likes to watch tv";
// Regex.Escape makes the target match literally, even if it contains characters such as '?' or '('.
string result = Regex.Replace(text, "(?:.*?\\.\\s)?((?:[^.]*?)" + Regex.Escape(target) + "[^.]*?\\.)(?:.*)", "$1");
//result = "She likes to watch tv but really don't know what to say."
Reference: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=vs.90).aspx
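The pattern above keeps only the matching sentence. A sketch that extends the same regex idea to include one optional sentence on each side, using `Regex.Match` and `Regex.Escape` so the target is matched literally. It assumes '.' appears only at sentence ends and never inside the target; `MatchWithNeighbors` is a hypothetical name. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "I don't know what to do. She doesn't know that it's not good for her health. She likes to watch tv but really don't know what to say. I don't blame her, but it's not her fault. This was just a test text. This is the end.";
Console.WriteLine(MatchWithNeighbors(sample, "She likes to watch tv"));

// Hypothetical helper: one optional sentence before, the sentence containing the target,
// and one optional sentence after.
// Assumption: '.' only ends sentences and never appears inside the target.
static string? MatchWithNeighbors(string text, string target)
{
    string pattern =
        @"(?:[^.]*\.\s*)?" +                              // previous sentence (optional)
        @"[^.]*" + Regex.Escape(target) + @"[^.]*\." +   // sentence containing the target
        @"(?:\s*[^.]*\.)?";                               // next sentence (optional)
    Match m = Regex.Match(text, pattern);
    // The match may begin with the space after the previous period; Trim removes it.
    return m.Success ? m.Value.Trim() : null;
}
```

Because `Regex.Match` returns the leftmost match, the optional leading group takes the whole preceding sentence rather than just its tail, and when the target is in the first sentence the group is simply skipped.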