Extracting sentences by keyword

Time: 2014-07-09 18:13:59

Tags: c#

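So I have this problem.

Write a program that extracts from a text all sentences that contain a particular word. We accept that sentences are separated from each other by the character "." and that words are separated from one another by a character that is not a letter.

Sample text: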

We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.

Sample result:

We are living in a yellow submarine.

We will move out of it in 5 days.

My code so far:

public static string Extract(string str, string keyword)
    {

        string[] arr = str.Split('.');
        string answer = string.Empty;

        foreach(string sentence in arr)
        {
            var iter = sentence.GetEnumerator();
            while(iter.MoveNext())
            {
                if(iter.Current.ToString() == keyword)
                    answer += sentence;
            }
        }

        return answer;
    }

Well, it doesn't work. I call it with this code:

string example = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";

string keyword = "in";
string answer = Extract(example, keyword);
Console.WriteLine(answer);

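It outputs nothing. The problem is probably in the iterator part, since I'm not familiar with iterators.

In any case, the hint for the problem says we should use the Split and IndexOf methods.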

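6 answers:

Answer 0 (score: 3)

sentence.GetEnumerator() returns a CharEnumerator, so you are checking every character in every sentence. A single character will never be equal to the string "in", which is why it doesn't work. You need to look at each word in each sentence and compare it with the word you are looking for.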
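To make the difference concrete, here is a minimal sketch that contrasts the two comparisons. The class and method names (EnumeratorDemo, AnyCharEquals, AnyWordEquals) are made up for illustration and are not part of the question's code; words are assumed to be separated by single spaces.

```csharp
using System;

public static class EnumeratorDemo
{
    // Mirrors the comparison in the question: each single character,
    // converted to a string, is compared with the whole keyword.
    public static bool AnyCharEquals(string sentence, string keyword)
    {
        foreach (char c in sentence)
        {
            if (c.ToString() == keyword)
                return true;
        }
        return false;
    }

    // Compares whole space-separated words with the keyword instead.
    public static bool AnyWordEquals(string sentence, string keyword)
    {
        foreach (string word in sentence.Split(' '))
        {
            if (word == keyword)
                return true;
        }
        return false;
    }
}
```

For the sentence "We are living in a yellow submarine" and the keyword "in", the character-based check returns false and the word-based check returns true.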

Answer 1 (score: 1)

Try:

public static string Extract(string str, string keyword)
{
    string[] arr = str.Split('.');
    string answer = string.Empty;

    foreach(string sentence in arr)
    {
        //Add any other required punctuation characters for splitting words in the sentence
        string[] words = sentence.Split(new char[] { ' ', ',' });
        if(words.Contains(keyword)) // Contains on an array requires "using System.Linq;"
        {
            answer += sentence;
        }
    }

    return answer;
}

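Answer 2 (score: 1)

Your code uses an iterator to walk through each sentence character by character. Unless the keyword is a single-character word (for example "I" or "a"), there will never be a match.

One way to fix this is to use LINQ to check whether the sentence contains the keyword, like this: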

foreach(string sentence in arr)
{
    if(sentence.Split(' ').Any(w => w == keyword))
        answer += sentence+". ";
}

Demo on ideone.

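Another approach is to use a regular expression that matches only on word boundaries. Note that you cannot use the plain Contains method, because doing so produces "false positives" (that is, it finds sentences in which the keyword is embedded inside a longer word).

Another thing to watch is the use of += for concatenation. This approach is inefficient, because it creates many temporary objects that are immediately thrown away. A better way to get the same result is to use a StringBuilder.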
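As a rough sketch of both suggestions combined (an illustration, not a definitive implementation), the version below matches the keyword on word boundaries and collects the result in a StringBuilder. The class name SentenceExtractor is hypothetical; wrapping the keyword in Regex.Escape and trimming each sentence are assumptions added here, and the match is case-sensitive.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SentenceExtractor
{
    public static string Extract(string text, string keyword)
    {
        // \b matches at a word boundary, so "in" does not match inside "living".
        // Regex.Escape guards against keywords that contain regex metacharacters.
        var pattern = new Regex(@"\b" + Regex.Escape(keyword) + @"\b");
        var result = new StringBuilder();

        foreach (string sentence in text.Split('.'))
        {
            if (pattern.IsMatch(sentence))
            {
                // Split removed the period, so put it back after the trimmed sentence.
                result.Append(sentence.Trim()).Append(". ");
            }
        }

        return result.ToString().TrimEnd();
    }
}
```

Called with the sample text from the question and the keyword "in", this returns "We are living in a yellow submarine. We will move out of it in 5 days."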

Answer 3 (score: 1)

string input = "We are living in a yellow submarine. We don't have anything else. Inside the submarine is very tight. So we are drinking all the day. We will move out of it in 5 days.";
var lookup = input.Split('.')
                .Select(s => s.Split().Select(w => new { w, s }))
                .SelectMany(x => x)
                .ToLookup(x => x.w, x => x.s);

foreach(var sentence  in lookup["in"])
{
    Console.WriteLine(sentence);
}

Answer 4 (score: 0)

You can use sentence.Contains(keyword) to check whether the string contains the word you are looking for.

public static string Extract(string str, string keyword)
    {
        string[] arr = str.Split('.');
        string answer = string.Empty;

        foreach(string sentence in arr)
            if(sentence.Contains(keyword))
                answer+=sentence;

        return answer;
    }

Answer 5 (score: 0)

You can split on the period to get a collection of sentences, then filter it with a regular expression that contains the keyword.

var results = example.Split('.')
    .Where(s => Regex.IsMatch(s, String.Format(@"\b{0}\b", keyword)));
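For comparison, the hint in the question mentions Split and IndexOf. A minimal sketch of that route, without regular expressions, might look like the following. The names IndexOfExtractor and ContainsWord are hypothetical, and treating "not a letter" as !char.IsLetter is an assumption based on the problem statement.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfExtractor
{
    // True if keyword occurs in sentence with no letter directly before or after it.
    public static bool ContainsWord(string sentence, string keyword)
    {
        if (keyword.Length == 0)
            return false;

        int index = sentence.IndexOf(keyword, StringComparison.Ordinal);
        while (index >= 0)
        {
            int end = index + keyword.Length;
            bool letterBefore = index > 0 && char.IsLetter(sentence[index - 1]);
            bool letterAfter = end < sentence.Length && char.IsLetter(sentence[end]);
            if (!letterBefore && !letterAfter)
                return true;

            // This occurrence was embedded in a longer word; look for the next one.
            index = sentence.IndexOf(keyword, index + 1, StringComparison.Ordinal);
        }
        return false;
    }

    // Splits the text on '.' and keeps the sentences that contain the keyword as a word.
    public static List<string> Extract(string text, string keyword)
    {
        var sentences = new List<string>();
        foreach (string part in text.Split('.'))
        {
            if (ContainsWord(part, keyword))
                sentences.Add(part.Trim() + ".");
        }
        return sentences;
    }
}
```

With the sample text and the keyword "in", Extract returns the two sentences from the expected result; "living", "anything", "submarine" and "drinking" are skipped because the occurrence of "in" inside them has a letter next to it.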