Remove stop words from a String in ASP.NET C#

Time: 2015-06-25 09:26:56

Tags: c# asp.net stop-words

I am having trouble writing code that removes stop words from a string. Here is my code:

String Review="The portfolio is fine except for the fact that the last movement of sonata #6 is missing. What should one expect?";

string[] arrStopword = new string[] {"a", "i", "it", "am", "at", "on", "in", "to", "too", "very","of", "from", "here", "even", "the", "but", "and", "is","my","them", "then", "this", "that", "than", "though", "so", "are"};
StringBuilder sbReview = new StringBuilder(Review);
foreach (string word in arrStopword){
sbReview.Replace(word, "");}
Label1.Text = sbReview.ToString();

When I run it, Label1.Text = "The portfolo s fne except for fct tht lst movement st #6 s mssng. Wht should e expect? "

I want it to return "portofolio fine except for fact last movement sonata #6 is missing. what should one expect?"

Does anyone know how to solve this?

4 Answers:

Answer 0: (score: 2)

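You can use LINQ to solve this. First use the Split function to turn the string into a list of strings separated by " " (a space), then use Except to get the words the result should contain, and finally apply string.Join: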

var newString = string.Join(" ", Review.Split(' ').Except(arrStopword));
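One caveat: `Except` is a set operation, so with the default comparer it is case-sensitive (the leading "The" survives) and it returns each remaining word only once. A minimal sketch, assuming you want to ignore case and keep repeated words, filters with `Where` against a `HashSet<string>` instead. The names `StopWordFilter` and `RemoveStopWords` are hypothetical and not part of this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Hypothetical helper: drops whole words found in stopWords, ignoring case,
    // while keeping word order and any repeated non-stop words.
    public static string RemoveStopWords(string text, IEnumerable<string> stopWords)
    {
        // Case-insensitive set, so "The" matches the stop word "the".
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !stopSet.Contains(word));

        return string.Join(" ", kept);
    }
}
```

It could be called as `StopWordFilter.RemoveStopWords(Review, arrStopword)`. Words with attached punctuation, such as "is.", are still compared as written and therefore kept.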

Answer 1: (score: 1)

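The problem is that you are matching substrings, not words. You need to split the original text into words, remove the stop words, and then join the rest back together.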

Try this:

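A minimal sketch of the split, remove, and join steps described above. It is an illustration under stated assumptions, not this answer's original code, and the names `SplitJoinFilter` and `RemoveStopWords` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SplitJoinFilter
{
    // Hypothetical helper illustrating split -> remove -> join.
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        // Split on spaces so that whole words, not substrings, are compared.
        var words = new List<string>(text.Split(' '));

        // Remove every word that exactly matches a stop word (case-sensitive).
        words.RemoveAll(word => Array.IndexOf(stopWords, word) >= 0);

        // Put the remaining words back together with single spaces.
        return string.Join(" ", words);
    }
}
```

With the question's `Review` and `arrStopword`, this yields "The portfolio fine except for fact last movement sonata #6 missing. What should one expect?". The capitalized "The" stays because the comparison is case-sensitive.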

The only problem I can see is that it does not handle punctuation well, but you get the general idea.

Answer 2: (score: 0)

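You can use " a ", " I ", and so on, so that the program removes these words only when they are used as words (that is, when they have spaces around them). Just replace them with a single space to keep the formatting.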
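Padding with spaces misses a stop word at the very start or end of the string, or one that sits next to punctuation. A possible alternative, which is a different technique from the one this answer proposes, is a regular expression with `\b` word boundaries. The sketch below assumes case-insensitive matching is wanted. The names `RegexStopWordFilter` and `RemoveStopWords` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexStopWordFilter
{
    // Hypothetical helper: removes stop words only where they stand as whole words.
    public static string RemoveStopWords(string text, IEnumerable<string> stopWords)
    {
        // Escape each word so it is matched literally, then build one alternation.
        string alternation = string.Join("|", stopWords.Select(Regex.Escape));
        string pattern = @"\b(?:" + alternation + @")\b";

        // \b keeps "is" from matching inside "missing".
        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the runs of whitespace left behind by removed words.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Punctuation that followed a removed word is left in place, so a space can remain before it (for example "What ?"). Whether that is acceptable depends on how the result is used.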

Answer 3: (score: 0)

Or you can use the dotnet-stop-words package. Just call the RemoveStopWords method:

(yourString).RemoveStopWords("en");