I'm having trouble writing code that removes stop words from a string. Here is my code:
String Review="The portfolio is fine except for the fact that the last movement of sonata #6 is missing. What should one expect?";
string[] arrStopword = new string[] {"a", "i", "it", "am", "at", "on", "in", "to", "too", "very","of", "from", "here", "even", "the", "but", "and", "is","my","them", "then", "this", "that", "than", "though", "so", "are"};
StringBuilder sbReview = new StringBuilder(Review);
foreach (string word in arrStopword){
sbReview.Replace(word, "");}
Label1.Text = sbReview.ToString();
Running it gives Label1.Text = "The portfolo s fne except for fct tht lst movement st #6 s mssng. Wht should e expect? "
I expected it to return "portofolio fine except for fact last movement sonata #6 is missing. what should one expect?"
Does anyone know how to fix this?
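Answer 0 (score: 2)
You can solve this with LINQ. First use the Split function to turn the string into a list of strings separated by " " (a space), then use Except to keep only the words that should appear in the result, and finally apply string.Join: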
var newString = string.Join(" ", Review.Split(' ').Except(arrStopword));
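Two caveats apply to the one-liner above: Enumerable.Except is a set operation, so it returns each remaining word only once, and by default it compares case-sensitively, which leaves "The" in place. A minimal sketch that passes StringComparer.OrdinalIgnoreCase to Except (the class and method names are invented here for illustration):

```csharp
using System;
using System.Linq;

public static class ExceptDemo
{
    // Hypothetical helper: removes stop words with the LINQ set-difference operator.
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        // Except drops every token found in stopWords (ignoring case here)
        // and also yields each distinct remaining token only once.
        var kept = text.Split(' ').Except(stopWords, StringComparer.OrdinalIgnoreCase);
        return string.Join(" ", kept);
    }
}
```

With Review and arrStopword from the question, this should return "portfolio fine except for fact last movement sonata #6 missing. What should one expect?". Because of the set semantics, "to be or not to be" with the stop word "to" comes back as "be or not", so this approach only suits text where losing repeated words is acceptable.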
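Answer 1 (score: 1)
The problem is that you are matching substrings, not words. You need to split the original text into words, remove the stop words, and then join the rest back together.
Try this: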
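A minimal sketch of the split, remove, and join steps (not the answerer's original code; StopWordFilter and RemoveStopWords are names invented here):

```csharp
using System;
using System.Collections.Generic;

public static class StopWordFilter
{
    // Hypothetical helper: split on spaces, drop stop words, join again.
    public static string RemoveStopWords(string text, IEnumerable<string> stopWords)
    {
        // Case-insensitive set so that "The" matches the stop word "the".
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

        // RemoveEmptyEntries skips the empty tokens produced by consecutive spaces.
        var words = new List<string>(
            text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

        // Whole-token comparison: "is" is removed, "missing." is left alone.
        words.RemoveAll(w => stopSet.Contains(w));

        return string.Join(" ", words);
    }
}
```

Repeated words survive this filter: "to be or not to be" with the stop word "to" becomes "be or not be".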
The only problem I can see is that it doesn't handle punctuation such as periods very well, but you get the general idea.
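Answer 2 (score: 0)
You could search for " a ", " I ", and so on, so that the program only removes these words when they are used as whole words (that is, when they have spaces around them). Replacing each match with a single space keeps the formatting intact.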
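The same whole-word idea can also be expressed with regular-expression word boundaries (\b) instead of literal surrounding spaces, which additionally catches stop words at the start of the string or next to punctuation. This is a variant of the suggestion, not what the answer literally proposes, and the names are illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordRemover
{
    // Hypothetical helper: deletes stop words only where they stand alone as words.
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        // Builds a pattern such as \b(?:a|i|it)\b; Regex.Escape guards special characters.
        string pattern = @"\b(?:" + string.Join("|", stopWords.Select(Regex.Escape)) + @")\b";

        // IgnoreCase lets "The" match the stop word "the".
        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the gaps left behind into single spaces.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Because \b only matches at word edges, "on" is not removed from "one" and "at" is not removed from "sonata", which is exactly the failure in the question's Replace loop.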
Answer 3 (score: 0)
Alternatively, you can use the dotnet-stop-words package. Just call its RemoveStopWords method:
(yourString).RemoveStopWords("en");