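In a .NET web project, I am trying to insert every word of a three-paragraph text into a database. If a sentence looks like this:
"This text is a test text."
then the last word is stored in the database with the period still attached ("text.").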
strNew = strNew.Trim(new Char[] { ' ', ',', '.', '?' });
I have already tried the line above, but it did not help. Here is my full code.
{
    int id = 0;
    ListBox1.Items.Clear();
    string strNew = Request.Form["TextBox1"];
    int n = strNew.Split(' ').Length;
    ListBox3.Items.Add(String.Format("Number of Words: {0}", n));
    int m = Regex.Matches(strNew, "[^\r\n]+((\r|\n|\r\n)[^\r\n]+)*").Count; // counts the number of paragraphs
    ListBox3.Items.Add(String.Format("Number of Paragraphes: {0}", m));
    strNew = strNew.ToLower(); // all lower case
    strNew = strNew.Trim(new Char[] { ' ', ',', '.', '?' });
    var results = strNew.Split(' ').Where(x => x.Length > 1)
        .GroupBy(x => x)
        .Select(x => new { Count = x.Count(), Word = x.Key }); // split the text into words and count each one
    using (con)
    {
        con.Open();
        foreach (var item in results)
        {
            // insert the word and its id; the other columns can stay empty for now (null is allowed for them)
            id++;
            string w = item.Word.ToString();
            SqlCommand cmd = con.CreateCommand();
            cmd.Parameters.AddWithValue("@id", id);
            cmd.Parameters.AddWithValue("@word", item.Word);
            cmd.CommandText = "INSERT INTO word(id, word, sid, frequency, weight, f) VALUES (@id, @word, 0, 0, 0, 0) ";
            cmd.ExecuteNonQuery();
        }
        con.Close();
    }
}
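Answer 0 (score: 0)
The problem is not in a single line; it comes from splitting and trimming text that spans several paragraphs.
Trim only removes characters from the very start and the very end of the string it is called on. It does not work per paragraph, because a string has no notion of paragraphs.
You should trim each piece after splitting the text, not the whole text before splitting.
Example: "This text is a test text.\r\nThis is another test."
The current code only trims the final '.',
because text.\r\nThis
is treated as a single word when you split on spaces alone.
Code: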
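char[] trimCharacters = { ' ', ',', '.', '?' };
var results = strNew.Split(new string[] { " ", "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x.Trim(trimCharacters)) // trim each token before grouping, so "test." and "test" count as the same word
    .Where(x => x.Length > 0)            // drop tokens that consisted only of punctuation
    .GroupBy(x => x)
    .Select(x => new { Count = x.Count(), Word = x.Key });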
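To see the difference between trimming the whole string and trimming each token, here is a minimal, self-contained sketch. The class name TrimDemo and the helper CountWords are hypothetical and not part of the question's code; the sketch assumes words are separated only by spaces and line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrimDemo
{
    // Same punctuation set as the Trim call in the question.
    private static readonly char[] TrimCharacters = { ' ', ',', '.', '?' };

    // Hypothetical helper: split on spaces and line breaks, trim every token,
    // then count how often each remaining word occurs.
    public static Dictionary<string, int> CountWords(string text)
    {
        return text.ToLowerInvariant()
            .Split(new[] { " ", "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => token.Trim(TrimCharacters))
            .Where(word => word.Length > 0)
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        string text = "This text is a test text.\r\nThis is another test.";

        // Trim on the whole string only touches its two ends:
        // the '.' before the line break is still there.
        Console.WriteLine(text.Trim(TrimCharacters));

        // Trimming per token removes the punctuation from every word.
        foreach (var pair in CountWords(text))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

Because the trimming happens before GroupBy, "text" and "text." end up in the same group and are counted together.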
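Answer 1 (score: 0)
In fact, there is no need to trim your strings at all. You can match the words directly with a regular expression: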
var matches = System.Text.RegularExpressions.Regex.Matches(strNew, @"(\b\w+\b)")
.Cast<System.Text.RegularExpressions.Match>();
var result = matches.GroupBy(m => m.Value)
.Select(gr => new { Word = gr.Key, Count = gr.Count() });
foreach (var r in result)
{
//do whatever you want with r.Word and r.Count
}
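For reference, the same regex idea as a complete program might look like the sketch below. The class name RegexWordCount and the helper CountWords are hypothetical, and lower-casing the input is an assumption carried over from the question. Note that \w+ treats an apostrophe as a word boundary, so a word like "don't" would be split into "don" and "t".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordCount
{
    // Hypothetical helper: every run of word characters is one word,
    // so punctuation and line breaks never end up in a match.
    public static Dictionary<string, int> CountWords(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"\b\w+\b")
            .Cast<Match>()
            .GroupBy(m => m.Value)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        string text = "Is this a test? Yes, this is.\r\nAnother test.";

        foreach (var pair in CountWords(text))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```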
Answer 2 (score: 0)
Why not call TrimEnd('.') or TrimStart on each word after splitting, instead of trimming strNew as a whole?
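A rough sketch of that suggestion follows. The class name TrimEndDemo and the helper SplitAndTrimEnd are hypothetical; the sketch assumes only trailing '.', ',' and '?' need to be removed, so leading punctuation such as an opening quote would be left in place.

```csharp
using System;
using System.Linq;

public static class TrimEndDemo
{
    // Hypothetical helper: split on spaces and line-break characters,
    // then strip trailing punctuation from each word individually.
    public static string[] SplitAndTrimEnd(string text)
    {
        return text
            .Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => word.TrimEnd('.', ',', '?'))
            .Where(word => word.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        string text = "Is this a test? Yes, it is.\r\nDone.";
        Console.WriteLine(string.Join("|", SplitAndTrimEnd(text)));
    }
}
```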