Retrieving word scores from a MySQL database in C#

Time: 2013-03-18 17:56:10

Tags: c# mysql c#-4.0 mysql-workbench

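I want to retrieve the scores of individual words from a database and then use them to decide whether a paragraph is positive or negative.
The database table is laid out like this. The keywords in it have positive and negative scores.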

Word                Pos_Score           Neg_Score

Able                .324                .834
Country             .987                .213
Love                .378                .734 
agree               .546                .123
industry            .289                .714
guests              .874                .471

A paragraph looks like this:

I agree with you.  It seems an intelligent tourist industry allows its guests to either immerse fully, in part, or not, depending upon the guest.  That is why the ugly American charges have always confused me.  

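Now I compare each word of the paragraph against the database table. If the word is found, I retrieve its Pos_Score and Neg_Score and store them in variables. Once the whole paragraph has been processed, the Pos_Score values are added up separately from the Neg_Score values, and those two totals are the result.
This is the code I have tried: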

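    // Requires: using System; using System.IO; using MySql.Data.MySqlClient;
    private void button1_Click(object sender, EventArgs e)
    {
        string MyConString = "server=localhost;" +
           "database=sentiwornet;" + "password=zia;" +
           "User Id=root;";
        double posTotal = 0, negTotal = 0;

        using (MySqlConnection connection = new MySqlConnection(MyConString))
        using (MySqlCommand command = connection.CreateCommand())
        using (StreamReader reader = new StreamReader("D:\\input.txt"))
        {
            // Open the connection once, not once per word.
            connection.Open();
            // One parameterized query returns both scores; the literal 'part'
            // would only ever match the word "part".
            command.CommandText =
                "SELECT Pos_Score, Neg_Score FROM score WHERE Word = @word";
            command.Parameters.Add("@word", MySqlDbType.VarChar);

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

                foreach (string part in parts)
                {
                    // Trim surrounding punctuation so "you." is looked up as "you".
                    command.Parameters["@word"].Value = part.Trim('.', ',', ';', ':', '!', '?');
                    using (MySqlDataReader Reader = command.ExecuteReader())
                    {
                        // Assumes the score columns are numeric; words not in the table are skipped.
                        if (Reader.Read())
                        {
                            posTotal += Convert.ToDouble(Reader["Pos_Score"]);
                            negTotal += Convert.ToDouble(Reader["Neg_Score"]);
                        }
                    }
                }
            }
        }
        // posTotal and negTotal now hold the summed scores for the whole file.
    }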
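
The scoring procedure described in the question (look up each word, then total the positive and negative scores separately) can be sketched without a database. This is only a minimal sketch: `ParagraphScorer`, `Score`, and `SampleScores` are hypothetical names, the in-memory dictionary stands in for the `score` table, and the case-insensitive match and the set of trimmed punctuation characters are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphScorer
{
    // Hypothetical stand-in for the score table, filled with the rows from the question.
    // Case-insensitive keys are an assumption about how words should match.
    public static readonly Dictionary<string, (double Pos, double Neg)> SampleScores =
        new Dictionary<string, (double Pos, double Neg)>(StringComparer.OrdinalIgnoreCase)
        {
            ["Able"] = (0.324, 0.834),
            ["Country"] = (0.987, 0.213),
            ["Love"] = (0.378, 0.734),
            ["agree"] = (0.546, 0.123),
            ["industry"] = (0.289, 0.714),
            ["guests"] = (0.874, 0.471),
        };

    // Sums Pos and Neg separately over every word of the paragraph found in the table.
    public static (double Pos, double Neg) Score(
        string paragraph,
        IReadOnlyDictionary<string, (double Pos, double Neg)> scores)
    {
        double pos = 0, neg = 0;
        var words = paragraph.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var raw in words)
        {
            // Drop leading and trailing punctuation so "guest." becomes "guest".
            var word = raw.Trim('.', ',', ';', ':', '!', '?');
            if (scores.TryGetValue(word, out var s))
            {
                pos += s.Pos;
                neg += s.Neg;
            }
        }
        return (pos, neg);
    }
}
```

For the sample paragraph and table in the question, only agree, industry, and guests match, so `ParagraphScorer.Score(paragraph, ParagraphScorer.SampleScores)` gives totals of about 1.709 positive and 1.308 negative.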

2 Answers:

Answer 0 (score: 2)

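First of all, this query approach promises to be very inefficient. Instead, if your paragraphs are small enough, I would do all of the joining in the database by passing the words in as a CSV list and converting that list to a table in SQL. The fn_CSVToTable function below does this (courtesy of http://codebank.wordpress.com/2007/03/06/simple-sql-csv-to-table-2/):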

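Warning: you will need to strip all punctuation first, with something like string.Replace(new[] { '.', ',' ... etc }) (pseudocode, since string.Replace does not accept an array of characters).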
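
One way to do the punctuation removal that the warning calls for is a character filter rather than chained Replace calls. A minimal sketch, with `TextCleaner` and `ToWords` as hypothetical names, and assuming that every character reported by `char.IsPunctuation` should act as a word break (which also splits words at apostrophes and hyphens):

```csharp
using System;
using System.Linq;

public static class TextCleaner
{
    // Turns a line of text into words with all punctuation removed.
    public static string[] ToWords(string line)
    {
        // Replace punctuation with spaces so "fully,in" does not fuse into one word.
        var cleaned = new string(
            line.Select(c => char.IsPunctuation(c) ? ' ' : c).ToArray());

        // A null separator array makes Split break on any whitespace.
        return cleaned.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The CSV argument for the procedure could then be built with `string.Join(",", TextCleaner.ToWords(line))`.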

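Also, my code may not do exactly what you want, and it may not even compile, but that is the joy of programming. It should give you a general idea of how to approach a fairly complex problem.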

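Edit: I just realized you are using MySQL. This code is for MSSQL. I have never used MySQL from the CLR, so I don't know whether all of the classes are equivalent. You may need to fall back on what you were doing before.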
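
Because the T-SQL that follows will not run on MySQL, one alternative that still needs only one round trip per paragraph is a single parameterized `IN (...)` query built on the client. This is a sketch under stated assumptions rather than a tested MySQL solution: `InQueryBuilder` and `Build` are hypothetical names, the table and column names are taken from the question, and duplicate words are collapsed case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InQueryBuilder
{
    // Builds one SELECT for all words, plus the (parameter name, word) pairs to bind.
    public static (string Sql, IReadOnlyList<KeyValuePair<string, string>> Parameters) Build(
        IEnumerable<string> words)
    {
        // Each distinct word needs only one placeholder.
        var distinct = words.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
        if (distinct.Count == 0)
            throw new ArgumentException("At least one word is required.", nameof(words));

        // Name the placeholders @w0, @w1, ... in input order.
        var parameters = distinct
            .Select((w, i) => new KeyValuePair<string, string>("@w" + i, w))
            .ToList();

        var sql = "SELECT Word, Pos_Score, Neg_Score FROM score WHERE Word IN ("
                  + string.Join(", ", parameters.Select(p => p.Key)) + ")";
        return (sql, parameters);
    }
}
```

The caller would set the returned SQL as the command text and bind each pair, for example with `command.Parameters.AddWithValue(p.Key, p.Value)`. Since an `IN` query returns each matching row once, a word that occurs several times in the paragraph has to be weighted by its count on the client if every occurrence should contribute.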

CSV list

Create Function dbo.fn_CSVToTable (@CSVList Varchar(MAX))
Returns @Table Table (ColumnData Varchar(50))
As
Begin
    -- Make sure the list ends with a delimiter so the last item is picked up.
    If right(@CSVList, 1) <> ','
        Select @CSVList = @CSVList + ','

    -- Int rather than Smallint: a Varchar(MAX) list can exceed 32,767 characters.
    Declare @Pos    Int,
            @OldPos Int
    Select  @Pos    = 1,
            @OldPos = 1

    -- Walk the list one comma at a time and insert the trimmed text between commas.
    While   @Pos < Len(@CSVList)
    Begin
        Select  @Pos = CharIndex(',', @CSVList, @OldPos)
        Insert into @Table
        Select  LTrim(RTrim(SubString(@CSVList, @OldPos, @Pos - @OldPos))) Col001
        Select  @OldPos = @Pos + 1
    End

    Return
End

SQL procedure

CREATE PROCEDURE dbo.spGetWordScores (@csv varchar(MAX))
AS
select POS_SCORE, NEG_SCORE, WORD from score
inner join dbo.fn_CSVToTable(@csv) input
    on input.ColumnData = score.WORD

New C# code

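// Requires: using System; using System.Data; using System.Data.SqlClient; using System.IO;
var MyConString = "server=localhost;" +
               "database=sentiwornet;" + "password=zia;" +
               "User Id=root;";
// SqlConnection rather than MySqlConnection: the procedure and ExecuteDataSet target MSSQL.
using (var connection = new SqlConnection(MyConString))
{
    //Each line in the array will probably be one paragraph.
    var fileLines = File.ReadAllLines("D:\\input.txt");
    foreach (var line in fileLines)
    {
        //Format your line into words by removing punctuation. I'm not going to bother
        //with that code because it is trivial.
        var csv = string.Join(",", line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

        using (var command = connection.CreateCommand())
        {
            //Call the procedure by name so that @csv is actually passed to it.
            command.CommandType = CommandType.StoredProcedure;
            command.CommandText = "spGetWordScores";
            command.Parameters.AddWithValue("@csv", csv);
            //Fill opens and closes the connection itself when it is not already open.
            var ds = command.ExecuteDataSet();

            //Now you have a DataSet with your word scores. Do with them what you will.
        }
    }
}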
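
One thing to do with the returned `DataSet` is to total the two score columns, which is the result the question asks for. A minimal sketch, assuming the first table carries the `POS_SCORE` and `NEG_SCORE` columns selected by `spGetWordScores` and that they hold numeric values; `ScoreTotals` and `Sum` are hypothetical names:

```csharp
using System;
using System.Data;

public static class ScoreTotals
{
    // Adds up POS_SCORE and NEG_SCORE separately over all rows of the first table.
    public static (double Pos, double Neg) Sum(DataSet ds)
    {
        double pos = 0, neg = 0;
        if (ds.Tables.Count == 0)
            return (pos, neg);

        foreach (DataRow row in ds.Tables[0].Rows)
        {
            // Skip rows with missing scores instead of failing on DBNull.
            if (row.IsNull("POS_SCORE") || row.IsNull("NEG_SCORE"))
                continue;

            pos += Convert.ToDouble(row["POS_SCORE"]);
            neg += Convert.ToDouble(row["NEG_SCORE"]);
        }
        return (pos, neg);
    }
}
```

Because the join returns one row per entry in the CSV list, a word that appears several times in the paragraph is counted each time.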

Useful extension method

// Requires: using System.Data; using System.Data.SqlClient;
public static class Extensions
{
    public static DataSet ExecuteDataSet(this SqlCommand command)
    {
        using (SqlDataAdapter da = new SqlDataAdapter(command))
        {
            DataSet ds = new DataSet();

            // Fill the DataSet using default values for DataTable names, etc.
            da.Fill(ds);

            return ds;
        }
    }
}

Answer 1 (score: 0)

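Going back to the database for every word will kill your performance. It would be better to write a stored procedure that takes the input string, splits it, and computes the scores. That way all of the processing happens on one machine, and you save a lot of time by not passing partial results back and forth.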