C# program to read a file and extract the words marked with #

Time: 2013-03-03 18:49:33

Tags: c# c#-4.0

I am reading a file that has the following format.

Input file format:

        id          PosScore  NegScore       Word                             SynSet   

        00002098    0         0.75           unable#1                         (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
        00002312    0.23      0.43           dorsal#2 abaxial#1               facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"
        00002527    0.14      0.26           ventral#2 adaxial#1              nearest to or facing toward the axis of an organ or organism; "the upper side of a leaf is known as the adaxial surface"
        00002730    0.45      0.32           acroscopic#1                     facing or on the side toward the apex
        00002843    0.91      0.87           basiscopic#1                     facing or on the side toward the base
        00002956    0.43      0.73           abducting#1 abducent#1           especially of muscles; drawing away from the midline of the body or from an adjacent part
        00003131    0.15      0.67           adductive#1 adducting#1 adducent#1  especially of muscles; bringing together or drawing toward the midline of the body or toward an adjacent part    
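In this file, the SynSet column should be removed. In addition, if the Word column contains more than one word, the row should be repeated once per word, with the same id, PosScore and NegScore on every repeated row. I want the following output for the file above.

Output: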

 id         PosScore      NegScore              Word     
00002098    0             0.75              unable#1    
00002312    0.23          0.43               dorsal#2    
00002312    0.23          0.43               abaxial#1       
00002527    0.14          0.26               ventral#2    
00002527    0.14          0.26               adaxial#1     
00002730    0.45          0.32               acroscopic#1    
00002843    0.91          0.87               basiscopic#1    
00002956    0.43          0.73               abducting#1    
00002956    0.43          0.73               abducent#1    
00003131    0.15          0.67               adductive#1    
00003131    0.15          0.67               adducting#1    
00003131    0.15          0.67               adducent#1    

I wrote the code below, but it gives unexpected results.

 TextWriter tw = new StreamWriter("D:\\output.txt");    
 private void button1_Click(object sender, EventArgs e)
        {

                StreamReader reader = new StreamReader(@"C:\Users\Zia Ur Rehman\Desktop\records.txt");
                string line;
                String lines = "";
                while ((line = reader.ReadLine()) != null)
                {

                    String[] str = line.Split('\t');

                    String[] words = str[4].Split(' ');
                    for (int k = 0; k < words.Length; k++)
                    {
                        for (int i = 0; i < str.Length; i++)
                        {
                            if (i + 1 != str.Length)
                            {
                                lines = lines + str[i] + ",";
                            }
                            else
                            {
                                lines = lines + words[k] + "\r\n";

                            }
                        }
                    }
                }
            tw.Write(lines);
            tw.Close();
            reader.Close();    
        } 

This code produces the following incorrect result:

00002098,0,0.75,unable#1,unable#1
00002312,0,0,dorsal#2 abaxial#1,dorsal#2
00002312,0,0,dorsal#2 abaxial#1,abaxial#1
00002527,0,0,ventral#2 adaxial#1,ventral#2
00002527,0,0,ventral#2 adaxial#1,adaxial#1
00002730,0,0,acroscopic#1,acroscopic#1
00002843,0,0,basiscopic#1,basiscopic#1
00002956,0,0,abducting#1 abducent#1,abducting#1
00002956,0,0,abducting#1 abducent#1,abducent#1
00003131,0,0,adductive#1 adducting#1 adducent#1,adductive#1
00003131,0,0,adductive#1 adducting#1 adducent#1,adducting#1
00003131,0,0,adductive#1 adducting#1 adducent#1,adducent#1

2 answers:

Answer 0: (score: 2)

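So, it is working now, after a long effort. Compared with the code in the question, the Word column is read from str[3] and only the first four columns are written, separated by tabs.
Note: if the columns in your input file are not separated by proper tabs, the result will be incorrect. Do not overlook the tabs.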

  TextWriter tw = new StreamWriter("D:\\output.txt");    
  private void button1_Click(object sender, EventArgs e)
  {
        StreamReader reader = new StreamReader(@"C:\Users\Mohsin\Desktop\records.txt");
        string line;
        String lines = "";
        while ((line = reader.ReadLine()) != null)
        {

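            // Columns are tab-separated: id, PosScore, NegScore, Word, SynSet.
            String[] str = line.Split('\t');

            // The Word column (index 3) may hold several space-separated words.
            String[] words = str[3].Split(' ');
            for (int k = 0; k < words.Length; k++)
            {
                // Emit one row per word: the first three columns, then the word itself.
                for (int i = 0; i < 4; i++)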
                {
                    if (i + 1 != 4)
                    {
                        lines = lines + str[i] + "\t";
                    }
                    else
                    {
                        lines = lines + words[k] + "\r\n";

                    }
                }
            }
        }
        tw.Write(lines);
        tw.Close();
        reader.Close();
  }
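
Since the code above splits on '\t', it only works when the columns really are separated by tabs. If the file uses runs of spaces instead, a whitespace-tolerant variant might help. The following is only a minimal sketch and not part of the original answer. It assumes that every entry in the Word column has the form word#number and that the SynSet text never starts with such a token. The names SentiRowParser and ExpandRow are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentiRowParser
{
    // Assumption: a Word entry looks like "dorsal#2" (non-space text, '#', digits).
    private static readonly Regex WordToken = new Regex(@"^\S+#\d+$");

    // Returns one tab-separated output row per word found on the input line.
    public static List<string> ExpandRow(string line)
    {
        var rows = new List<string>();

        // Split on any run of spaces or tabs, so the exact delimiter does not matter.
        string[] tokens = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < 4)
            return rows; // blank or malformed line: nothing to emit

        // Tokens 0..2 are id, PosScore, NegScore; word tokens follow until the SynSet text starts.
        for (int i = 3; i < tokens.Length && WordToken.IsMatch(tokens[i]); i++)
        {
            rows.Add(tokens[0] + "\t" + tokens[1] + "\t" + tokens[2] + "\t" + tokens[i]);
        }
        return rows;
    }
}
```

For the input row with id 00002312, ExpandRow returns two rows, one for dorsal#2 and one for abaxial#1. Under this rule the header line yields no rows, because "Word" does not match the word#number pattern, so a header would have to be written separately.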

Answer 1: (score: 1)

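I simplified your code and got it working. It still lacks validation, and it could be made more efficient by using a StringBuilder or, better still, by writing each line to the file as it is produced instead of appending it to a String. It also lacks exception handling.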

using (TextWriter tw = File.CreateText(@"c:\temp\result.txt"))
using (StreamReader reader = new StreamReader(@"stackov1.txt"))
{
    string line;
    String lines = "";
    while ((line = reader.ReadLine()) != null)
    {

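        // Columns are tab-separated: id, PosScore, NegScore, Word, SynSet.
        String[] str = line.Split('\t');

        // The Word column (index 3) may hold several space-separated words.
        String[] words = str[3].Split(' ');
        for (int k = 0; k < words.Length; k++)
        {
            // One output row per word; the SynSet column (index 4) is dropped.
            lines = lines + str[0] + "\t" + str[1] + "\t" + str[2] + "\t" + words[k] + "\r\n";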
        }
    }
    tw.Write(lines);
}
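
The improvements this answer mentions but leaves out (writing each row as it is produced, basic validation, exception handling) could look roughly like the sketch below. It is one possible shape, not the answerer's code. The names WordRowExpander, Expand and TryExpandFile are hypothetical, and it still assumes tab-separated columns with the words in the fourth column.

```csharp
using System;
using System.IO;

public static class WordRowExpander
{
    // Reads tab-separated rows and writes one output row per word in the fourth column.
    // Returns the number of rows written.
    public static int Expand(TextReader reader, TextWriter writer)
    {
        int written = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] columns = line.Split('\t');
            if (columns.Length < 4)
                continue; // validation: skip blank or malformed lines

            string[] words = columns[3].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                // Write each row immediately instead of growing one large string.
                writer.WriteLine(columns[0] + "\t" + columns[1] + "\t" + columns[2] + "\t" + word);
                written++;
            }
        }
        return written;
    }

    // File-based wrapper; both paths are supplied by the caller.
    public static bool TryExpandFile(string inputPath, string outputPath)
    {
        try
        {
            using (StreamReader reader = new StreamReader(inputPath))
            using (TextWriter writer = File.CreateText(outputPath))
            {
                Expand(reader, writer);
            }
            return true;
        }
        catch (IOException ex)
        {
            // Only I/O failures (for example a missing input file) are handled here.
            Console.Error.WriteLine("Could not process file: " + ex.Message);
            return false;
        }
    }
}
```

Keeping the core loop on TextReader and TextWriter means it can be exercised with a StringReader and StringWriter, without touching the disk. A call such as WordRowExpander.TryExpandFile(@"stackov1.txt", @"c:\temp\result.txt") would then reproduce the behaviour of the code above for well-formed input.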