Why do dates display differently when a .CSV file is opened in Excel

Time: 2016-11-20 05:43:50

Tags: c# regex

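I have a hundred lines of data from a text file. I capture them with a Regex MatchCollection and write them out as a comma-separated (csv) file for later inspection in Excel.

My regular expression is as follows: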

Regex Line3 = new Regex(@"(?<one>[0-9]{2}-[0-9]{2}-[0-9]{2})\s{1,20}114B\s{1,15}(?<two>\d{1,11})\s{1,15}(?<three>\d{1,11})\s{1,15}(?<four>\d{1,11})\s{1,30}(?<five>\d{1,11})"); // <one> captures the date data.

MatchCollection matches = Line3.Matches(line1);
foreach (Match m in matches)
{
    Writer1.WriteLine(""); // start a new output line for each match
    //Writer1.Write(line1.Substring(1, 27) + ","); //Do not consider this.
    Writer1.Write(m.Groups["one"].Value + ",");
    Writer1.Write(m.Groups["two"].Value + ",");
    Writer1.Write(m.Groups["three"].Value + ",");
    Writer1.Write(m.Groups["four"].Value + ",");
    Writer1.Write(m.Groups["five"].Value + ",");
 }

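My text file will always contain uniform data that matches the regular expression, and thanks to the genius of the people who designed regular expressions, my program captures the required information beautifully.

But when I open the csv file in Excel (by double-clicking the .csv), the column containing the date data is displayed inconsistently, as shown below.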

  12-04-2012,0,0,0,0, //appears right-aligned in Excel.
  12-04-2012,0,0,0,0, //this is how it looks in EditPad Lite.
  12-04-2012,0,0,0,0, //these dashes appear in Excel as 12/4/2012
  12-04-2012,0,0,0,0, //next five lines as well.
  12-04-2012,0,0,0,0,
  12-04-2012,0,0,0,0, //
  12-04-2012,5467,757488,846815,0,
 13-04-12,0,0,0,0, //appears left aligned in excel.
 13-04-12,0,0,0,0,
 20-04-12,0,0,500,0,
 21-04-12,1740,17905,17900,0,
 21-04-12,0,0,0,0,
 24-04-12,1466,31666,31420,0,

My input file looks like this:

12-04-12                  114B           0             0              0                0
12-04-12                  114B           0             0              0                0
12-04-12                  114B           0             0              0                0
12-04-12                  114B           0             0              0                0
12-04-12                  114B           0             0              0                0
12-04-12                  114B           0             0              0                0
12-04-12                  114B        5467        757488         846815                0
13-04-12                  114B           0             0              0                0
13-04-12                  114B           0             0              0                0
20-04-12                  114B           0             0            500                0
21-04-12                  114B        1740         17905          17900                0
21-04-12                  114B           0             0              0                0
24-04-12                  114B        1466         31666          31420                0

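When I inspect the .csv file in Notepad, the output is perfectly uniform. The problem only appears when I open the csv file in Excel.

Can anyone help explain the cause of the inconsistency?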

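2 Answers:

Answer 0: (score: 0)

If you output the dates in yyyy-MM-dd format, Excel should parse them as dates.

You can convert the dd-MM-yy text to a DateTime using an appropriate CultureInfo, and it is then easy to write the date out in yyyy-MM-dd format.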

using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {

            string inFile = @"C:\temp\sampledata.txt";
            string outFile = @"C:\temp\sampledata.csv";

            // <one> captures the date data:
            Regex re = new Regex(@"(?<one>[0-9]{2}-[0-9]{2}-[0-9]{2})\s{1,20}114B\s{1,15}(?<two>\d{1,11})\s{1,15}(?<three>\d{1,11})\s{1,15}(?<four>\d{1,11})\s{1,30}(?<five>\d{1,11})");

            using (var sr = new StreamReader(inFile))
            {
                using (var sw = new StreamWriter(outFile))
                {
                    string line1;
                    DateTime dt;
                    var ci = new CultureInfo("ur-PK");
                    while (!sr.EndOfStream)
                    {
                        line1 = sr.ReadLine();
                        MatchCollection matches = re.Matches(line1);
                        foreach (Match m in matches)
                        {
                            dt = DateTime.Parse(m.Groups["one"].Value, ci);
                            sw.Write(dt.ToString("yyyy-MM-dd") + ",");
                            sw.Write(m.Groups["two"].Value + ",");
                            sw.Write(m.Groups["three"].Value + ",");
                            sw.Write(m.Groups["four"].Value + ",");
                            sw.Write(m.Groups["five"].Value + Environment.NewLine);
                        }
                    }
                }
            }
        }
    }
}

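I used "ur-PK" because you mentioned that the date format in your input file is the one used in India and Pakistan. There are several -IN codes to choose from as well, and I do not know whether any of them might be unsuitable for your use.

Output using the sample data you showed: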
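If picking a culture is a concern, one alternative is to state the input format explicitly. The following is a minimal sketch, assuming every date in the input is strictly dd-MM-yy; the class and method names (`IsoDates`, `ToIsoDate`) are hypothetical and only serve the illustration. It uses `DateTime.ParseExact` with `CultureInfo.InvariantCulture`, so the result does not depend on which culture code is chosen.

```csharp
using System;
using System.Globalization;

static class IsoDates
{
    // Hypothetical helper: converts "dd-MM-yy" text to "yyyy-MM-dd" text.
    // Assumes the input always has exactly this shape; ParseExact throws
    // a FormatException otherwise.
    public static string ToIsoDate(string ddMMyy)
    {
        DateTime dt = DateTime.ParseExact(ddMMyy, "dd-MM-yy", CultureInfo.InvariantCulture);
        return dt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(ToIsoDate("13-04-12")); // prints 2012-04-13
    }
}
```

In the loop above, `DateTime.Parse(m.Groups["one"].Value, ci)` could be swapped for a call like this one; the rest of the program would stay the same.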

2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,0,0,0,0
2012-04-12,5467,757488,846815,0
2012-04-13,0,0,0,0
2012-04-13,0,0,0,0
2012-04-20,0,0,500,0
2012-04-21,1740,17905,17900,0
2012-04-21,0,0,0,0
2012-04-24,1466,31666,31420,0

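When you open the csv file in Excel, it should recognise "2012-04-12" and so on as dates, regardless of the Windows date format settings. I do not have Excel to test this.

It should then display the dates using the Windows short date format setting.

Answer 1: (score: 0)

To solve the problem I took the comments into account, especially Andrew Morton's.

This is the approach I used to solve it and get it working correctly in Excel.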

using System;
using System.Globalization; // for the two-digit to four-digit year conversion
using System.IO; // for StreamReader and StreamWriter
using System.Text.RegularExpressions;

namespace Experiment2
{
    class DemandRefundOnly
    {
        // Regex to capture the data; <one> captures the date data.
        static readonly Regex Line3 = new Regex(@"(?<one>[0-9]{2}-[0-9]{2}-[0-9]{2})\s{1,20}114B\s{1,15}(?<two>\d{1,11})\s{1,15}(?<three>\d{1,11})\s{1,15}(?<four>\d{1,11})\s{1,30}(?<five>\d{1,11})");

        static void Main()
        {
            string line1;

            // Only the relevant date part is shown in the output given below.

            // StreamReader to read the input text file.
            using (StreamReader Reader1 = new StreamReader(@"C:\Users\UK\data.txt"))
            // StreamWriter to write to the output file.
            using (StreamWriter Writer1 = new StreamWriter(@"C:\Users\Sample.csv"))
            {
                // Loop through the input file line by line.
                while ((line1 = Reader1.ReadLine()) != null)
                {
                    MatchCollection matches = Line3.Matches(line1);
                    foreach (Match m in matches)
                    {
                        //Writer1.Write(m.Groups["one"].Value + ","); //this line was replaced by the following.
                        string date = m.Groups["one"].Value; // dd-MM-yy

                        // The first two characters of the date string are the day.
                        int Day = Convert.ToInt32(date.Substring(0, 2));
                        int Month = Convert.ToInt32(date.Substring(3, 2));
                        int TwoDigitYear = Convert.ToInt32(date.Substring(6, 2));
                        int FourDigitYear = CultureInfo.CurrentCulture.Calendar.ToFourDigitYear(TwoDigitYear);
                        DateTime dateTime = new DateTime(FourDigitYear, Month, Day);
                        Writer1.WriteLine(dateTime);
                    }
                }
            }
        }
    }
}

The output file written by the StreamWriter looks like this:

4/5/2012 12:00:00 AM
4/5/2012 12:00:00 AM
4/9/2012 12:00:00 AM
4/9/2012 12:00:00 AM
4/9/2012 12:00:00 AM
4/9/2012 12:00:00 AM
4/9/2012 12:00:00 AM
4/12/2012 12:00:00 AM

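I tried importing the newly created output file into Excel, where the dates (in a single column) display correctly and uniformly. This is acceptable to me, because in India we use slashes, dashes, or even dots to separate dd mm yy. I also acknowledge that I got the FourDigit conversion line of code from Stackoverflow itself. Special thanks to @AdrianHHH and @Andrew Morten, who actively spent some of their valuable time helping me.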
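One thing to keep in mind about the output shown above: `Writer1.WriteLine(dateTime)` uses the current culture's default format, so the same program can write different text on a machine with different regional settings. The following is a minimal sketch of the same day/month/year conversion with a fixed output format, assuming the input is always dd-MM-yy; the names `FixedFormatDates` and `ToFixedFormat` are hypothetical, and the invariant culture's calendar is swapped in for `CultureInfo.CurrentCulture` so the two-digit year rule does not vary by machine.

```csharp
using System;
using System.Globalization;

static class FixedFormatDates
{
    // Hypothetical helper: splits "dd-MM-yy" by position, expands the
    // two-digit year, and formats the result as "yyyy-MM-dd".
    public static string ToFixedFormat(string ddMMyy)
    {
        int day = int.Parse(ddMMyy.Substring(0, 2), CultureInfo.InvariantCulture);
        int month = int.Parse(ddMMyy.Substring(3, 2), CultureInfo.InvariantCulture);
        int twoDigitYear = int.Parse(ddMMyy.Substring(6, 2), CultureInfo.InvariantCulture);

        // Assumption: the invariant (Gregorian) calendar decides the century.
        int year = CultureInfo.InvariantCulture.Calendar.ToFourDigitYear(twoDigitYear);

        // An explicit format string keeps the output independent of regional settings.
        return new DateTime(year, month, day).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(ToFixedFormat("21-04-12")); // prints 2012-04-21
    }
}
```

Writing `ToFixedFormat(m.Groups["one"].Value)` instead of the raw `DateTime` would also drop the `12:00:00 AM` time portion from each line.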