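Reading a complex text file for input into a database

Time: 2015-07-13 15:06:26

Tags: c#

I am working on a program that will read in a text file and then insert areas of that file into different columns of a database. The text file is generally laid out like this: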

"Intro information"

"more Intro information"

srvrmgr> "information about system"

srvrmgr> list parameters for component *ADMBatchProc*

"Headers"
*Name of record*  *alias of record*  *value of record*

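The list command creates a table containing all of the settings for that component. Once all of the settings are listed, the file moves on to another component and returns all of that component's information in a new table. I need to read in the component name and the information in the tables, without the headers or any of the other information. I then need to be able to transfer that data into a database. The columns are fixed width in every table within the file.

Any recommendations on how to approach this are welcome. I have never read in a file this complex, so I don't really know how to ignore large amounts of information while getting the rest ready for a database. The component value I'm trying to gather always follows the word "component" on a line that starts with "srvrmgr".

A '*' marks the areas that will be put into the database.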

Siebel Enterprise Applications Siebel Server Manager, Version 8.1.1.11 [23030] LANG_INDEPENDENT 
Copyright (c) 1994-2012, Oracle. All rights reserved.

The Programs (which include both the software and documentation) contain
proprietary information; they are provided under a license agreement     containing
restrictions on use and disclosure and are also protected by copyright,    patent,
and other intellectual and industrial property laws. Reverse engineering,
disassembly, or decompilation of the Programs, except to the extent required to
obtain interoperability with other independently created software or as    specified
by law, is prohibited.

Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of
Oracle Corporation and/or its affiliates. Other names may be trademarks
of their respective owners.

If you have received this software in error, please notify Oracle  Corporation
immediately at 1.800.ORACLE1.

Type "help" for list of commands, "help <topic>" for detailed help

Connected to 1 server(s) out of a total of 1 server(s) in the enterprise

srvrmgr> configure list parameters show PA_NAME,PA_ALIAS,PA_VALUE

srvrmgr> 

srvrmgr> list parameters for component ADMBatchProc

PA_NAME                                                                     PA_ALIAS                               PA_VALUE                                                                                                                                                                                                                                                          
----------------------------------------------------------------------  -------------------------------------  --------------------------------------------------------------------------------------------------------------------  
ADM Data Type Name                                                         ADMDataType                                                                                                                                                                                                                                                                                              
ADM EAI Method Name                                                         ADMEAIMethod                           Upsert                                                                                                                                                                                                                                                            
ADM Deployment Filter                                                     ADMFilter       

213 rows returned.

srvrmgr> list parameters for component ADMObjMgr_enu

PA_NAME                                                                 PA_ALIAS                               PA_VALUE                                                                                                                                                                                                                                                          
----------------------------------------------------------------------  -------------------------------------  --------------------------------------------------------------------------------------------------------------------  
AccessibleEnhanced                                                      AccessibleEnhanced                     False                                       

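This is the beginning of the text file. It is generated by a system called Siebel to show all of the settings for this environment. I need to pull the component name (there are several in the actual file, but the ones shown here are 'ADMBatchProc' and 'ADMObjMgr_enu') and then the data shown in the table that Siebel creates below it. The rest of the information is irrelevant to my task.

2 answers:

Answer 0: (score: 1)

In this situation I would recommend a test-driven development approach. I'm guessing the possible variations in your input format are nearly infinite.

Try this:

1) Create an interface that represents the data operations or parsing logic you want your application to perform. For example: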

public interface IParserBehaviors {
    void StartNextComponent();
    void SetTableName(string tableName);
    void DefineColumns(IEnumerable<string> columnNames);
    void LoadNewDataRow(IEnumerable<object> rowValues);
    DataTable ProduceTableForCurrentComponent();
    // etc.
}

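2) Collect as many small examples of discrete inputs with well-defined behavior as you can.

3) Inject a behavior handler into the parser. For example: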

public class Parser {
    private const string COMPONENT_MARKER = "srvrmgr";
    private readonly IParserBehaviors _behaviors;
    public Parser(IParserBehaviors behaviors) {
        _behaviors = behaviors;
    }
    public void ReadFile(string filename) {
        // Stream the file line by line (File.ReadLines needs "using System.IO;").
        foreach (string line in File.ReadLines(filename)) {
            // maintain some state
            if (line.StartsWith(COMPONENT_MARKER)) {
                DataTable table = _behaviors.ProduceTableForCurrentComponent();
                // save table to the database
                _behaviors.StartNextComponent();
            }
            else if (/* condition */) {
                // parse some text
                _behaviors.LoadNewDataRow(values);
            }
        }
    }
}

4) Using your preferred mocking framework, create tests around the expected behaviors. For example, with Moq:

public void FileWithTwoComponents_StartsTwoNewComponents() {
    string filename = "twocomponents.log";
    Mock<IParserBehaviors> mockBehaviors = new Mock<IParserBehaviors>();
    Parser parser = new Parser(mockBehaviors.Object);

    parser.ReadFile(filename);

    mockBehaviors.Verify(mock => mock.StartNextComponent(), Times.Exactly(2));
}

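This way you bring the integration under controlled tests. When someone runs into a problem, you can distill the case that wasn't covered, extract it from the log that was in use, and add a test around that behavior. Separating concerns this way also keeps your parsing logic independent of your data-manipulation logic. Parsing for specific behaviors seems to be at the core of your application, so it looks like a good fit for some domain-specific interfaces.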
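As one possible way to fill in the `/* condition */` and `// parse some text` placeholders above, here is a minimal sketch that walks the srvrmgr output line by line. It is not part of the interface design above: the names `ParameterRow`, `SrvrmgrParser`, `FindColumnStarts` and `Field` are hypothetical, and it assumes that every data row lines up with the dashed ruler printed under the header (the sample pasted in the question is not perfectly aligned, so the real file should be checked first).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One parsed row: the owning component plus the three fixed-width fields.
public sealed class ParameterRow
{
    public string Component { get; set; }
    public string Name { get; set; }
    public string Alias { get; set; }
    public string Value { get; set; }
}

public static class SrvrmgrParser
{
    // Matches e.g. "srvrmgr> list parameters for component ADMBatchProc".
    private static readonly Regex ComponentLine =
        new Regex(@"^srvrmgr>\s*list parameters for component\s+(\S+)");

    public static List<ParameterRow> Parse(IEnumerable<string> lines)
    {
        var rows = new List<ParameterRow>();
        string component = null;   // set once a "list parameters" command is seen
        List<int> starts = null;   // column start offsets taken from the dashed ruler

        foreach (string line in lines)
        {
            Match match = ComponentLine.Match(line);
            if (match.Success)
            {
                component = match.Groups[1].Value;
                starts = null;     // the next table brings its own ruler
                continue;
            }
            if (line.StartsWith("srvrmgr>", StringComparison.Ordinal))
            {
                component = null;  // some other command: ignore its output
                starts = null;
                continue;
            }
            if (component == null) continue;   // banner or unrelated output
            if (line.StartsWith("----", StringComparison.Ordinal))
            {
                starts = FindColumnStarts(line);
                if (starts.Count < 3) starts = null;   // not the expected 3-column table
                continue;
            }
            if (starts == null) continue;      // header line, or "N rows returned."
            if (string.IsNullOrWhiteSpace(line)) { starts = null; continue; } // table ended

            rows.Add(new ParameterRow
            {
                Component = component,
                Name = Field(line, starts, 0),
                Alias = Field(line, starts, 1),
                Value = Field(line, starts, 2)
            });
        }
        return rows;
    }

    // Each run of dashes in the ruler marks where a column begins.
    private static List<int> FindColumnStarts(string ruler)
    {
        var starts = new List<int>();
        for (int i = 0; i < ruler.Length; i++)
        {
            if (ruler[i] == '-' && (i == 0 || ruler[i - 1] != '-')) starts.Add(i);
        }
        return starts;
    }

    // Cuts from this column's start to the next column's start (or end of line).
    private static string Field(string line, List<int> starts, int index)
    {
        int start = starts[index];
        if (start >= line.Length) return "";
        int end = index + 1 < starts.Count
            ? Math.Min(starts[index + 1], line.Length)
            : line.Length;
        return line.Substring(start, end - start).Trim();
    }
}
```

Calling `SrvrmgrParser.Parse(File.ReadLines(path))` would give a flat list of rows that can be handed to whatever does the database work. Reading the column offsets from the ruler instead of hard-coding them is a design choice that keeps the sketch working if Siebel prints different widths for another table.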

Answer 1: (score: 0)

You will want to read the text file with a StreamReader:

// Requires: using System; using System.IO; using System.Text;
using (StreamReader reader = new StreamReader(path, Encoding.UTF8))
{
    // ReadToEnd loads the whole file; for large files, loop over reader.ReadLine() instead.
    string contents = reader.ReadToEnd();
    Console.WriteLine(contents); // Displays your file; now you can decide how to manipulate it.
}

You could then use a Regex to capture the data you want to insert.
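A minimal sketch of what such a Regex could look like, based only on the sample in the question. The class and method names are hypothetical. The first pattern assumes the component name is a single token after the word "component"; the second assumes every table ends with a footer like "213 rows returned.", which could be used to sanity-check how many rows were parsed.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SrvrmgrPatterns
{
    // e.g. "srvrmgr> list parameters for component ADMBatchProc"
    private static readonly Regex ComponentPattern = new Regex(
        @"^srvrmgr>\s*list parameters for component\s+(?<component>\S+)\s*$");

    // e.g. "213 rows returned." (the singular "1 row returned." is an assumption)
    private static readonly Regex RowCountPattern = new Regex(
        @"^(?<count>\d+) rows? returned\.\s*$");

    // Returns the component name, or null if the line is not a "list parameters" command.
    public static string TryGetComponent(string line)
    {
        Match match = ComponentPattern.Match(line);
        return match.Success ? match.Groups["component"].Value : null;
    }

    // Returns the row count from a table footer, or null for any other line.
    public static int? TryGetRowCount(string line)
    {
        Match match = RowCountPattern.Match(line);
        if (!match.Success) return null;
        return int.Parse(match.Groups["count"].Value, CultureInfo.InvariantCulture);
    }
}
```

Lines such as `srvrmgr> configure list parameters show PA_NAME,PA_ALIAS,PA_VALUE` do not match the first pattern, so they are skipped rather than mistaken for a component.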

You can insert into the database like this:

using (TransactionScope transactionScope = new TransactionScope())
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();
        SqlCommand command1 = new SqlCommand(
            "INSERT INTO People ([FirstName], [LastName], [MiddleInitial]) " +
            "VALUES ('John', 'Doe', NULL)",
            connection);
        SqlCommand command2 = new SqlCommand(
            "INSERT INTO People ([FirstName], [LastName], [MiddleInitial]) " +
            "VALUES ('Jane', 'Doe', NULL)",
            connection);
        command1.ExecuteNonQuery();
        command2.ExecuteNonQuery();
    }
    transactionScope.Complete();
}

Adapted from an example in Wouter de Kort's book for the C# 70-483 exam.
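The insert above uses hard-coded literals. For values parsed out of the file, a parameterized command is usually the safer route, because the text never becomes part of the SQL itself. The following is a rough sketch under stated assumptions: the table `ComponentParameters` and its columns are hypothetical, the `@name` placeholder style is the one SQL Server's client accepts, and the method expects a connection that is already open (for example the `SqlConnection` from the snippet above).

```csharp
using System;
using System.Data;

public static class ParameterWriter
{
    // Hypothetical table and column names; adjust them to the real schema.
    public const string InsertSql =
        "INSERT INTO ComponentParameters ([Component], [Name], [Alias], [Value]) " +
        "VALUES (@component, @name, @alias, @value)";

    // Inserts one parsed row through an already-open connection
    // and returns the number of rows affected.
    public static int InsertRow(IDbConnection connection,
        string component, string name, string alias, string value)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = InsertSql;
            AddParameter(command, "@component", component);
            AddParameter(command, "@name", name);
            AddParameter(command, "@alias", alias);
            AddParameter(command, "@value", value);
            return command.ExecuteNonQuery();
        }
    }

    // A null string is sent as a database NULL rather than an empty value.
    private static void AddParameter(IDbCommand command, string name, string value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = (object)value ?? DBNull.Value;
        command.Parameters.Add(parameter);
    }
}
```

Writing against `IDbConnection` rather than `SqlConnection` is only a convenience for the sketch; calling `InsertRow` inside the `TransactionScope` shown above would keep all rows of one file in a single transaction.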