A better way to convert structured text into business entities

Asked: 2011-11-22 02:06:03

Tags: c# .net

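I'm trying to find a better way to convert plain text, where each field has a predefined length, into a business entity. For example, the input text could be "Testuser   new york    10018": the first 11 characters are the user name, the next 12 characters are the city, and the next 5 characters are the zip code. The input text can be up to 1000 characters long and maps to many properties of the entity.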

Any help is appreciated. Thanks.

I have tried the following approach:

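  1. Defined an XML structure that can be deserialized into the business entity.

  2. Used XSLT to navigate to each node and fill in the XML element values with the substring function applied to the input text.

  3. Once the XML is populated, deserialized it into the entity.

  4. But I suspect this approach won't scale well, since converting each different input format into its corresponding XML would mean loading a separate XSLT.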
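For reference, the XSLT pipeline from the steps above might look roughly like this for the 11/12/5 layout. This is only a sketch under assumptions: the User class, the <record> wrapper element and the XsltUserParser name are made up for illustration, and it relies on XslCompiledTransform and XmlSerializer from the .NET base library. It also makes the concern in step 4 visible: the field positions live inside the stylesheet, so every new input layout needs its own XSLT.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;
using System.Xml.Xsl;

// Hypothetical entity; its public properties match the element names the stylesheet emits.
public class User
{
    public string Username { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public static class XsltUserParser
{
    // One stylesheet per input layout. substring() uses 1-based positions,
    // normalize-space() strips the padding.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match='/record'>
    <User>
      <Username><xsl:value-of select='normalize-space(substring(., 1, 11))'/></Username>
      <City><xsl:value-of select='normalize-space(substring(., 12, 12))'/></City>
      <ZipCode><xsl:value-of select='substring(., 24, 5)'/></ZipCode>
    </User>
  </xsl:template>
</xsl:stylesheet>";

    // Compile the stylesheet once and reuse it.
    private static readonly XslCompiledTransform CompiledStylesheet = LoadStylesheet();

    private static XslCompiledTransform LoadStylesheet()
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(reader);
        return xslt;
    }

    public static User Parse(string line)
    {
        // Step 1: wrap the raw line in a <record> element so the XSLT can address it.
        var input = new XElement("record", line);

        // Step 2: run the stylesheet to produce <User>...</User>.
        var output = new StringWriter();
        using (var writer = XmlWriter.Create(output))
            CompiledStylesheet.Transform(input.CreateReader(), writer);

        // Step 3: deserialize the generated XML into the entity.
        var serializer = new XmlSerializer(typeof(User));
        using (var reader = new StringReader(output.ToString()))
            return (User)serializer.Deserialize(reader);
    }
}
```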

3 answers:

Answer 0 (score: 2)

A nice, elegant approach might be to use regular expressions from the System.Text.RegularExpressions namespace, something like this:

using System.Text.RegularExpressions;

// Class-level field: the pattern is compiled once and reused
static readonly Regex inputParser = new Regex("(.{11})(.{12})(.{5})", RegexOptions.Compiled);

foreach (Match m in inputParser.Matches(yourInput)) {
    BusinessEntity e = new BusinessEntity();
    e.Username = m.Groups[1].Value.TrimEnd(); // Remove spaces from the end; I take it that's what they'll be padded with
    e.City = m.Groups[2].Value.TrimEnd();
    e.ZipCode = m.Groups[3].Value;
    myListOfBusinessEntities.Add(e);
}
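A self-contained variant of the snippet above, assuming a BusinessEntity with three string properties (the answer doesn't show that class). It uses named groups instead of numeric indexes; that is a readability choice, not part of the answer. Each successive 28-character run of the input becomes one entity. RegexRecordParser and ParseAll are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Assumed shape of the entity; not shown in the original answer.
public class BusinessEntity
{
    public string Username { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public static class RegexRecordParser
{
    // Same 11/12/5 layout, with named groups. Note that '.' does not match '\n'.
    private static readonly Regex InputParser = new Regex(
        @"(?<user>.{11})(?<city>.{12})(?<zip>.{5})", RegexOptions.Compiled);

    public static List<BusinessEntity> ParseAll(string input)
    {
        var result = new List<BusinessEntity>();
        foreach (Match m in InputParser.Matches(input))
        {
            result.Add(new BusinessEntity
            {
                Username = m.Groups["user"].Value.TrimEnd(), // padding assumed to be trailing spaces
                City = m.Groups["city"].Value.TrimEnd(),
                ZipCode = m.Groups["zip"].Value
            });
        }
        return result;
    }
}
```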

Answer 1 (score: 0)

If you only have one format to deal with, just write a simple class with a method that takes a line of text and returns a new entity.
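For a single format, such a class could be as small as the sketch below. UserRecord and UserRecordParser are hypothetical names, and the widths are the 11/12/5 layout from the question; trailing spaces are assumed to be the padding.

```csharp
using System;

// Hypothetical entity for the question's layout.
public class UserRecord
{
    public string Username { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

// Hypothetical parser for the single 11/12/5 layout.
public static class UserRecordParser
{
    private const int UsernameLength = 11;
    private const int CityLength = 12;
    private const int ZipLength = 5;
    public const int RecordLength = UsernameLength + CityLength + ZipLength; // 28

    public static UserRecord Parse(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));
        if (line.Length < RecordLength)
            throw new FormatException($"Expected at least {RecordLength} characters, got {line.Length}.");

        // Slice each field by offset and strip the trailing padding.
        return new UserRecord
        {
            Username = line.Substring(0, UsernameLength).TrimEnd(),
            City = line.Substring(UsernameLength, CityLength).TrimEnd(),
            ZipCode = line.Substring(UsernameLength + CityLength, ZipLength)
        };
    }
}
```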

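If the lines are padded with whitespace and have a fixed length, a binary reader combined with the System.Text.Encoding class and its GetString method may give a faster solution.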
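A rough sketch of that idea, assuming the records are stored back to back with no line terminators, and in a single-byte encoding such as ASCII so that byte offsets equal character offsets. FixedLengthRecordReader is a hypothetical name; BinaryReader.ReadBytes and Encoding.GetString(byte[], int, int) are standard .NET APIs. Whether it is actually faster than the string-based versions would need measuring.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FixedLengthRecordReader
{
    // Reads consecutive fixed-length records and decodes each field separately.
    // Assumes a single-byte encoding and no separators between records.
    public static IEnumerable<string[]> ReadRecords(Stream stream, int[] fieldLengths, Encoding encoding)
    {
        int recordLength = 0;
        foreach (var len in fieldLengths) recordLength += len;

        using (var reader = new BinaryReader(stream, encoding, leaveOpen: true))
        {
            while (true)
            {
                byte[] record = reader.ReadBytes(recordLength);
                if (record.Length == 0) yield break; // clean end of stream
                if (record.Length < recordLength)
                    throw new EndOfStreamException("Truncated record.");

                var fields = new string[fieldLengths.Length];
                int offset = 0;
                for (int i = 0; i < fieldLengths.Length; i++)
                {
                    // Decode just this field's bytes and drop the padding.
                    fields[i] = encoding.GetString(record, offset, fieldLengths[i]).TrimEnd();
                    offset += fieldLengths[i];
                }
                yield return fields;
            }
        }
    }
}
```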

Answer 2 (score: 0)

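From the refinement of the question, I infer that you have several different input formats. Here is an implementation of IFormatter that can get you most of the way. Note that it is hacky in several ways and comes with no guarantees: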

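using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;
using System.Text;

void Test()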
{
    var serializer = new FixedWidthSerializer<MyClass>();
    var ms = new MemoryStream();
    serializer.Serialize(ms, new MyClass { Age = 30, FirstName = "John", LastName = "Doe"});
    ms.Position = 0;
    var newMyClass = (MyClass)serializer.Deserialize(ms);
}

[Serializable]
private class MyClass
{
    public String FirstName { get; set; }
    public String LastName;
    public Int32 Age { get; set; }
}

public class FixedWidthSerializer<T> : IFormatter
{
    private readonly FixedWidthFieldDefinition[] _fieldDefinition;

    // Default layout: every serializable field of T, 100 characters wide
    public FixedWidthSerializer()
        : this(FormatterServices.GetSerializableMembers(typeof(T)).Select(sm => new FixedWidthFieldDefinition(sm.Name, 100)).ToArray())
    { }

    public FixedWidthSerializer(FixedWidthFieldDefinition[] fieldDefinition)
    {
        if (fieldDefinition == null) throw new ArgumentNullException("fieldDefinition");
        _fieldDefinition = fieldDefinition;
        Context = new StreamingContext(StreamingContextStates.All);            
    }

    public class FixedWidthFieldDefinition
    {
        public String FieldName { get; protected set; }
        public Int32 CharLength { get; protected set; }

        public FixedWidthFieldDefinition(String fieldName, Int32 charLength)
        {
            FieldName = fieldName;
            CharLength = charLength;
        }
    }

    public object Deserialize(Stream serializationStream)
    {
        var streamReader = new StreamReader(serializationStream);
        var textLine = streamReader.ReadLine();

        if (textLine == null)
            throw new SerializationException("Ran out of text!");

        var obj = FormatterServices.GetUninitializedObject(typeof (T));
        var memberDictionary = FormatterServices.GetSerializableMembers(obj.GetType(), Context).ToDictionary(mi => mi.Name);

        var offset = 0;
        foreach (var fieldDef in _fieldDefinition)
        {
            if (offset + fieldDef.CharLength > textLine.Length)
                throw new SerializationException("Line was too short!");

            // Read the current field and increase the offset
            var fieldStringValue = textLine.Substring(offset, fieldDef.CharLength);
            offset += fieldDef.CharLength;

            MemberInfo memberInfo;

            if (!memberDictionary.TryGetValue(fieldDef.FieldName, out memberInfo))
                throw new SerializationException("You asked for the member '" + fieldDef.FieldName + "', but it doesn't exist on type '" + typeof (T) + "'");

            var memberAsField = memberInfo as FieldInfo;

            if (memberAsField != null)
                memberAsField.SetValue(obj, Convert.ChangeType(fieldStringValue.TrimEnd(), memberAsField.FieldType));
            else
                throw new SerializationException("I don't know what to make of the property '" + fieldDef.FieldName + "'");
        }
        return obj;
    }

    public void Serialize(Stream serializationStream, object graph)
    {
        var serializableMembers = FormatterServices.GetSerializableMembers(graph.GetType());
        var membersToSerialize = _fieldDefinition.Select(fd => serializableMembers.First(sm => sm.Name == fd.FieldName)).ToArray();
        var objectData = FormatterServices.GetObjectData(graph, membersToSerialize);
        var sb = new StringBuilder(_fieldDefinition.Sum(fd => fd.CharLength));
        for (var i = 0; i < _fieldDefinition.Length; i++)
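            // Convert.ToString maps null to "", so PadRight can't throw; Append(string, 0, width) truncates over-long values
            sb.Append((Convert.ToString(objectData[i]) ?? String.Empty).PadRight(_fieldDefinition[i].CharLength), 0, _fieldDefinition[i].CharLength);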
        var sw = new StreamWriter(serializationStream);
        sw.WriteLine(sb.ToString());
        sw.Flush();
    }

    public ISurrogateSelector SurrogateSelector { get; set; }

    public SerializationBinder Binder { get; set; }

    public StreamingContext Context { get; set; }
}
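One caveat, as far as I can tell: the default constructor names each field definition after what FormatterServices.GetSerializableMembers returns, and those are fields. For auto-properties such as FirstName and Age they are the compiler-generated backing fields, so a hand-written definition like new FixedWidthFieldDefinition("Age", 3) would not be found and Deserialize would throw. If you want explicit per-format layouts keyed by property names, a smaller reflection-based sketch without IFormatter might look like this. FixedWidthLayout and Person are hypothetical names, and property types are assumed to be ones Convert.ChangeType can handle (no nullables).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

// Hypothetical layout-driven parser: one (property name, width) list per input format.
public sealed class FixedWidthLayout<T> where T : new()
{
    private readonly List<(PropertyInfo Property, int Width)> _fields =
        new List<(PropertyInfo Property, int Width)>();

    // Appends the next field of the layout; fields are read in the order they are added.
    public FixedWidthLayout<T> Field(string propertyName, int width)
    {
        var property = typeof(T).GetProperty(propertyName)
            ?? throw new ArgumentException($"'{propertyName}' is not a public property of {typeof(T).Name}.");
        _fields.Add((property, width));
        return this;
    }

    public T Parse(string line)
    {
        var result = new T();
        int offset = 0;
        foreach (var (property, width) in _fields)
        {
            if (offset + width > line.Length)
                throw new FormatException("Line was too short.");
            // Trim both ends so right-aligned numbers parse too.
            string raw = line.Substring(offset, width).Trim();
            offset += width;
            // Invariant culture keeps numeric parsing independent of the machine's locale.
            property.SetValue(result, Convert.ChangeType(raw, property.PropertyType, CultureInfo.InvariantCulture));
        }
        return result;
    }
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}
```

Each input format then becomes one layout definition in code, for example new FixedWidthLayout<Person>().Field("FirstName", 10).Field("LastName", 10).Field("Age", 3), rather than one XSLT per format.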