How to cut data out of a file?

Time: 2018-12-06 17:05:40

Tags: c#

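I have a file of about 140 MB that contains CAN data recorded over a period of time. The total duration is about 29:17:00 [mm:ss:ms]. I need to split the file or, better, copy the data for a specific time range into a new file.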

For example, from 10:00:00 to 20:30:00.

Any ideas how to approach this?

What I have done so far is read the header:

private void test(string fileName)
{
    // 'using' closes the file when the method returns
    using FileStream fs = File.OpenRead(fileName);
    long fileSize = fs.Length;
    bool extendedFileFormat = DriveRecFiles.IsFileDRX(replayCtrl.SourceFilename);

    Int64 tmpByte = 0;
    Int64 tmpInt64 = 0;

    #region TimeStampFrequency
    for (int i = 0; i < 8; i++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << i * 8;
    }
    SourceTimingClockFrequency = tmpInt64;
    #endregion

    #region StartTimeStamp
    tmpInt64 = 0;
    for (int i = 0; i < 8; i++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << i * 8;
    }
    sourceTimingBeginStampValue = tmpInt64;
    #endregion

    #region Last TimeStamp
    fs.Position = fs.Length - 8;
    tmpInt64 = 0;
    for (int i = 0; i < 8; i++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << i * 8;
    }
    TimeStampEnd = tmpInt64;

    // Convert the timestamp difference to a time in ms
    int FileLengthTime = (int)((1000 * (TimeStampEnd - sourceTimingBeginStampValue)) / SourceTimingClockFrequency);
    #endregion

}
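Because the header fields and the trailing timestamp are plain little-endian 64-bit integers, BinaryReader.ReadInt64 (which always reads little-endian) can replace the byte-assembly loops. The following is a minimal sketch under the offsets used above (frequency at 0, start timestamp at 8, last timestamp in the final 8 bytes); CanFileHeader, ReadTimestamps and DurationMs are hypothetical names, not part of the original code.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: reads the clock frequency, the start timestamp and the
// last timestamp. Assumes little-endian 64-bit values at offsets 0 and 8, and
// the final record's timestamp in the last 8 bytes of the file.
public static class CanFileHeader
{
    public static (long Frequency, long Begin, long End) ReadTimestamps(Stream stream)
    {
        // leaveOpen: true, so disposing the reader does not close the caller's stream
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        stream.Position = 0;
        long frequency = reader.ReadInt64(); // BinaryReader always reads little-endian
        long begin = reader.ReadInt64();
        stream.Position = stream.Length - 8;
        long end = reader.ReadInt64();
        return (frequency, begin, end);
    }

    // Same formula as FileLengthTime above: elapsed ticks scaled to milliseconds
    public static long DurationMs(long frequency, long begin, long end) =>
        1000 * (end - begin) / frequency;
}
```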

Now I'm stuck and not sure how to proceed. Should I compare every timestamp in a for loop like this one?

Suppose I set a start time of 1000000 ms and an end time of 1700000 ms:

int beginTime = 1000000;
int endTime = 1700000;
long startPosition = 0;
long endPosition = 0;
// Assumes one timestamp every 48 bytes, starting at offset 8
for (long pos = 8; pos <= fs.Length - 8; pos += 48)
{
    fs.Position = pos;
    tmpInt64 = 0;
    for (int b = 0; b < 8; b++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << b * 8;
    }
    long currentTimeStampEnd = tmpInt64;
    int currentTime = (int)((1000 * (currentTimeStampEnd - sourceTimingBeginStampValue)) / SourceTimingClockFrequency);
    // An exact match is unlikely, so remember the first position at or past each bound
    if (startPosition == 0 && currentTime >= beginTime) startPosition = pos;
    if (endPosition == 0 && currentTime >= endTime) endPosition = pos;
    if (startPosition != 0 && endPosition != 0) break;
}

and then copy the result into a file.

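I don't know whether this is the best approach. Second, I'd like to add a start-time slider and an end-time slider with a 1 ms step. I think the approach above is inefficient: every time a slider value changes, it compares the new value against the timestamps again, opening and closing fs each time.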
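On the slider concern: instead of converting every record's timestamp to milliseconds whenever a slider moves, the question's formula can be inverted once per slider value, after which records are compared by raw timestamp. A minimal sketch; TimeConversion, TicksToMs and MsToTicks are hypothetical names, and the exact round trip assumes a clock frequency of at least 1 kHz (the header shown in the answer below reports 0x3408e2, about 3.41 MHz).

```csharp
// Hypothetical helpers: convert between raw timestamps and milliseconds since
// the recording start, using the formula from the question.
public static class TimeConversion
{
    // ms = 1000 * (ticks - begin) / frequency, truncated as in the question
    public static long TicksToMs(long ticks, long begin, long frequency) =>
        1000 * (ticks - begin) / frequency;

    // Inverse, rounded up: the smallest raw timestamp whose TicksToMs is >= ms.
    // Assumes ms >= 0; with frequency >= 1000, TicksToMs(MsToTicks(ms)) == ms.
    public static long MsToTicks(long ms, long begin, long frequency) =>
        begin + (ms * frequency + 999) / 1000;
}
```

Rounding up makes MsToTicks a lower bound: every record at or after the returned timestamp belongs to the selected millisecond or later.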


Answer (score: 1)

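This is a partial answer. I can read your data chunk by chunk. Once you have it, you can decide to write it back out to a set of smaller files (using BinaryWriters on top of FileStreams). I'll leave that to you. But this does read everything.

Update: there's more to the answer below (I added a WriteStruct method and got closer to what you asked for).

I start by defining two structs with a very explicit layout. Since the header consists of just two consecutive 64-bit uints, I can simply use LayoutKind.Sequential: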

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct CanHeader {
    public UInt64 TimeStampFrequency;
    public UInt64 TimeStamp;
}

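The chunk struct, however, mixes 32-bit and 64-bit uints. If I lay it out sequentially, the framework inserts 4 bytes of padding to align the UInt64 fields. So I need to use LayoutKind.Explicit: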

[StructLayout(LayoutKind.Explicit)]
public struct CanChunk {
    [FieldOffset(0)] public UInt32 ReturnReadValue;
    [FieldOffset(4)] public UInt32 CanTime;
    [FieldOffset(8)] public UInt32 Can;
    [FieldOffset(12)] public UInt32 Ident;
    [FieldOffset(16)] public UInt32 DataLength;
    [FieldOffset(20)] public UInt64 Data;
    [FieldOffset(28)] public UInt32 Res;
    [FieldOffset(32)] public UInt64 TimeStamp;
}
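The padding described above can be checked with Marshal.SizeOf and Marshal.OffsetOf. The sketch below (struct and class names are hypothetical) compares a sequential copy of the chunk fields with one declared with Pack = 4, which is an alternative to explicit offsets: capping alignment at 4 bytes reproduces the 40-byte on-disk layout without FieldOffset attributes.

```csharp
using System;
using System.Runtime.InteropServices;

// Same fields as CanChunk, sequential with default packing: the marshaler
// inserts 4 bytes before Data and 4 before TimeStamp (48 bytes instead of 40).
[StructLayout(LayoutKind.Sequential)]
public struct SequentialChunk
{
    public UInt32 ReturnReadValue, CanTime, Can, Ident, DataLength;
    public UInt64 Data;
    public UInt32 Res;
    public UInt64 TimeStamp;
}

// Pack = 4 limits field alignment to 4 bytes, giving the 40-byte layout
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct PackedChunk
{
    public UInt32 ReturnReadValue, CanTime, Can, Ident, DataLength;
    public UInt64 Data;
    public UInt32 Res;
    public UInt64 TimeStamp;
}

public static class LayoutCheck
{
    // Marshaled (unmanaged) size of T in bytes
    public static int SizeOf<T>() where T : struct => Marshal.SizeOf<T>();

    // Byte offset of the Data field in the marshaled layout of T
    public static int DataOffset<T>() where T : struct => (int)Marshal.OffsetOf<T>("Data");
}
```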

Then I looked at @FelixK's answer to "C# array within a struct" and adapted his ReadStruct extension method to my needs:

private static (T, bool) ReadStruct<T>(this BinaryReader reader) where T : struct {
    var len = Marshal.SizeOf(typeof(T));
    Byte[] buffer = reader.ReadBytes(len);

    if (buffer.Length < len) {
        return (default(T), false);
    }
    //otherwise
    GCHandle handle = default(GCHandle);
    try {
        handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        return ((T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T)), true);
    } finally {
        if (handle.IsAllocated)
            handle.Free();
    }
}

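It returns a tuple whose first member is the struct instance just read from the file and whose second member is a flag indicating whether to continue (true means "keep reading"). It uses BinaryReader.ReadBytes rather than BinaryReader.Read: ReadBytes keeps reading until it has len bytes or reaches the end of the stream, so a short array reliably means the end of the file.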
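An alternative that avoids GCHandle pinning and layout attributes altogether is to read each field with BinaryReader, which always reads little-endian (matching the hex dump further down). A minimal sketch, assuming the 40-byte CanChunk layout above and a seekable stream; ChunkRecord, ChunkReader and TryReadChunk are hypothetical names.

```csharp
using System.IO;

// Hypothetical record type with the same fields as CanChunk; no layout attributes
// are needed because nothing is marshaled.
public struct ChunkRecord
{
    public uint ReturnReadValue, CanTime, Can, Ident, DataLength;
    public ulong Data;
    public uint Res;
    public ulong TimeStamp;
}

public static class ChunkReader
{
    public const int RecordSize = 40; // 5*4 + 8 + 4 + 8 bytes, no padding on disk

    // Reads one record field by field; returns false if fewer than 40 bytes remain.
    public static bool TryReadChunk(BinaryReader reader, out ChunkRecord chunk)
    {
        chunk = default;
        Stream s = reader.BaseStream;
        if (s.Length - s.Position < RecordSize) return false; // needs a seekable stream
        chunk.ReturnReadValue = reader.ReadUInt32();
        chunk.CanTime = reader.ReadUInt32();
        chunk.Can = reader.ReadUInt32();
        chunk.Ident = reader.ReadUInt32();
        chunk.DataLength = reader.ReadUInt32();
        chunk.Data = reader.ReadUInt64();
        chunk.Res = reader.ReadUInt32();
        chunk.TimeStamp = reader.ReadUInt64();
        return true;
    }
}
```

The trade-off is a little more code per field in exchange for no dependence on marshaling rules or struct packing.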

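With all of that in place, I can read the data. My first attempt wrote everything to the console, but writing out 140 MB that way takes a long time. If you do it, though, you'll see the data progressing as expected (the timestamps keep increasing).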

public static void ReadBinary() {
    using (var stream = new FileStream("Klassifikation_only_Sensor1_01.dr2", FileMode.Open, FileAccess.Read)) {
        using (var reader = new BinaryReader(stream)) {
            var headerTuple = reader.ReadStruct<CanHeader>();
            Console.WriteLine($"[Header] TimeStampFrequency: {headerTuple.Item1.TimeStampFrequency:x016}  TimeStamp: {headerTuple.Item1.TimeStamp:x016}");
            bool stillWorking;
            UInt64 totalSize = 0L;
            var chunkSize = (UInt64)Marshal.SizeOf(typeof(CanChunk));
            do {
                var chunkTuple = reader.ReadStruct<CanChunk>();
                stillWorking = chunkTuple.Item2;
                if (stillWorking) {
                    var chunk = chunkTuple.Item1;
                    //Console.WriteLine($"{chunk.ReturnReadValue:x08} {chunk.CanTime:x08} {chunk.Can:x08} {chunk.Ident:x08} {chunk.DataLength:x08} {chunk.Data:x016} {chunk.Res:x04} {chunk.TimeStamp:x016}");
                    totalSize += chunkSize;
                }
            } while (stillWorking);
            Console.WriteLine($"Total Size: 0x{totalSize:x016}");
        }
    }
}

If I uncomment the Console.WriteLine statement, the output starts like this:

[Header] TimeStampFrequency: 00000000003408e2  TimeStamp: 000002a1a1bf04bb
00000001 a1bf04bb 00000020 000002ff 00000008 0007316be2c20350 0000 000002a1a1bf04bb
00000001 a1bf04be 00000020 00000400 00000008 020a011abf80138e 0000 000002a1a1bf04be
00000001 a1bf04c0 00000020 00000400 00000008 8000115f84f09f12 0000 000002a1a1bf04c0
00000001 a1bf04c2 00000020 00000401 00000008 0c1c1205690d81f8 0000 000002a1a1bf04c2
00000001 a1bf04c3 00000020 00000401 00000007 001fa2420000624d 0000 000002a1a1bf04c3
00000001 a1bf04c5 00000020 00000402 00000008 0c2a5a95b99d0286 0000 000002a1a1bf04c5
00000001 a1bf04c7 00000020 00000402 00000007 001faa6000003c49 0000 000002a1a1bf04c7
00000001 a1bf04c8 00000020 00000403 00000008 0c1c0c06840e02d2 0000 000002a1a1bf04c8
00000001 a1bf04ca 00000020 00000403 00000007 001fad4200006c5d 0000 000002a1a1bf04ca
00000001 a1bf04cc 00000020 00000404 00000008 0c1c0882800b82d8 0000 000002a1a1bf04cc
00000001 a1bf04cd 00000020 00000404 00000007 001fad8200009cd1 0000 000002a1a1bf04cd
00000001 a1bf04cf 00000020 00000405 00000008 0c1c0f04850cc2de 0000 000002a1a1bf04cf
00000001 a1bf04d0 00000020 00000405 00000007 001fada20000766f 0000 000002a1a1bf04d0
00000001 a1bf04d2 00000020 00000406 00000008 0c1bd80c4e13831a 0000 000002a1a1bf04d2
00000001 a1bf04d3 00000020 00000406 00000007 001faf800000505b 0000 000002a1a1bf04d3
00000001 a1bf04d5 00000020 00000407 00000008 0c23d51049974330 0000 000002a1a1bf04d5
00000001 a1bf04d6 00000020 00000407 00000007 001fb02000004873 0000 000002a1a1bf04d6
00000001 a1bf04d8 00000020 00000408 00000008 0c1c0a8490cc44ba 0000 000002a1a1bf04d8
00000001 a1bf04da 00000020 00000408 00000007 001fb762000088bf 0000 000002a1a1bf04da
00000001 a1bf04db 00000020 00000409 00000008 0c1c0603a0cbc4c0 0000 000002a1a1bf04db
00000001 a1bf04df 00000020 00000409 00000007 001fb76000008ee5 0000 000002a1a1bf04df
00000001 a1bf04e0 00000020 0000040a 00000008 0c23f70c5b9544cc 0000 000002a1a1bf04e0
00000001 a1bf04e2 00000020 0000040a 00000007 001fb7820000565f 0000 000002a1a1bf04e2
00000001 a1bf04e3 00000020 0000040b 00000008 0c1bf3049b4cc502 0000 000002a1a1bf04e3
00000001 a1bf04e5 00000020 0000040b 00000007 001fb82200007eab 0000 000002a1a1bf04e5

and it ends with:

Total Size: 0x00000000085ae0a8

In decimal that is 140,173,480, an exact multiple of the 40-byte chunk size (3,504,337 chunks). That's what I expected.
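Because every record is 40 bytes and the timestamps only increase, a record's position can be computed from its index, so a time boundary can be found by binary search (about 22 probes for roughly 3.5 million records) instead of a linear scan, which also addresses the slider concern in the question. A sketch under those assumptions (16-byte header, TimeStamp at offset 32 of each record, non-decreasing timestamps); ChunkIndex, TimeStampAt and LowerBound are hypothetical names, and the target is a raw timestamp in the same units as the header's TimeStamp.

```csharp
using System.IO;

// Hypothetical helper: locate records by raw timestamp without scanning the file.
// Assumes a 16-byte header followed by 40-byte records whose TimeStamp
// (UInt64, little-endian) is at offset 32, with non-decreasing timestamps.
public static class ChunkIndex
{
    public const long HeaderSize = 16;
    public const long RecordSize = 40;
    public const long TimeStampOffset = 32;

    public static long RecordCount(Stream s) => (s.Length - HeaderSize) / RecordSize;

    // Seeks straight to record 'index' and reads its TimeStamp
    public static ulong TimeStampAt(BinaryReader reader, long index)
    {
        reader.BaseStream.Position = HeaderSize + index * RecordSize + TimeStampOffset;
        return reader.ReadUInt64();
    }

    // Index of the first record with TimeStamp >= target (RecordCount if none)
    public static long LowerBound(BinaryReader reader, ulong target)
    {
        long lo = 0, hi = RecordCount(reader.BaseStream);
        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;
            if (TimeStampAt(reader, mid) < target) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

For a window, LowerBound(start) gives the first record to keep and LowerBound(end) the first record to drop; the file can stay open while the sliders move.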

Update

To get closer to what you asked for, I took the code from the ReadStruct method and used it to write a matching WriteStruct method:

 private static void WriteStruct<T>(this BinaryWriter writer, T obj) where T : struct {
     var len = Marshal.SizeOf(typeof(T));
     var buffer = new byte[len];

     GCHandle handle = default(GCHandle);
     try {
         handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
         Marshal.StructureToPtr(obj, handle.AddrOfPinnedObject(), false);
     } finally {
         if (handle.IsAllocated)
             handle.Free();
     }
     writer.Write(buffer);
 }

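With that, I can modify the original code to read all the data and write a selected part of it to another file. In the code below, I read chunks until a chunk's CanTime is evenly divisible by 10,000. Once that happens, I create a new CanHeader struct (I'm not sure what belongs in it, but you should know). Then I create an output FileStream (the file to write to) and a BinaryWriter. I write the header to the FileStream, then write 5000 chunks to the file, starting with the one that matched. You can use the data in the chunk stream to decide what to do: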

public static void ReadBinary() {
    using (var readStream = new FileStream("Klassifikation_only_Sensor1_01.dr2", FileMode.Open, FileAccess.Read)) {
        using (var reader = new BinaryReader(readStream)) {
            var headerTuple = reader.ReadStruct<CanHeader>();
            Console.WriteLine($"[Header] TimeStampFrequency: {headerTuple.Item1.TimeStampFrequency:x016}  TimeStamp: {headerTuple.Item1.TimeStamp:x016}");
            bool stillWorking;
            UInt64 totalSize = 0UL;
            UInt64 recordCount = 0UL;
            var chunkSize = (UInt64)Marshal.SizeOf(typeof(CanChunk));
            var chunksWritten = 0;
            FileStream writeStream = null;
            BinaryWriter writer = null;
            var writingChunks = false;
            var allDone = false;
            try {
                do {
                    var chunkTuple = reader.ReadStruct<CanChunk>();
                    stillWorking = chunkTuple.Item2;
                    if (stillWorking) {
                        var chunk = chunkTuple.Item1;
                        // Start writing at the first chunk whose CanTime is a multiple of 10,000
                        if (!writingChunks && chunk.CanTime % 10_000 == 0) {
                            writingChunks = true;
                            var writeHeader = new CanHeader {
                                TimeStamp = chunk.TimeStamp,
                                TimeStampFrequency = headerTuple.Item1.TimeStampFrequency
                            };
                            writeStream = new FileStream("Output.dr2", FileMode.Create, FileAccess.Write);
                            writer = new BinaryWriter(writeStream);
                            writer.WriteStruct(writeHeader);
                        }
                        // Copy chunks (including the one that triggered writing) until 5000 are written
                        if (writingChunks && !allDone) {
                            writer.WriteStruct(chunk);
                            ++chunksWritten;
                            if (chunksWritten >= 5000) {
                                allDone = true;
                            }
                        }
                        totalSize += chunkSize;
                        ++recordCount;
                    }
                } while (stillWorking);
            } finally {
                // Disposing the writer also flushes and closes writeStream
                writer?.Dispose();
                writeStream?.Dispose();
            }
            Console.WriteLine($"Total Size: 0x{totalSize:x016}  Record Count: {recordCount}  Records Written: {chunksWritten}");
        }
    }
}

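When it finishes, I can see that 5000 records were written to the file (200,016 bytes long: 5000 40-byte records preceded by a 16-byte header), and that the first record's CanTime is 0xa3a130d0 (2,745,250,000), which is divisible by 10,000. Everything is as I expected.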
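Since the records are fixed-size and already in the on-disk byte order, a selected range does not have to be unmarshaled at all: once the first and end record indices are known (for example from a binary search over the timestamps), the bytes can be copied in one block. A minimal sketch of that alternative, assuming the same 16-byte header / 40-byte record layout and, like the code above, a new header built from the source frequency and the first copied record's TimeStamp; ChunkCopier and CopyRange are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: copies records [firstIndex, endIndex) from source to
// destination as raw bytes. Assumes a 16-byte header (frequency, start timestamp)
// followed by 40-byte records with TimeStamp at offset 32; source must be seekable.
public static class ChunkCopier
{
    const long HeaderSize = 16, RecordSize = 40, TimeStampOffset = 32;

    public static void CopyRange(Stream source, Stream destination, long firstIndex, long endIndex)
    {
        long count = (source.Length - HeaderSize) / RecordSize;
        if (firstIndex < 0 || endIndex > count || endIndex <= firstIndex)
            throw new ArgumentOutOfRangeException(nameof(endIndex), "invalid record range");

        using var reader = new BinaryReader(source, Encoding.UTF8, leaveOpen: true);
        using var writer = new BinaryWriter(destination, Encoding.UTF8, leaveOpen: true);

        // New header: same clock frequency, start = first copied record's TimeStamp
        source.Position = 0;
        ulong frequency = reader.ReadUInt64();
        source.Position = HeaderSize + firstIndex * RecordSize + TimeStampOffset;
        writer.Write(frequency);
        writer.Write(reader.ReadUInt64());
        writer.Flush();

        // Records are contiguous, so the selected range is a single block of bytes
        source.Position = HeaderSize + firstIndex * RecordSize;
        long remaining = (endIndex - firstIndex) * RecordSize;
        var buffer = new byte[81920];
        while (remaining > 0)
        {
            int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (read == 0) throw new EndOfStreamException();
            destination.Write(buffer, 0, read);
            remaining -= read;
        }
    }
}
```

Because the last record is copied unchanged, the output still ends with that record's TimeStamp, so the question's "read the last 8 bytes" approach keeps working on the new file.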