C#: Download a large JSON file from an ADLS Gen2 blob and deserialize it into objects

Asked: 2019-12-11 00:48:53

Tags: c# .net azure .net-core azure-data-lake

I am using the following code to download data from a blob into a stream.

    private static async Task<Stream> ParallelDownloadBlobAsync(Stream outPutStream, CloudBlockBlob blob)
    {

        await blob.FetchAttributesAsync();
        int bufferLength = 1 * 1024 * 1024;//1 MB chunk
        long blobRemainingLength = blob.Properties.Length;
        Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
        long offset = 0;
        while (blobRemainingLength > 0)
        {
            long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
            queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
            offset += chunkLength;
            blobRemainingLength -= chunkLength;
        }
        Parallel.ForEach(queues, new ParallelOptions()
        {
            //Gets or sets the maximum number of concurrent tasks
            MaxDegreeOfParallelism = 10
        }, (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value);
                lock (outPutStream)
                {
                    outPutStream.Position = queue.Key;
                    var bytes = ms.ToArray();
                    outPutStream.Write(bytes, 0, bytes.Length);
                }
            }
        });

        return outPutStream;
    }

I then use JsonSerializer to deserialize the data, but the while block is never executed.

    await ParallelDownloadBlobAsync(stream, cloudBlockBlob);

    //resetting stream's position to 0
    //stream.Position = 0;
    var serializer = new JsonSerializer();

    using (var sr = new StreamReader(stream))
    {
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            jsonTextReader.SupportMultipleContent = true;
            result = new List<T>();

            while (jsonTextReader.Read())
            {
                result.Add(serializer.Deserialize<T>(jsonTextReader));
            }
        }
    }

If I use DownloadToStreamAsync instead of the parallel download (DownloadRangeToStreamAsync), it works.

1 Answer:

Answer 0 (score: 0)

I was able to reproduce your issue. The solution is to change one line in the ParallelDownloadBlobAsync method: replace blob.DownloadRangeToStreamAsync(ms, queue.Key, queue.Value); with blob.DownloadRangeToStream(ms, queue.Key, queue.Value);
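For readers who would rather keep the download asynchronous, one possible alternative to the synchronous call is to await every range download and cap concurrency with a SemaphoreSlim. This is only a sketch and not the answerer's method. It assumes the same legacy storage SDK as the question (the `Microsoft.Azure.Storage.Blob` namespace is an assumption; older packages use `Microsoft.WindowsAzure.Storage.Blob`), a seekable output stream, and the hypothetical names `BlobDownloader` and `DownloadChunkAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob; // assumption: legacy storage SDK namespace

public static class BlobDownloader // hypothetical helper class
{
    public static async Task ParallelDownloadBlobAsync(Stream output, CloudBlockBlob blob)
    {
        await blob.FetchAttributesAsync();
        const int chunkSize = 1 * 1024 * 1024; // 1 MB per range
        long length = blob.Properties.Length;

        using (var throttle = new SemaphoreSlim(10)) // at most 10 downloads in flight
        using (var writeLock = new SemaphoreSlim(1)) // serializes writes to the shared stream
        {
            var tasks = new List<Task>();
            for (long offset = 0; offset < length; offset += chunkSize)
            {
                long count = Math.Min(chunkSize, length - offset);
                tasks.Add(DownloadChunkAsync(blob, output, offset, count, throttle, writeLock));
            }
            // Completes only after every chunk has been downloaded and written.
            await Task.WhenAll(tasks);
        }

        // Rewind so that a reader starts at the first byte.
        output.Position = 0;
    }

    private static async Task DownloadChunkAsync(
        CloudBlockBlob blob, Stream output, long start, long count,
        SemaphoreSlim throttle, SemaphoreSlim writeLock)
    {
        await throttle.WaitAsync();
        try
        {
            using (var ms = new MemoryStream())
            {
                // Awaiting guarantees the chunk is fully in ms before it is copied.
                await blob.DownloadRangeToStreamAsync(ms, start, count);

                await writeLock.WaitAsync();
                try
                {
                    output.Position = start;
                    ms.Position = 0;
                    await ms.CopyToAsync(output);
                }
                finally
                {
                    writeLock.Release();
                }
            }
        }
        finally
        {
            throttle.Release();
        }
    }
}
```

A caller would await BlobDownloader.ParallelDownloadBlobAsync(stream, cloudBlockBlob) and then wrap the stream in a StreamReader as in the question. Because the sketch rewinds the stream itself, the commented-out stream.Position = 0 line would not be needed in that case.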

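I am not sure whether your problem and mine share the same root cause. On my side, the root cause is that when the file is small (around 100 KB), the output stream's length is always 0 when blob.DownloadRangeToStreamAsync is used, so the while loop is never entered. For larger files, however, blob.DownloadRangeToStreamAsync does work.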
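One plausible explanation for the empty stream, not confirmed by the answer, is that the Task returned by DownloadRangeToStreamAsync is never awaited inside Parallel.ForEach, so ms.ToArray() can run before any bytes have arrived. The sketch below shows that effect without touching Azure at all. `FakeDownloadRangeAsync` is a hypothetical stand-in for the network call, and the 50 ms delay is an arbitrary assumption.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class FireAndForgetDemo // hypothetical name
{
    // Hypothetical stand-in for a range download: writes the data after a short delay.
    public static async Task FakeDownloadRangeAsync(Stream target, byte[] source, int offset, int length)
    {
        await Task.Delay(50); // simulated network latency (assumption)
        target.Write(source, offset, length);
    }

    // Starts the download without waiting for it, like the code in the question.
    public static int BytesWithoutAwait(byte[] source)
    {
        using (var ms = new MemoryStream())
        {
            Task pending = FakeDownloadRangeAsync(ms, source, 0, source.Length);
            int seen = ms.ToArray().Length; // read before the delayed write has happened
            pending.Wait();                 // let the task finish before ms is disposed
            return seen;
        }
    }

    // Awaits the download, so the buffer is complete before it is read.
    public static async Task<int> BytesWithAwaitAsync(byte[] source)
    {
        using (var ms = new MemoryStream())
        {
            await FakeDownloadRangeAsync(ms, source, 0, source.Length);
            return ms.ToArray().Length;
        }
    }

    public static async Task Main()
    {
        byte[] data = new byte[1024];
        Console.WriteLine(BytesWithoutAwait(data));         // buffer length seen too early
        Console.WriteLine(await BytesWithAwaitAsync(data)); // buffer length after awaiting
    }
}
```

With these assumptions, the first call should report an empty buffer and the second the full 1024 bytes. The synchronous DownloadRangeToStream avoids the problem because it does not return until the range has been written to the stream.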

If this does not solve your problem, please leave a comment.