Reading a text file from Azure Storage Blob line by line with Python

Time: 2019-05-03 11:16:36

Tags: python-3.x azure-storage azure-storage-blobs

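I need to read a text file from blob storage line by line, perform some operations on each line, and add specific lines to a data frame. I have tried various ways of reading the file line by line. Is there a way to read a text file from a blob line by line, process it, and output specific lines, the way readlines() works for a file in local storage?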

candidate_resume = 'candidateresumetext'
block_blob_service = BlockBlobService(account_name='nam', account_key='key')
generator2 = block_blob_service.list_blobs(candidate_resume)
#for blob in generator2:
   #print(blob.name)
for blob in generator2:
    blob2 = block_blob_service.get_blob_to_text(candidate_resume,blob.name)
    #print(blob2)

    #blob_url=block_blob_service.make_blob_url(candidate_resume, blob.name)
    #print(blob_url)

    #blob3 = block_blob_service.get_blob_to_stream(candidate_resume,blob.name,range)
    blob3 = blob2.split('.')
    with open(blob.name,encoding = 'utf-8') as file:
        lines = file.readlines()
        for line in blob3:      
            if any(p in years_list for p in line ):
                if any(p in months_list for p in line):    
                    print(line)

1 Answer:

Answer 0: (score: 0)

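The method get_blob_to_text is the right approach; you can follow the sample code below (adjust it if it does not meet your needs). You cannot use with open() as file, because there is no real file on local disk: open(blob.name) looks for a local file, while get_blob_to_text downloads the blob's text into memory.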

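from azure.storage.blob import BlockBlobService  # legacy SDK (azure-storage-blob 2.x and earlier)

# Placeholder credentials and names, as in the question; replace with your own.
block_blob_service = BlockBlobService(account_name='nam', account_key='key')
container_name = 'candidateresumetext'
blob_name = 'example.txt'  # hypothetical blob name, assumed to be a .txt file

# Download the blob; the returned Blob object holds the text in .content.
str1 = block_blob_service.get_blob_to_text(container_name, blob_name)

# Split the text into a list of lines.
arr1 = str1.content.splitlines()

# Process one line at a time.
for a1 in arr1:
    print(a1)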
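
Once the text is split into lines, the filtering the question attempts can be done in memory. The following is only a minimal sketch under some assumptions: the function name matching_lines and the sample text are hypothetical, and years_list and months_list are assumed to be lists of strings, since the question does not show them. Note that it checks whether each keyword occurs in the line; the question's any(p in years_list for p in line) iterates over the characters of the line instead, which is probably not what was intended.

```python
def matching_lines(text, years_list, months_list):
    """Return the lines of `text` that mention at least one year and one month."""
    matches = []
    for line in text.splitlines():
        # Substring test per keyword, rather than testing each character
        # of the line against the keyword lists.
        has_year = any(year in line for year in years_list)
        has_month = any(month in line for month in months_list)
        if has_year and has_month:
            matches.append(line)
    return matches


# Stand-in for the downloaded text; in the real script this would be
# block_blob_service.get_blob_to_text(container_name, blob_name).content
sample_text = (
    "Worked at Contoso, Jan 2015 to Mar 2017\n"
    "Skills: Python, SQL\n"
    "Graduated in 2012\n"
)
years_list = ["2012", "2015", "2017"]   # assumed to be strings
months_list = ["Jan", "Mar"]            # assumed to be strings

for line in matching_lines(sample_text, years_list, months_list):
    print(line)
```

The returned list can then be used to build the rows of the data frame mentioned in the question, for example one row per matching line.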