I am trying to read a JSON file from Amazon S3 that is about 2 GB in size. When I call .read() on it, I get a MemoryError.
Is there a solution to this problem? Any help is appreciated, thanks!
Answer 0 (score: 2)
I found an approach that worked well for me. I had a 1.60 GB file that I needed to load for processing.
import io
import json
import boto3

# s3.Object(...) belongs to the resource API, so create a resource rather than a client.
s3 = boto3.resource('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
# Fetch the whole object body as a bytes object.
data_in_bytes = s3.Object(bucket_name, filename).get()['Body'].read()
# Decode the bytes as UTF-8.
decoded_data = data_in_bytes.decode('utf-8')
# Wrap the text in a StringIO object from the io module.
stringio_data = io.StringIO(decoded_data)
# Read the StringIO object line by line.
data = stringio_data.readlines()
# Parse each line with the json module (this assumes one JSON document per line).
json_data = list(map(json.loads, data))
So json_data holds the contents of the file. I know this goes through a lot of intermediate variables, but it worked for me.
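The decode, StringIO, and readlines steps above can probably be collapsed into one pass. Here is a minimal sketch, assuming the file is newline-delimited JSON (one document per line). `parse_json_lines` is a hypothetical helper name, and `sample` stands in for the bytes returned by `.read()`. Unlike the original, it skips blank lines.

```python
import json

def parse_json_lines(raw_bytes):
    """Parse newline-delimited JSON from a bytes object into a list."""
    text = raw_bytes.decode('utf-8')
    # Split on '\n' only, and skip blank lines so json.loads never sees an empty string.
    return [json.loads(line) for line in text.split('\n') if line.strip()]

# Stand-in for the bytes downloaded from S3 (assumption: one JSON document per line).
sample = b'{"a": 1}\n{"a": 2}\n'
json_data = parse_json_lines(sample)
print(json_data)
```

Note that this still holds the entire payload in memory, so it only trims the intermediate copies and does not avoid loading the whole file.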
Answer 1 (score: 1)
Just iterate over the object.
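import json
import boto3

s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
fileObj = s3.get_object(Bucket='bucket_name', Key='key')
# The response key is 'Body' (capitalized). Iterating the body directly yields
# fixed-size byte chunks, so iter_lines() is used to get whole lines.
for row in fileObj['Body'].iter_lines():
    line = row.decode('utf-8')
    print(json.loads(line))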
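One thing to watch for when iterating a streamed body: the iteration may hand back fixed-size byte chunks rather than whole lines (botocore's StreamingBody iterates in chunks by default), so a single JSON record can be split across two iterations. Here is a rough sketch of reassembling lines from chunks. `iter_lines_from_chunks` is a hypothetical helper, and the `chunks` list stands in for the streamed body.

```python
import json

def iter_lines_from_chunks(chunks):
    """Yield complete lines (without the trailing newline) from an iterable of byte chunks."""
    pending = b''
    for chunk in chunks:
        pending += chunk
        # Everything before the last b'\n' is complete; the remainder waits for more data.
        *complete, pending = pending.split(b'\n')
        for line in complete:
            yield line
    if pending:
        # The final line may not end with a newline.
        yield pending

# Stand-in for a streamed body: the second record is split across two chunks.
chunks = [b'{"id": 1}\n{"i', b'd": 2}\n']
records = [json.loads(line.decode('utf-8')) for line in iter_lines_from_chunks(chunks)]
print(records)
```

Decoding happens only after a full line is assembled, so a multi-byte UTF-8 character split across chunks is not a problem. The sketch does not handle CRLF line endings or blank lines.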
Answer 2 (score: 0)
I just solved this problem. Here is the code. I hope it helps someone in the future!
import json
import boto3

s3 = boto3.client('s3', aws_access_key_id=<aws_access_key_id>, aws_secret_access_key=<aws_secret_access_key>)
obj = s3.get_object(Bucket='bucket_name', Key='key')
# Lazily decode one line at a time instead of reading the whole body at once.
data = (line.decode('utf-8') for line in obj['Body'].iter_lines())
for row in data:
    print(json.loads(row))
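The point of the generator above is that only one line is in memory at a time, which matters if you aggregate or filter instead of printing. Here is a minimal sketch of that pattern, assuming one JSON record per line. `iter_json_records` is a hypothetical helper, and `io.BytesIO` stands in for the S3 body so the example runs without AWS access.

```python
import io
import json

def iter_json_records(byte_lines):
    """Lazily decode and parse one JSON record per non-empty line."""
    for raw in byte_lines:
        text = raw.decode('utf-8').strip()
        if text:
            yield json.loads(text)

# io.BytesIO stands in for the S3 body here; iterating it yields one line at a time.
fake_body = io.BytesIO(b'{"n": 1}\n\n{"n": 2}\n{"n": 3}\n')
# Aggregate without ever building a list of all records.
total = sum(record["n"] for record in iter_json_records(fake_body))
print(total)
```

With a real response you would presumably pass `obj['Body'].iter_lines()` in place of `fake_body`.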