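I am trying to process Parquet tables from Hive in Python and am running into some data type issues. For example, if I have a field in my Hive Parquet table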
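, when I try to read the file in Python, it returns a garbage value.
Any suggestions would be appreciated.
Answer 0: (score: 0)
I think this might help, although it is not a proper answer. I have used this approach in PySpark jobs before writing to Parquet, for example to convert decimals to floats so that they read correctly into Pandas DataFrames. In this case I am shrinking the types, but you get the idea: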
from pyspark.sql import functions as F


def shrink_types(df):
    """Reduce data size by shrinking the types"""
    # Loop through the (column name, type string) tuples and downcast the column
    for column_name, column_type in df.dtypes:
        # Note: 'float' is a 32-bit type in Spark, so this trades precision for size
        if column_type == 'double' or 'decimal' in column_type:
            df = df.withColumn(
                column_name,
                F.col(column_name).cast('float')
            )
    return df
Then I call it like this:
equities_df = shrink_types(equities_df)
# Save and restore so it actually runs
equities_df.write.mode('overwrite').parquet(
    path='s3://bucket/path/dataset.parquet',
)
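One caveat with casting to 'float': Spark's float type is 32-bit, so decimal values can lose precision, whereas 'double' is 64-bit. Here is a minimal standard-library sketch of that effect. The helper `to_float32` and the sample value are made up for illustration and are not part of the answer above.

```python
import struct
from decimal import Decimal


def to_float32(value):
    """Round-trip a value through IEEE 754 single precision,
    mimicking what a 32-bit float column can store."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


# Hypothetical decimal value with two fractional digits
original = Decimal("123456.78")

as_float32 = to_float32(original)  # what a cast to 'float' would keep
as_float64 = float(original)       # what a cast to 'double' would keep

print("original:", original)
print("float32 :", repr(as_float32))
print("float64 :", repr(as_float64))
print("float32 error:", abs(Decimal(as_float32) - original))
```

If exact values matter more than file size, casting to 'double' instead of 'float' may be the safer choice.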