Appending a column to a bcolz table with Blaze

Posted: 2015-07-21 18:56:08

Tags: python hdf5 blaze

Let's first build a ctable:

import pandas as pd
import blaze as bl

df = pd.DataFrame({'x': range(4), 'y': [2., 4., 2., 4.]})
# Convert the DataFrame into a bcolz ctable persisted on disk at test.bcolz
bl.odo(df, 'test.bcolz')

Now suppose I want to add a column named 'x_mod' to this table. I tried:

test_table = bl.Data('test.bcolz')

def f(h):
    return h*3
test_table['x_mod'] = test_table['x'].apply(f, dshape='int64')
#Or, I think equivalently:
#test_table['x_mod'] = test_table['x']*3

But it gives:

TypeError: 'InteractiveSymbol' object does not support item assignment

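1) How can I assign the 'x_mod' column and then save it to disk? I'm working with a large database: computing the column in memory should be fine, but I can't load the whole ctable into memory.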

2) On a related note, apply doesn't work for me either. Am I doing something wrong?

#This doesn't work:
bl.compute(test_table['x'].apply(f, dshape='int64'))

#This I think should be equivalent, but does work:
bl.compute(test_table['x']*3)

Thanks for your time!
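Question 1 asks for a derived column without loading the whole table. The usual out-of-core pattern is to read the source column in fixed-size chunks, transform each chunk, and append the result to a new on-disk column, so only one chunk is in memory at a time. The sketch below shows that pattern with the standard library only, under clear assumptions: each column is a hypothetical raw file of signed 64-bit integers standing in for a bcolz column (this is not bcolz's file format or API), and `write_column`, `add_derived_column` and `read_column` are illustrative helper names.

```python
import os
import tempfile
from array import array

# "q" = signed 64-bit integer (matches the int64 column in the question)
ITEMSIZE = array("q").itemsize


def write_column(path, values):
    """Write an integer column to a raw binary file (hypothetical stand-in for an on-disk column)."""
    with open(path, "wb") as fh:
        array("q", values).tofile(fh)


def add_derived_column(src_path, dst_path, func, chunk_len=2):
    """Apply func to every value of src_path, chunk by chunk, writing results to dst_path.

    At most chunk_len source values are held in memory at once.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            raw = src.read(chunk_len * ITEMSIZE)
            if not raw:  # reached the end of the source column
                break
            chunk = array("q")
            chunk.frombytes(raw)
            array("q", (func(v) for v in chunk)).tofile(dst)


def read_column(path):
    """Read a whole column back into a list (only sensible for this small demo)."""
    col = array("q")
    with open(path, "rb") as fh:
        col.frombytes(fh.read())
    return list(col)


with tempfile.TemporaryDirectory() as root:
    x_path = os.path.join(root, "x.col")
    x_mod_path = os.path.join(root, "x_mod.col")
    write_column(x_path, range(4))
    # chunk_len=3 forces two chunks: [0, 1, 2] and [3]
    add_derived_column(x_path, x_mod_path, lambda h: h * 3, chunk_len=3)
    result = read_column(x_mod_path)

print(result)  # [0, 3, 6, 9]
```

The chunk size trades memory for the number of read/write calls. In bcolz itself, the analogous last step would be attaching the finished column to the ctable (bcolz's ctable has an `addcol` method); how that interacts with an on-disk ctable is worth checking in the bcolz documentation.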

1 answer:

Answer (score: 1)

You can use Blaze's transform function like this (the example uses an iris-style table and imports Blaze as bz):

bz.transform(df, sepal_ratio=df.sepal_length / df.sepal_width)
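Blaze expressions don't support item assignment, which is what the TypeError in the question reports; transform instead returns a new table expression that includes the extra column. The pure-Python analogy below (not Blaze code; the table is a hypothetical dict of column lists and `transform` here is an illustrative helper) sketches that contract: the result gains a column while the input is left unchanged.

```python
from typing import Callable, Dict, List

# Hypothetical columnar table: column name -> list of values
Table = Dict[str, List[float]]


def transform(table: Table, **new_columns: Callable[[Table], List[float]]) -> Table:
    """Return a new table with extra columns; the input table is not modified.

    Each keyword maps a new column name to a function of the original table
    that returns one value per row (playing the role of a Blaze expression).
    """
    out = {name: list(col) for name, col in table.items()}  # copy existing columns
    for name, compute in new_columns.items():
        out[name] = compute(table)
    return out


iris = {"sepal_length": [5.1, 4.9], "sepal_width": [3.5, 3.0]}
with_ratio = transform(
    iris,
    sepal_ratio=lambda t: [l / w for l, w in zip(t["sepal_length"], t["sepal_width"])],
)
print(sorted(with_ratio))  # ['sepal_length', 'sepal_ratio', 'sepal_width']
print("sepal_ratio" in iris)  # False: the original table is untouched
```

Returning a new table rather than mutating the old one mirrors why assignment fails on a Blaze symbol: the expression describes a computation, and adding a column means describing a new one.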

For other functions, you need to pass a Blaze expression:

# BLAZE_symbolic_Expression is a placeholder for any Blaze expression built from the columns
bz.transform(df, sepal_ratio=BLAZE_symbolic_Expression(df.col1, df.col2))

It returns a new expression with the computed column added to the table. The documentation is here: https://blaze.readthedocs.io/en/latest/expressions.html

For example, you can use map:

import blaze as bz
from datetime import datetime

# Convert each Unix timestamp in col1 to a datetime, element by element
yourexpr = df.col1.map(datetime.utcfromtimestamp)
bz.transform(df, sepal_ratio=yourexpr)
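The use of map here points at a distinction that may also bear on question 2: in Blaze, map calls the function once per element, while apply hands the function the whole column at once. The sketch below mimics the two with plain Python lists; `map_column` and `apply_column` are hypothetical helpers, not Blaze API, and it uses the timezone-aware `datetime.fromtimestamp(ts, tz=timezone.utc)` in place of `utcfromtimestamp`.

```python
from datetime import datetime, timezone
from typing import Any, Callable, List


def map_column(column: List[Any], func: Callable[[Any], Any]) -> List[Any]:
    """Elementwise: call func once per value (analogous to Blaze's map)."""
    return [func(value) for value in column]


def apply_column(column: List[Any], func: Callable[[List[Any]], Any]) -> Any:
    """Whole-column: call func once on the entire column (analogous to Blaze's apply)."""
    return func(column)


timestamps = [0, 86400]  # seconds since the Unix epoch
as_dates = map_column(timestamps, lambda ts: datetime.fromtimestamp(ts, tz=timezone.utc))
print([d.date().isoformat() for d in as_dates])  # ['1970-01-01', '1970-01-02']

x = [0, 1, 2, 3]
print(map_column(x, lambda h: h * 3))  # [0, 3, 6, 9]
# With apply, h is the whole list, so h * 3 repeats it (12 elements)
print(len(apply_column(x, lambda h: h * 3)))  # 12
```

A function written for single values, such as `f` in the question, behaves differently depending on which of the two it is passed to, so it is worth checking which semantics a given Blaze backend expects.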