Question

My goal is to sort a DataFrame by one column and return a JSON object as efficiently as possible.

For reproducibility, define the following DataFrame:

import pandas as pd
import numpy as np

test = pd.DataFrame(data={
    'a': [np.random.randint(0, 100) for i in range(10000)],
    'b': [i + np.random.randint(0, 100) for i in range(10000)],
})

    a   b
0  74  89
1  55  52
2  53  39
3  26  21
4  69  34

What I need to do is sort by column a and then encode the output as a JSON object. I am taking the basic approach and doing the following:

What is the cost of this, n log n + n * 4? Is there a more efficient way to do it?
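To make the question concrete, here is a minimal sketch of a sort-then-encode baseline. It is an assumption about the general shape of such an approach, not the asker's original code; the smaller frame size, the fixed seed, and the choice of `orient='records'` are illustrative.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical, smaller stand-in for the `test` frame; seeded for repeatability.
rng = np.random.default_rng(0)
test = pd.DataFrame({
    'a': rng.integers(0, 100, size=1000),
    'b': np.arange(1000) + rng.integers(0, 100, size=1000),
})

# Sorting is the O(n log n) step.
sorted_df = test.sort_values('a')

# Encoding is a linear pass over the sorted rows.
payload = sorted_df.to_json(orient='records')

# Parse the string back only to inspect the result.
records = json.loads(payload)
```

Because the sort dominates asymptotically, a faster encoding step only reduces the constant factor on the linear term.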

Answer 1

I have noticed that pandas reads and writes JSON more slowly than pure Python does. If you are sure there are only two columns, you can do this:

import json

# Pair each index label with its (a, b) row; assumes exactly two columns.
data = [{'id': x, 'data': {'a': y, 'b': z}}
        for x, (y, z) in zip(test.index, test.values.tolist())]
json.dumps(data)

If you have more columns to worry about, you can do this:

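c = test.columns
# Each zipped item is (index label, row list), so unpack it as a plain pair;
# a starred target (x, *y) would wrap the row in an extra list.
data = [{'id': x, 'data': dict(zip(c, y))}
        for x, y in zip(test.index, test.values.tolist())]
json.dumps(data)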
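As a quick check of the general-column version, here is a small sketch on a hypothetical three-column frame. The frame `df` and its values are made up for illustration; the sort on `a` is included so the index labels come out in non-sequential order.

```python
import json

import pandas as pd

# Hypothetical three-column frame, sorted by 'a' so the index is shuffled.
df = pd.DataFrame({'a': [3, 1, 2], 'b': [10, 20, 30], 'c': [7, 8, 9]})
df = df.sort_values('a')

c = df.columns
# One record per row: the index label plus a column -> value mapping.
data = [{'id': x, 'data': dict(zip(c, y))}
        for x, y in zip(df.index, df.values.tolist())]

encoded = json.dumps(data)
# data[0] is {'id': 1, 'data': {'a': 1, 'b': 20, 'c': 8}}
```

The `id` field keeps the original row label, so a consumer can still map each record back to its position in the unsorted frame.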

Alternatively, if you can manage it, call reset_index before saving:

c = test.columns
data = [{'id' : x[0], 'data' : dict(zip(c, x[1:]))} 
            for x in test.reset_index().values.tolist()]
json.dumps(data)
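Whether the pure-Python route is actually faster depends on the pandas version and the shape of the data, so it is worth measuring. The sketch below is one way to do that with `timeit`; the function names, the frame size, and the use of `to_json(orient='index')` as the pandas baseline are assumptions for illustration, and the two functions produce differently shaped JSON.

```python
import json
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the sorted `test` frame.
rng = np.random.default_rng(0)
n = 10000
test = pd.DataFrame({
    'a': rng.integers(0, 100, size=n),
    'b': np.arange(n) + rng.integers(0, 100, size=n),
}).sort_values('a')


def with_pandas():
    # Built-in encoder; yields an object keyed by index label.
    return test.to_json(orient='index')


def with_comprehension():
    # reset_index variant: the first element of each row is the index label.
    c = test.columns
    data = [{'id': x[0], 'data': dict(zip(c, x[1:]))}
            for x in test.reset_index().values.tolist()]
    return json.dumps(data)


for fn in (with_pandas, with_comprehension):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds:.4f}s for 20 runs')
```

Run it on data that resembles the real workload before settling on either approach.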

Optimizing the cost of converting a pandas DataFrame to JSON
