我是Python的新手,并尝试创建与Matlab的“单元阵列”相当的东西。假设我有100个客户索引“C001”,“C002”等,我为每个客户提供不同的数据:
在Python 2.7中构建数据集的最佳方法是什么,它结合了单值,分类数据和时间索引数组?我正在尝试使用大熊猫,但到目前为止还没有成功。
非常感谢您提前
答案 0 :(得分:12)
MATLAB单元阵列的等价物是一个numpy对象数组。但是,这些很少使用,因为它们很少在实践中出现。在大多数情况下,有人会在MATLAB中使用Cell,列表或嵌套列表就足够了:
>>> a = [obj1, obj2, obj, obj4]
>>> b = [[obj1, obj2], [obj3, obj4]]
但是,这不是您想要做的事情。您的问题是X Y problem的典型示例。您正在询问如何为您的问题实施特定的解决方案,而不是询问如何解决问题本身。 Python可以做很多MATLAB无法做到的事情,所以试图让Python像MATLAB一样运行通常会导致次优解决方案。
在这种情况下,你想要的是pandas DataFrame。它完全不像MATLAB单元阵列,但更适合您的数据集。您可以使用MultiIndex来存储参数,使用列来存储时间序列数据。这允许您按名称,大小,类别,日期等进行索引。例如,您可以计算第三季度每类物业的平均能源使用量,超过500平方米的物业只需一行代码。< / p>
以下是如何构建此类数据的示例:
>>> names = ['C001', 'C002', 'C003', 'C004']
>>> sizes = np.abs(np.random.random(4))*1000
>>> category = ['Commerical', 'Residential', 'Residential', 'Other']
>>> ts = np.random.random([100, 4])
>>> timestamps = pd.date_range('1/1/2011', periods=100, freq='W')
>>>
>>> cols = pd.MultiIndex.from_arrays([names, sizes, category])
>>>
>>> df = pd.DataFrame(ts, index=timestamps, columns=cols)
>>> df.columns.names = ['Name', 'Size', 'Category']
>>> df.index.name = 'Time'
>>>
>>> print(df)
Name C001 C002 C003 C004
Size 36.719201 732.278278 795.755755 551.383120
Category Commerical Residential Residential Other
Time
2011-01-02 0.108720 0.018492 0.057233 0.694548
2011-01-09 0.959845 0.968857 0.422210 0.975767
2011-01-16 0.709676 0.119963 0.004481 0.830328
2011-01-23 0.084271 0.535408 0.209943 0.668001
2011-01-30 0.626125 0.052301 0.212636 0.995429
2011-02-06 0.376399 0.199327 0.482884 0.632472
2011-02-13 0.302807 0.353679 0.599427 0.993996
2011-02-20 0.185445 0.005769 0.755981 0.923540
2011-02-27 0.109611 0.994292 0.873782 0.542741
2011-03-06 0.561404 0.778414 0.595238 0.082001
2011-03-13 0.056986 0.869344 0.459753 0.450071
2011-03-20 0.261320 0.675317 0.603043 0.371950
2011-03-27 0.890803 0.061619 0.831677 0.801890
2011-04-03 0.498199 0.846559 0.370336 0.225477
2011-04-10 0.248914 0.693038 0.145255 0.233058
2011-04-17 0.621441 0.683213 0.048944 0.650139
2011-04-24 0.459869 0.055751 0.912097 0.457605
2011-05-01 0.814447 0.780415 0.184241 0.429139
2011-05-08 0.586905 0.209121 0.428080 0.246584
2011-05-15 0.754021 0.909181 0.846984 0.948835
2011-05-22 0.513610 0.203925 0.338072 0.596325
2011-05-29 0.497080 0.557908 0.916812 0.680242
2011-06-05 0.646791 0.641024 0.399427 0.308346
2011-06-12 0.573922 0.539285 0.098703 0.461480
2011-06-19 0.062978 0.939339 0.713087 0.380326
2011-06-26 0.422484 0.109185 0.459734 0.800468
2011-07-03 0.962368 0.632361 0.388565 0.503425
2011-07-10 0.802551 0.261161 0.590494 0.526307
2011-07-17 0.261447 0.686405 0.636970 0.622476
2011-07-24 0.634331 0.630028 0.069925 0.504036
... ... ... ... ...
2012-05-06 0.185331 0.375717 0.658463 0.697377
2012-05-13 0.273510 0.665318 0.756944 0.083542
2012-05-20 0.895984 0.850881 0.680869 0.987420
2012-05-27 0.450593 0.262195 0.458893 0.199141
2012-06-03 0.696102 0.332312 0.419764 0.338074
2012-06-10 0.113108 0.167605 0.812625 0.329429
2012-06-17 0.527418 0.087454 0.868973 0.744649
2012-06-24 0.977674 0.831538 0.410719 0.598423
2012-07-01 0.577802 0.141307 0.310356 0.276271
2012-07-08 0.772117 0.288240 0.820701 0.548857
2012-07-15 0.699628 0.467952 0.429433 0.304482
2012-07-22 0.782641 0.337854 0.561191 0.572241
2012-07-29 0.010225 0.962770 0.793041 0.166877
2012-08-05 0.895516 0.628526 0.782264 0.908301
2012-08-12 0.787210 0.698185 0.255306 0.741693
2012-08-19 0.042833 0.556469 0.165885 0.408108
2012-08-26 0.942076 0.377714 0.927170 0.119004
2012-09-02 0.567978 0.007891 0.777752 0.869950
2012-09-09 0.120134 0.417996 0.328654 0.484447
2012-09-16 0.833769 0.946456 0.594471 0.569707
2012-09-23 0.515544 0.090017 0.344200 0.498175
2012-09-30 0.419152 0.315412 0.683195 0.498630
2012-10-07 0.879582 0.958591 0.531812 0.051948
2012-10-14 0.488241 0.683242 0.096560 0.197295
2012-10-21 0.425213 0.279539 0.476436 0.492512
2012-10-28 0.238334 0.836782 0.901589 0.132700
2012-11-04 0.030562 0.797666 0.238895 0.550427
2012-11-11 0.875454 0.973046 0.457116 0.154175
2012-11-18 0.557967 0.895320 0.478239 0.448102
2012-11-25 0.075152 0.047344 0.650615 0.293129
[100 rows x 4 columns]