I have a df (stock_pairs) that lists spread trades between stocks. It has 2 columns: one for the stock being bought and one for the stock being sold.
buy sell
0 MSFT MXIM
1 INTC MXIM
2 AMZN MXIM
3 NFLX MXIM
4 BIIB MXIM
5 GILD MXIM
6 TEVA MXIM
7 GDXJ MXIM
8 SLAB MXIM
9 NXPI MXIM
Output of stock_pairs.to_dict():
{'buy': {0: 'MSFT',
1: 'INTC',
2: 'AMZN',
3: 'NFLX',
4: 'BIIB',
5: 'GILD',
6: 'TEVA',
7: 'GDXJ',
8: 'SLAB',
9: 'NXPI'},
'sell': {0: 'MXIM',
1: 'MXIM',
2: 'MXIM',
3: 'MXIM',
4: 'MXIM',
5: 'MXIM',
6: 'MXIM',
7: 'MXIM',
8: 'MXIM',
9: 'MXIM'}}
I have another dataframe that contains price information for every stock in my universe.
stock_price_df looks like:
Stock dt Price
0 MSFT 2015-12-31 -562.14
1 MSFT 2016-01-31 -701.18
2 MSFT 2016-02-29 -265.44
3 MSFT 2016-03-31 -42.62
4 MSFT 2016-04-30 -468.95
5 MSFT 2016-05-31 -549.94
6 MSFT 2016-06-30 80.84
7 MSFT 2016-07-31 -633.36
8 MSFT 2016-08-31 -1700.73
9 MSFT 2016-09-30 -229.40
10 MSFT 2016-10-31 996.27
11 MSFT 2016-11-30 117.01
12 MXIM 2015-12-31 56.44
13 MXIM 2016-01-31 -83.38
14 MXIM 2016-02-29 152.92
15 MXIM 2016-03-31 -48.93
16 MXIM 2016-04-30 387.37
17 MXIM 2016-05-31 -194.31
18 MXIM 2016-06-30 -332.07
19 MXIM 2016-07-31 303.43
20 MXIM 2016-08-31 55.33
21 MXIM 2016-09-30 -170.31
22 MXIM 2016-10-31 -411.65
23 MXIM 2016-11-30 -101.52
Output of stock_price_df.to_dict():
{'Stock': {0: 'MSFT',
1: 'MSFT',
2: 'MSFT',
3: 'MSFT',
4: 'MSFT',
5: 'MSFT',
6: 'MSFT',
7: 'MSFT',
8: 'MSFT',
9: 'MSFT',
10: 'MSFT',
11: 'MSFT',
10440: 'MXIM ',
10441: 'MXIM ',
10442: 'MXIM ',
10443: 'MXIM ',
10444: 'MXIM ',
10445: 'MXIM ',
10446: 'MXIM ',
10447: 'MXIM ',
10448: 'MXIM ',
10449: 'MXIM ',
10450: 'MXIM ',
10451: 'MXIM '},
'dt': {0: Timestamp('2015-12-31 00:00:00'),
1: Timestamp('2016-01-31 00:00:00'),
2: Timestamp('2016-02-29 00:00:00'),
3: Timestamp('2016-03-31 00:00:00'),
4: Timestamp('2016-04-30 00:00:00'),
5: Timestamp('2016-05-31 00:00:00'),
6: Timestamp('2016-06-30 00:00:00'),
7: Timestamp('2016-07-31 00:00:00'),
8: Timestamp('2016-08-31 00:00:00'),
9: Timestamp('2016-09-30 00:00:00'),
10: Timestamp('2016-10-31 00:00:00'),
11: Timestamp('2016-11-30 00:00:00'),
12: Timestamp('2015-12-31 00:00:00'),
13: Timestamp('2016-01-31 00:00:00'),
14: Timestamp('2016-02-29 00:00:00'),
15: Timestamp('2016-03-31 00:00:00'),
16: Timestamp('2016-04-30 00:00:00'),
17: Timestamp('2016-05-31 00:00:00'),
18: Timestamp('2016-06-30 00:00:00'),
19: Timestamp('2016-07-31 00:00:00'),
20: Timestamp('2016-08-31 00:00:00'),
21: Timestamp('2016-09-30 00:00:00'),
22: Timestamp('2016-10-31 00:00:00'),
23: Timestamp('2016-11-30 00:00:00')},
'Price': {0: -562.13999999999999,
1: -701.18000000000029,
2: -265.43999999999994,
3: -42.620000000000012,
4: -468.9500000000001,
5: -549.94000000000005,
6: 80.840000000000032,
7: -633.36000000000013,
8: -1700.7300000000002,
9: -229.40000000000006,
10: 996.26999999999998,
11: 117.01000000000001,
12: 56.439999999999998,
13: -83.380000000000024,
14: 152.91999999999996,
15: -48.929999999999993,
16: 387.37,
17: -194.30999999999997,
18: -332.07000000000011,
19: 303.43000000000001,
20: 55.330000000000013,
21: -170.31,
22: -411.64999999999998,
23: -101.52}}
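One detail in the to_dict() output above: the 'Stock' values for MXIM appear as 'MXIM ' with a trailing space, while stock_pairs uses 'MXIM'. If that whitespace is present in the real data (it may only be a copy/paste artifact), an exact comparison between the two would find no rows. A minimal sketch of normalizing the tickers, using a small toy frame rather than the real data:

```python
import pandas as pd

# Toy data mirroring the ticker values shown above (note the trailing space on 'MXIM ').
prices = pd.DataFrame({
    "Stock": ["MSFT", "MXIM ", "MXIM "],
    "Price": [-562.14, 56.44, -83.38],
})

# Before stripping, an exact match on 'MXIM' finds no rows.
matches_before = int((prices["Stock"] == "MXIM").sum())

# Strip surrounding whitespace so the tickers compare equal to those in stock_pairs.
prices["Stock"] = prices["Stock"].str.strip()
matches_after = int((prices["Stock"] == "MXIM").sum())

print(matches_before, matches_after)  # 0 2
```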
I have a function named cal_stats_align_data, which is run like this:
A) stock_pair_datadump = stock_pairs.apply(cal_stats_align_data, axis=1, args=(stock_price_df,))
It can also be run like this:
B) stock_pair_datadump = cal_stats_align_data(stock_pairs.iloc[0], stock_price_df)
A# runs the operation for every stock pair in the stock_pairs dataframe, whereas B# runs it for a single pair only.
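The function cal_stats_align_data returns 1 row x 20 columns of statistics for each pair.
So the output should have the same number of rows as stock_pairs, with 20 columns of data.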
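To make the expected shape concrete, here is a minimal sketch using a hypothetical stand-in, fake_stats, in place of cal_stats_align_data (the real function is not shown here). It assumes the function returns a pd.Series of 20 labeled values; in that case apply with axis=1 expands each returned Series into one row of the result. The toy data and the stat_0 ... stat_19 column names are made up for illustration.

```python
import numpy as np
import pandas as pd

def fake_stats(pair, price_df):
    """Hypothetical stand-in for cal_stats_align_data: 20 dummy stats per pair."""
    buy = price_df.loc[price_df["Stock"] == pair["buy"], "Price"].to_numpy()
    sell = price_df.loc[price_df["Stock"] == pair["sell"], "Price"].to_numpy()
    spread = buy - sell  # assumes both legs have the same number of rows
    values = np.full(20, spread.mean())
    return pd.Series(values, index=[f"stat_{i}" for i in range(20)])

pairs = pd.DataFrame({"buy": ["MSFT", "INTC"], "sell": ["MXIM", "MXIM"]})
price_df = pd.DataFrame({
    "Stock": ["MSFT", "MSFT", "INTC", "INTC", "MXIM", "MXIM"],
    "Price": [1.0, 2.0, 3.0, 4.0, 0.5, 1.5],
})

# A-style call: extra positional arguments go in a tuple, hence the trailing comma.
out_all = pairs.apply(fake_stats, axis=1, args=(price_df,))

# B-style call: one pair at a time.
out_one = fake_stats(pairs.iloc[0], price_df)

print(out_all.shape)  # (2, 20)
print(out_one.shape)  # (20,)
```

Note that args=(price_df) without the trailing comma is just price_df in parentheses, not a one-element tuple.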
B# works fine. However, when I try to run A# (i.e. over the whole stock_pairs universe), I get the following error:
ValueError: cannot copy sequence with size 20 to array axis with dimension 1
More details:
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\common.py in _asarray_tuplesafe(values, dtype)
1403 result = np.empty(len(values), dtype=object)
-> 1404 result[:] = values
1405 except ValueError:
ValueError: could not broadcast input array from shape (20) into shape (1)
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-1427-1e0b85417edf> in <module>()
----> 1 miso_path_datadump = path_master_filtered[['src','snk']][0:2].apply(cal_stats_align_df, axis=1, args=(mcc_filtered_final, cost_filtered_final, dt.datetime(2017,1,1), 'Peak',0))
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4059 if reduce is None:
4060 reduce = True
-> 4061 return self._apply_standard(f, axis, reduce=reduce)
4062 else:
4063 return self._apply_broadcast(f, axis)
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4172 index = None
4173
-> 4174 result = self._constructor(data=results, index=index)
4175 result.columns = res_index
4176
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
222 dtype=dtype, copy=copy)
223 elif isinstance(data, dict):
--> 224 mgr = self._init_dict(data, index, columns, dtype=dtype)
225 elif isinstance(data, ma.MaskedArray):
226 import numpy.ma.mrecords as mrecords
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\frame.py in _init_dict(self, data, index, columns, dtype)
358 arrays = [data[k] for k in keys]
359
--> 360 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
361
362 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5234
5235 # don't force copy because getting jammed in an ndarray anyway
-> 5236 arrays = _homogenize(arrays, index, dtype)
5237
5238 # from BlockManager perspective
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\frame.py in _homogenize(data, index, dtype)
5544 v = lib.fast_multiget(v, oindex.values, default=NA)
5545 v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5546 raise_cast_failure=False)
5547
5548 homogenized.append(v)
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
2920 raise Exception('Data must be 1-dimensional')
2921 else:
-> 2922 subarr = _asarray_tuplesafe(data, dtype=dtype)
2923
2924 # This is to prevent mixed-type Series getting all casted to
C:\Users\blahblah\Anaconda3\lib\site-packages\pandas\core\common.py in _asarray_tuplesafe(values, dtype)
1405 except ValueError:
1406 # we have a list-of-list
-> 1407 result[:] = [tuple(x) for x in values]
1408
1409 return result
ValueError: cannot copy sequence with size 20 to array axis with dimension 1
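Since the single-pair call (B#) works, one possible fallback is to call the function once per pair and stack the results, bypassing apply altogether. The sketch below assumes the function returns a one-row DataFrame with 20 columns; that return type is a guess, and fake_stats_frame is a hypothetical stand-in with made-up data, not the real cal_stats_align_data.

```python
import pandas as pd

def fake_stats_frame(pair, price_df):
    """Hypothetical stand-in returning a 1-row x 20-column DataFrame."""
    n_obs = int(price_df["Stock"].isin([pair["buy"], pair["sell"]]).sum())
    return pd.DataFrame(
        [[float(n_obs)] * 20],
        columns=[f"stat_{i}" for i in range(20)],
    )

pairs = pd.DataFrame({"buy": ["MSFT", "INTC"], "sell": ["MXIM", "MXIM"]})
price_df = pd.DataFrame({
    "Stock": ["MSFT", "INTC", "MXIM"],
    "Price": [1.0, 3.0, 0.5],
})

# Call the function once per pair (the B-style call), then stack the 1-row results.
rows = [fake_stats_frame(pair, price_df) for _, pair in pairs.iterrows()]
out = pd.concat(rows, ignore_index=True)
out.index = pairs.index  # line the results up with the original pairs

print(out.shape)  # (2, 20)
```

This gives one row per pair and 20 columns, at the cost of an explicit Python loop.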
Any ideas on how to fix this?
Thanks.