How do I assign to every element of a DataFrame at once?

Time: 2017-10-19 08:15:37

Tags: python pandas dataframe indexing vectorization

I have OG_df

           Symbol Order  Shares
Date                           
2011-01-10   AAPL   BUY    1500
2011-01-13   AAPL  SELL    1500
2011-01-13    IBM   BUY    4000
2011-01-26   GOOG   BUY    1000
2011-02-02    XOM  SELL    4000
2011-02-10    XOM   BUY    4000
2011-03-03   GOOG  SELL    1000
2011-03-03   GOOG  SELL    2200
2011-05-03    IBM   BUY    1500
2011-06-03    IBM  SELL    3300
2011-06-10   AAPL   BUY    1200
2011-08-01   GOOG   BUY      55
2011-08-01   GOOG  SELL      55
2011-12-20   AAPL  SELL    1200
2011-12-21   AAPL   BUY      20
2011-12-27   GOOG   BUY    2200
2011-12-28    IBM  SELL    2200

I also have df_prices

              AAPL     IBM    GOOG    XOM     SPY  CASH
2011-01-10  340.99  143.41  614.21  72.02  123.19   1.0
2011-01-11  340.18  143.06  616.01  72.56  123.63   1.0
...            ...     ...     ...    ...     ...   ...
2011-11-15  387.17  186.44  616.56  77.62  124.10   1.0
2011-11-16  383.13  184.33  611.47  76.79  122.13   1.0
2011-11-17  375.80  183.45  600.87  76.41  120.19   1.0
2011-11-18  373.34  182.97  594.88  76.45  120.06   1.0
2011-11-21  367.43  179.26  580.94  75.48  117.78   1.0
2011-11-22  374.90  179.09  580.00  74.61  117.31   1.0
[245 rows x 6 columns]

I set date_range = pd.date_range(OG_df.index.min(), OG_df.index.max()) and then create

df1 = pd.DataFrame(0, df_prices.index, columns=list(df_prices))

Suppose vals = df1.values, which looks like this:

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 ..., 
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]

with shape (245, 6).

I can also compute

cols = np.array([df1.columns.get_loc(c) for c in OG_df.Symbol])

cols returns [0 0 1 2 3 3 2 2 1 1 0 2 2 0 0 2 1]

The distinct values in OG_df.Symbol are ['AAPL' 'IBM' 'GOOG' 'XOM']. As you can see, the 17 rows of OG_df map to 4 different columns.

I also have

rows = np.arange(len(df1))

I want to do something like vals[rows, cols] = some_variable, but that raises:

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (245,) (17,) 

because rows has length 245 while cols has length 17.
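The mismatch can be reproduced in isolation. The sketch below uses a small made-up array in place of vals (5 rows instead of 245). With integer array indexing, NumPy pairs the row and column arrays element by element, so the two arrays need the same length (or broadcastable shapes). The names order_rows and error_message are hypothetical and only serve the illustration.

```python
import numpy as np

# Made-up stand-in for vals: 5 rows instead of 245, 6 columns.
vals = np.zeros((5, 6), dtype=int)

rows = np.arange(len(vals))   # one entry per row of vals -> length 5
cols = np.array([0, 0, 1])    # one entry per order       -> length 3

error_message = None
try:
    vals[rows, cols] = 1      # lengths 5 and 3 cannot be paired up
except IndexError as exc:
    error_message = str(exc)  # message begins with "shape mismatch"

# Index arrays of equal length are paired element-wise:
# (0, 0), (3, 0) and (3, 1) receive one value each.
order_rows = np.array([0, 3, 3])
vals[order_rows, cols] = [-1500, 1500, -4000]
```

So the row index array has to hold one row position per order, not one per row of df1.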

I want to fill each cell of df1 from some_variable (the value is different for each order).

order = np.where(OG_df.Order.values == 'BUY', -1, 1)

some_variable = OG_df.Shares.values * order

len(some_variable) = 17

How can I do this?

Also, I don't want to assign some_variable to the CASH column.

Example output:

df1

1 answer:

Answer 0 (score: 2)

I think you are trying to recreate pivot_table followed by reindex, i.e.

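df = OG_df.copy()

# Sign the share counts: BUY becomes negative, SELL stays positive.
df['Shares'] = np.where(df['Order'] == 'BUY', -df['Shares'], df['Shares'])

# One column per symbol, one row per calendar day; CASH is left as NaN.
ndf = df.pivot_table(columns='Symbol', values='Shares', index='Date')\
        .reindex(date_range).fillna(0).assign(CASH=np.nan)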

Sample output based on the given data:

    
    Symbol    AAPL  GOOG     IBM  XOM  CASH
2011-01-10 -1500.0   0.0     0.0  0.0   NaN
2011-01-11     0.0   0.0     0.0  0.0   NaN
2011-01-12     0.0   0.0     0.0  0.0   NaN
2011-01-13  1500.0   0.0 -4000.0  0.0   NaN
2011-01-14     0.0   0.0     0.0  0.0   NaN
2011-01-15     0.0   0.0     0.0  0.0   NaN
2011-01-16     0.0   0.0     0.0  0.0   NaN
2011-01-17     0.0   0.0     0.0  0.0   NaN
2011-01-18     0.0   0.0     0.0  0.0   NaN
2011-01-19     0.0   0.0     0.0  0.0   NaN
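The pivot_table approach can be tried end to end with the sketch below. The miniature OG_df is made up for illustration. Two details are assumptions rather than part of the answer: aggfunc='sum' (pivot_table averages by default, which would turn two same-day GOOG sells of 1000 and 2200 into 1600 rather than 3200), and the final column reindex, which adds symbols that have no orders.

```python
import numpy as np
import pandas as pd

# Made-up miniature OG_df in the shape shown in the question.
OG_df = pd.DataFrame(
    {"Symbol": ["AAPL", "AAPL", "IBM", "GOOG", "GOOG"],
     "Order": ["BUY", "SELL", "BUY", "SELL", "SELL"],
     "Shares": [1500, 1500, 4000, 1000, 2200]},
    index=pd.DatetimeIndex(
        ["2011-01-10", "2011-01-13", "2011-01-13", "2011-03-03", "2011-03-03"],
        name="Date",
    ),
)
date_range = pd.date_range(OG_df.index.min(), OG_df.index.max())

df = OG_df.copy()
# BUY becomes negative, SELL stays positive.
df["Shares"] = np.where(df["Order"] == "BUY", -df["Shares"], df["Shares"])

ndf = (
    # Assumption: sum duplicate (Date, Symbol) pairs instead of averaging them.
    df.pivot_table(index="Date", columns="Symbol", values="Shares", aggfunc="sum")
      .reindex(date_range)   # one row per calendar day
      .fillna(0)
      # Assumption: add zero-filled columns for symbols without any orders.
      .reindex(columns=["AAPL", "IBM", "GOOG", "XOM", "SPY"], fill_value=0)
      .assign(CASH=np.nan)   # CASH is left unassigned
)
```

Whether summing is the right aggregation depends on what the table is meant to represent; for net share changes per day it seems the natural choice.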

If the SPY symbol appears in OG_df, the missing SPY column will be added automatically.
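If you would rather stay with the vals[rows, cols] idea from the question, a rough sketch follows. It assumes each order date is looked up in df1's index with Index.get_indexer, skips orders whose date or symbol is absent from df1, and uses np.add.at so that several orders on the same date and symbol accumulate instead of overwriting one another. The small frames and the names signed_shares, keep and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up miniature versions of OG_df and df1 (not the real data).
OG_df = pd.DataFrame(
    {"Symbol": ["AAPL", "AAPL", "IBM", "GOOG", "GOOG"],
     "Order": ["BUY", "SELL", "BUY", "SELL", "SELL"],
     "Shares": [1500, 1500, 4000, 1000, 2200]},
    index=pd.to_datetime(["2011-01-10", "2011-01-13", "2011-01-13",
                          "2011-03-03", "2011-03-03"]),
)
df1 = pd.DataFrame(
    0,
    index=pd.to_datetime(["2011-01-10", "2011-01-11", "2011-01-13", "2011-03-03"]),
    columns=["AAPL", "IBM", "GOOG", "XOM", "SPY", "CASH"],
)

# Same sign convention as the question: BUY -> negative, SELL -> positive.
sign = np.where(OG_df["Order"].to_numpy() == "BUY", -1, 1)
signed_shares = OG_df["Shares"].to_numpy() * sign

# One row position and one column position per order (both of length len(OG_df)).
rows = df1.index.get_indexer(OG_df.index)         # -1 where the date is not in df1
cols = df1.columns.get_indexer(OG_df["Symbol"])   # -1 where the symbol is not a column

# Drop orders that cannot be placed in df1.
keep = (rows >= 0) & (cols >= 0)

vals = df1.to_numpy().copy()
# Unbuffered add: repeated (row, col) pairs are summed rather than overwritten.
np.add.at(vals, (rows[keep], cols[keep]), signed_shares[keep])

result = pd.DataFrame(vals, index=df1.index, columns=df1.columns)
```

Because no order ever has CASH as its symbol, the CASH column is never touched here and keeps its initial zeros.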