Append column data from one SFrame to another SFrame

Time: 2015-11-17 22:45:31

Tags: python pandas dataframe graphlab sframe

My training data, the train SFrame, has 4 columns and looks like this (the "Store" column is not unique in this SFrame):

+-------+------------+---------+-----------+
| Store |    Date    |  Sales  | Customers |
+-------+------------+---------+-----------+
|   1   | 2015-07-31 |  5263.0 |   555.0   |
|   2   | 2015-07-31 |  6064.0 |   625.0   |
|   3   | 2015-07-31 |  8314.0 |   821.0   |
|   4   | 2015-07-31 | 13995.0 |   1498.0  |
|   3   | 2015-07-20 |  4822.0 |   559.0   |
|   2   | 2015-07-10 |  5651.0 |   589.0   |
|   4   | 2015-07-11 | 15344.0 |   1414.0  |
|   5   | 2015-07-23 |  8492.0 |   833.0   |
|   2   | 2015-07-19 |  8565.0 |   687.0   |
|   10  | 2015-07-09 |  7185.0 |   681.0   |
+-------+------------+---------+-----------+
[986159 rows x 4 columns]

I also have a second SFrame, store (the "Store" column is unique in this SFrame):

+-------+-----------+
| Store | StoreType |
+-------+-----------+
|   1   |     c     |
|   2   |     a     |
|   3   |     a     |
|   4   |     c     |
|   5   |     a     |
|   6   |     a     |
|   7   |     a     |
|   8   |     a     |
|   9   |     a     |
|   10  |     a     |
+-------+-----------+

I can append the corresponding StoreType to my train SFrame by going through each row of train, looking up the matching StoreType in store, collecting these values in a list, and then calling SFrame.add_column() to get:

+-------+------------+---------+-----------+-----------+
| Store |    Date    |  Sales  | Customers | StoreType |
+-------+------------+---------+-----------+-----------+
|   1   | 2015-07-31 |  5263.0 |   555.0   |     c     |
|   2   | 2015-07-31 |  6064.0 |   625.0   |     a     |
|   3   | 2015-07-31 |  8314.0 |   821.0   |     a     |
|   4   | 2015-07-31 | 13995.0 |   1498.0  |     c     |
|   3   | 2015-07-20 |  4822.0 |   559.0   |     a     |
|   2   | 2015-07-10 |  5651.0 |   589.0   |     a     |
|   4   | 2015-07-11 | 15344.0 |   1414.0  |     c     |
|   5   | 2015-07-23 |  8492.0 |   833.0   |     a     |
|   2   | 2015-07-19 |  8565.0 |   687.0   |     a     |
|   10  | 2015-07-09 |  7185.0 |   681.0   |     a     |
+-------+------------+---------+-----------+-----------+
[986159 rows x 5 columns]

The code I use for this is:

import graphlab

# For every row of train, scan store until the matching Store id is found
store_type_col = []
for row in train:
    row_store = row['Store']
    row_storetype = next(i for i in store if i['Store'] == row_store)['StoreType']
    store_type_col.append(row_storetype)

# Attach the collected values to train as a new StoreType column
train.add_column(graphlab.SArray(store_type_col, dtype=str), name='StoreType')

But I'm sure there is a simpler and faster way to do this with GraphLab. The worst case of the current approach is O(n*m), where n = number of rows in train and m = number of rows in store, because every row of train may trigger a full linear scan of store.

Now imagine that my store SFrame had 8 columns that I wanted to append to train. The code above would be extremely inefficient.
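The O(n*m) cost comes from rescanning store for every row. A common fix, independent of GraphLab, is to index the unique key once in a dictionary so each lookup is O(1) on average, giving roughly O(n + m) overall, and to copy all non-key columns of the match at once (which also covers the 8-column case). The sketch below assumes rows are plain dicts, like the rows you get when iterating an SFrame; append_columns is a hypothetical helper for illustration, not a GraphLab API.

```python
# Hypothetical helper: hash-based lookup over lists of dicts (not GraphLab code).
# Assumes `key` is unique in right_rows, as "Store" is in the store SFrame.

def append_columns(left_rows, right_rows, key):
    """Return copies of left_rows with every non-key column of the matching
    right row appended."""
    # One pass over right_rows to build the index: O(m).
    index = {row[key]: row for row in right_rows}
    result = []
    # One pass over left_rows with average O(1) dict lookups: O(n).
    for row in left_rows:
        match = index[row[key]]  # raises KeyError if the key is missing
        merged = dict(row)  # copy so the input rows are left unchanged
        for col, value in match.items():
            if col != key:
                merged[col] = value
        result.append(merged)
    return result


train = [
    {'Store': 1, 'Date': '2015-07-31', 'Sales': 5263.0, 'Customers': 555.0},
    {'Store': 2, 'Date': '2015-07-31', 'Sales': 6064.0, 'Customers': 625.0},
    {'Store': 4, 'Date': '2015-07-31', 'Sales': 13995.0, 'Customers': 1498.0},
    {'Store': 2, 'Date': '2015-07-10', 'Sales': 5651.0, 'Customers': 589.0},
]
store = [
    {'Store': 1, 'StoreType': 'c'},
    {'Store': 2, 'StoreType': 'a'},
    {'Store': 4, 'StoreType': 'c'},
]

out = append_columns(train, store, 'Store')
print([r['StoreType'] for r in out])  # ['c', 'a', 'c', 'a']
```

On a real SFrame this row-by-row Python loop would still be slow for ~1M rows; it only shows why a key-indexed join beats the nested scan.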

Is there a better way to append data columns from one SFrame to another? (Pandas solutions are also welcome.)

1 Answer:

Answer 0 (score: 1)

You can do this with a join operation:

out = train.join(store, on = 'Store')
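One detail worth checking: as I understand the GraphLab Create API, SFrame.join also takes a how argument ('inner' by default, plus 'left', 'right' and 'outer'). With the default inner join, any train row whose Store has no match in store is dropped; how='left' keeps it with a missing StoreType. The plain-Python sketch here uses a hypothetical hash_join function, not GraphLab code, to illustrate the difference, assuming the key is unique in the right table.

```python
# Hypothetical hash_join: plain-Python illustration of inner vs. left join
# semantics on a key that is unique in the right table (not GraphLab code).

def hash_join(left_rows, right_rows, key, how='inner'):
    if how not in ('inner', 'left'):
        raise ValueError("this sketch only supports how='inner' or how='left'")
    # Index the right table by key (assumes the key is unique there).
    index = {row[key]: row for row in right_rows}
    # Non-key columns to append; assumes every right row has the same columns.
    right_cols = [c for c in right_rows[0] if c != key] if right_rows else []
    out = []
    for row in left_rows:
        match = index.get(row[key])
        if match is None:
            if how == 'inner':
                continue  # inner join: drop left rows with no match
            match = {c: None for c in right_cols}  # left join: fill with None
        merged = dict(row)
        merged.update({c: match[c] for c in right_cols})
        out.append(merged)
    return out


train = [{'Store': 1, 'Sales': 5263.0}, {'Store': 10, 'Sales': 7185.0}]
store = [{'Store': 1, 'StoreType': 'c'}]  # Store 10 deliberately missing

print(hash_join(train, store, 'Store', how='inner'))
# [{'Store': 1, 'Sales': 5263.0, 'StoreType': 'c'}]
print(hash_join(train, store, 'Store', how='left'))
# [{'Store': 1, 'Sales': 5263.0, 'StoreType': 'c'}, {'Store': 10, 'Sales': 7185.0, 'StoreType': None}]
```

In GraphLab the left-join version would be train.join(store, on='Store', how='left'), and all non-key columns of store (all 8 in the larger case) are appended in one call. The pandas equivalent is train.merge(store, on='Store', how='left').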