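I searched Google for an answer but had no luck. I need to reshape a pandas DataFrame so that a non-numeric value (comp_url) becomes one of the "values" in a DataFrame with MultiIndex columns. Here is a sample of the data: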
store_name sku comp price ship comp_url
CSE A1025 compA 30.99 9.99 some url
CSE A1025 compB 30.99 9.99 some url
CSE A1025 compC 30.99 9.99 some url
I have several store_name values, so I need the result to look like this:
SKU    CSE                      store_name2
       comp_url  price  ship    comp_url  price  ship
A1025  some url  30.99  9.99    some url  30.99  9.99
Any ideas or guidance would be greatly appreciated!
Answer 0: (score: 0)
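Perhaps pandas.Panel would be a better fit. Panels are meant for three-dimensional data, whereas DataFrames are 2D. Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so this only applies to older versions.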
Answer 1: (score: 0)
Assuming each sku / store_name combination is unique, here is a working example:
# imports
import pandas as pd
# Create a sample DataFrame.
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSB', 'A1025', 'compB', 32.99, 9.99, 'some url2'],
           ['CSA', 'A1026', 'compC', 30.99, 19.99, 'some url'],
           ['CSB', 'A1026', 'compD', 30.99, 9.99, 'some url3']]
df = pd.DataFrame.from_records(records, columns=cols)
# Move both 'sku' and 'store_name' to the row index; the combination
# of these two columns provides a unique identifier for each row.
df.set_index(['sku', 'store_name'], inplace=True)
# Move 'store_name' from the row index to the column index. Each
# unique value in the 'store_name' index gets its own set of columns.
# In the multiindex, 'store_name' will be below the existing column
# labels.
df = df.unstack(1)
# To get the 'store_name' above the other column labels, we simply
# reorder the levels in the MultiIndex and sort it.
df.columns = df.columns.reorder_levels([1, 0])
df.sort_index(axis=1, inplace=True)
# Show the result.
df
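This works because each sku / store_name label combination is unique. When we use unstack(), we are only moving labels and cells around; no aggregation takes place. If the labels were not unique and the data needed to be aggregated, pivot_table() would probably be the better choice.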
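As a rough sketch of that last case, the example below assumes a hypothetical extra row that repeats the CSA / A1025 pair, so unstack() would no longer apply. Here pivot_table() with aggfunc='first' keeps the first value seen for each cell, which also works for a non-numeric column such as comp_url. The sample rows and the choice of 'first' are assumptions made for illustration; a different aggregation may suit real data better.

```python
import pandas as pd

# Hypothetical sample data: the first two rows share the same
# sku / store_name pair, so the labels are no longer unique.
cols = ['store_name', 'sku', 'comp', 'price', 'ship', 'comp_url']
records = [['CSA', 'A1025', 'compA', 30.99, 9.99, 'some url'],
           ['CSA', 'A1025', 'compB', 28.99, 9.99, 'some url2'],
           ['CSB', 'A1025', 'compC', 32.99, 9.99, 'some url3']]
df = pd.DataFrame.from_records(records, columns=cols)

# Aggregate duplicates by taking the first value in each group.
# 'first' is an assumption; it also works for string columns.
wide = df.pivot_table(index='sku', columns='store_name',
                      values=['comp_url', 'price', 'ship'],
                      aggfunc='first')

# Put 'store_name' above the value labels and sort the columns.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(wide)
```

A single cell can then be read with a column tuple, for example wide.loc['A1025', ('CSA', 'comp_url')].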