I have a dataframe that, when first loaded from a list of lists, looks like this:
0 1 2 3 4 5 6 7 8 \
0 Segment Nov-12 Dec-12 Jan-13 Feb-13 Mar-13 Apr-13 May-13
1 A N/A N/A N/A N/A N/A
2 B N/A N/A N/A N/A N/A
3 C N/A N/A N/A N/A N/A
4 D N/A N/A N/A N/A N/A
5 Total N/A N/A N/A N/A N/A
The values under each month will be floats. I want to pivot the dataframe so that I end up with:
Segment Month Value
0 A month value
1 A month value
2 B month value
3 B month value
etc...
What is the best way to do this?
Answer 0 (score: 2)
import pandas as pd

v = df.values[1:, 1:].astype(float)
mux = pd.MultiIndex.from_product(
[df.iloc[1:, 0], df.iloc[0, 1:]],
names=['Segment', 'Month']
)
d1 = pd.Series(v.ravel(), mux).reset_index(name='Value')
print(d1)
Segment Month Value
0 A Nov-12 NaN
1 A Dec-12 NaN
2 A Jan-13 NaN
3 A Feb-13 NaN
4 A Mar-13 NaN
5 A Apr-13 NaN
6 A May-13 NaN
7 B Nov-12 NaN
8 B Dec-12 NaN
9 B Jan-13 NaN
10 B Feb-13 NaN
11 B Mar-13 NaN
12 B Apr-13 NaN
13 B May-13 NaN
14 C Nov-12 NaN
15 C Dec-12 NaN
16 C Jan-13 NaN
17 C Feb-13 NaN
18 C Mar-13 NaN
19 C Apr-13 NaN
20 C May-13 NaN
21 D Nov-12 NaN
22 D Dec-12 NaN
23 D Jan-13 NaN
24 D Feb-13 NaN
25 D Mar-13 NaN
26 D Apr-13 NaN
27 D May-13 NaN
28 Total Nov-12 NaN
29 Total Dec-12 NaN
30 Total Jan-13 NaN
31 Total Feb-13 NaN
32 Total Mar-13 NaN
33 Total Apr-13 NaN
34 Total May-13 NaN
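The snippet above assumes `df` already exists and that its missing cells convert cleanly to float. As a self-contained sketch of the same approach, the block below builds a small frame from a list of lists. The sample values are made up for illustration, and `pd.to_numeric(..., errors="coerce")` is swapped in for `astype(float)` on the assumption that missing cells hold the literal string 'N/A', which `astype(float)` would reject.

```python
import pandas as pd

# Hypothetical sample data in the same list-of-lists layout as the question.
rows = [
    ["Segment", "Nov-12", "Dec-12", "Jan-13"],
    ["A", 1.0, "N/A", 3.0],
    ["B", 4.0, 5.0, "N/A"],
]
df = pd.DataFrame(rows)

# Convert the data block to floats column by column;
# anything non-numeric (here the 'N/A' strings) becomes NaN.
v = df.iloc[1:, 1:].apply(pd.to_numeric, errors="coerce").to_numpy()

# Every (Segment, Month) pair, in the same row-major order as v.ravel().
mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]],
    names=["Segment", "Month"],
)

long_df = pd.Series(v.ravel(), index=mux).reset_index(name="Value")
print(long_df)
```

If the missing cells are already `NaN` or `None` rather than strings, the original `astype(float)` line may be all that is needed.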
Explanation
# Your data obviously has an index in the first column
# and column headers in the first row
# I grab the underlying `numpy` array
# from the 2nd column and 2nd row onward
# and convert to float
v = df.values[1:, 1:].astype(float)
# I'm going to create a `pd.MultiIndex` to label every value
# in the `pd.Series` I'll create
# the first level of the index will be that first column
# that was obviously the index
# the second level will be the first row that was
# obviously the column headers
# the trick here is that I use `from_product`
# which gives me every combination of those arrays
# `ravel` unwinds or flattens the matrix and now
# lines up with this `pd.MultiIndex` that has every combination
# of row and column labels
mux = pd.MultiIndex.from_product(
[df.iloc[1:, 0], df.iloc[0, 1:]],
names=['Segment', 'Month']
)
# I construct the `pd.Series` from the flattened values and the `pd.MultiIndex`
# `reset_index` takes those levels of the index and pushes them out
# into the dataframe as ordinary columns. `name='Value'` just makes sure the
# values of the series get a column name
d1 = pd.Series(v.ravel(), mux).reset_index(name='Value')
print(d1)
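The approach relies on `from_product` and `ravel` producing entries in the same order. A minimal check of that alignment, using made-up labels and numbers rather than the question's data, might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical labels and values, chosen only to make positions easy to follow.
row_labels = ["A", "B"]
col_labels = ["Nov-12", "Dec-12", "Jan-13"]
matrix = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])

mux = pd.MultiIndex.from_product(
    [row_labels, col_labels], names=["Segment", "Month"]
)
flat = matrix.ravel()  # row-major (C order) by default

# Each (row, column) pair in the index sits at the same position
# as matrix[i, j] does in the flattened array.
for pos, (seg, month) in enumerate(mux):
    i = row_labels.index(seg)
    j = col_labels.index(month)
    assert flat[pos] == matrix[i, j]
```

The last level of `from_product` varies fastest, which is the same order in which a row-major `ravel` walks the columns of each row.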
Answer 1 (score: 0)
I eventually found a solution, but please let me know how I can improve it.
cac_df = pd.DataFrame(data=vals)
cac_df.rename(index=cac_df[0], inplace=True)
del cac_df[0]
cac_df = cac_df.rename(columns=cac_df.loc['Segment']).drop('Segment')
cac_df = cac_df.applymap(lambda x: None if not x or x == 'N/A' else x)
cac_df = pd.DataFrame(
cac_df.dropna(axis=1, how='all').stack()
)
`stack` threw me for a loop because it returns a Series rather than a DataFrame when the columns have only a single level, which the documentation does note.
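The Series that `stack` returns can be turned into the Segment / Month / Value layout with `rename_axis` and `reset_index`. The sketch below uses made-up sample values and a hypothetical `vals` list shaped like the question's data. The explicit `dropna()` is an assumption on my part, added because whether `stack` drops missing values by itself depends on the pandas version.

```python
import pandas as pd

# Hypothetical sample data; the first inner list holds the headers.
vals = [
    ["Segment", "Nov-12", "Dec-12"],
    ["A", 1.0, "N/A"],
    ["B", 3.0, 4.0],
]

# Use the first row as column names and the Segment column as the index.
wide = pd.DataFrame(vals[1:], columns=vals[0]).set_index("Segment")
wide = wide.apply(pd.to_numeric, errors="coerce")  # 'N/A' -> NaN

# With single-level columns, stack() returns a Series
# indexed by (Segment, original column label).
stacked = wide.stack().dropna()
print(type(stacked).__name__)

# Name both index levels, then move them into ordinary columns.
long_df = stacked.rename_axis(["Segment", "Month"]).reset_index(name="Value")
print(long_df)
```

Compared with wrapping the result in `pd.DataFrame(...)`, this keeps the segment and month as regular columns instead of leaving them in the index.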