I have a pandas DataFrame with three index levels, structured like this:
                 a   b
0hr  0.01um  0  12  42
             1  10  35
     0.1um   0   8  28
             1   6  21
     Control 0   4  14
             1   2   7
24hr 0.01um  0  18  30
             1  15  25
     0.1um   0  12  20
             1   9  15
     Control 0   6  10
             1   3   5
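The DataFrame was imported from a series of Excel files. Sorry, I can't provide a snippet that generates this three-level index structure, because I don't know how to build it directly.
I am looking for the syntax to divide each value by its respective "Control" value.
For example:
                     a       b
0hr  0.01um  0  =12/4  =42/14
             1  =10/2   =35/7
     0.1um   0   =8/4  =28/14
             1   =6/2   =21/7
     Control 0   =4/4  =14/14
             1   =2/2    =7/7
24hr 0.01um  0  =18/6  =30/10
             1  =15/3   =25/5
     0.1um   0  =12/6  =20/10
             1   =9/3   =15/5
     Control 0   =6/6  =10/10
             1   =3/3    =5/5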
which would produce a DataFrame with the following values:
                a  b
0hr  0.01um  0  3  3
             1  5  5
     0.1um   0  2  2
             1  3  3
     Control 0  1  1
             1  1  1
24hr 0.01um  0  3  3
             1  5  5
     0.1um   0  2  2
             1  3  3
     Control 0  1  1
             1  1  1
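I tried doing this with a loop, but I suspect the DataFrame.div method offers a better syntax that I haven't been able to figure out. Any help would be appreciated.
Answer 0 (score: 2)
One would hope to simply define the control rows and divide the DataFrame by them, but unfortunately this does not work as expected. The division only happens where the index labels line up (on 'Control'), leaving NaN at every other index position.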
import pandas as pd

# Initialize DataFrame
df = pd.DataFrame({'a': {('0hr', '0.01um', 0): 12,
                         ('0hr', '0.01um', 1): 10,
                         ('0hr', '0.1um', 0): 8,
                         ('0hr', '0.1um', 1): 6,
                         ('0hr', 'Control', 0): 4,
                         ('0hr', 'Control', 1): 2,
                         ('24hr', '0.01um', 0): 18,
                         ('24hr', '0.01um', 1): 15,
                         ('24hr', '0.1um', 0): 12,
                         ('24hr', '0.1um', 1): 9,
                         ('24hr', 'Control', 0): 6,
                         ('24hr', 'Control', 1): 3},
                   'b': {('0hr', '0.01um', 0): 42,
                         ('0hr', '0.01um', 1): 35,
                         ('0hr', '0.1um', 0): 28,
                         ('0hr', '0.1um', 1): 21,
                         ('0hr', 'Control', 0): 14,
                         ('0hr', 'Control', 1): 7,
                         ('24hr', '0.01um', 0): 30,
                         ('24hr', '0.01um', 1): 25,
                         ('24hr', '0.1um', 0): 20,
                         ('24hr', '0.1um', 1): 15,
                         ('24hr', 'Control', 0): 10,
                         ('24hr', 'Control', 1): 5}})
# Select the 'Control' rows; drop_level=False keeps all three index levels.
control = df.xs('Control', level=1, drop_level=False)
>>> control
                 a   b
0hr  Control 0   4  14
             1   2   7
24hr Control 0   6  10
             1   3   5
>>> df.divide(control)
                  a    b
0hr  0.01um  0  NaN  NaN
             1  NaN  NaN
     0.1um   0  NaN  NaN
             1  NaN  NaN
     Control 0    1    1
             1    1    1
24hr 0.01um  0  NaN  NaN
             1  NaN  NaN
     0.1um   0  NaN  NaN
             1  NaN  NaN
     Control 0    1    1
             1    1    1
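Alternatively, one could try specifying the level when dividing. The problem with this approach is that the operation raises an error, because both operands are still MultiIndex objects: the levels could be matched in more than one way, so the join is ambiguous.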
>>> df.divide(control, level=1)
TypeError: Join on level between two MultiIndex objects is ambiguous
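For contrast, here is a minimal sketch of the case where `level=` does work: only one operand carries a MultiIndex, so there is a single way to match labels. The frame `small` and the factors in `scale` are made-up values for illustration, not part of the question's data.

```python
import pandas as pd

# Hypothetical small frame with a two-level row MultiIndex.
idx = pd.MultiIndex.from_product([["0hr", "24hr"], [0, 1]])
small = pd.DataFrame({"a": [4.0, 2.0, 6.0, 3.0]}, index=idx)

# Hypothetical per-time scale factors on a flat (single-level) Index.
scale = pd.Series({"0hr": 2.0, "24hr": 3.0})

# Only one operand is a MultiIndex, so level=0 is unambiguous:
# each row is divided by the factor matching its level-0 label.
scaled = small.div(scale, axis=0, level=0)
print(scaled)
```

This does not solve the original problem, because the control values are keyed by two levels (time and replicate) rather than one.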
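The trick is to reshape your DataFrame to avoid this ambiguity.
# Reshape DataFrame: move index levels 0 and 2 into the rows so that
# level 1 (the one containing 'Control') becomes the columns.
df2 = df.T.stack(level=[0, 2])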
>>> df2
          0.01um  0.1um  Control
a 0hr  0      12      8        4
       1      10      6        2
  24hr 0      18     12        6
       1      15      9        3
b 0hr  0      42     28       14
       1      35     21        7
  24hr 0      30     20       10
       1      25     15        5
# Divide reshaped DataFrame by 'Control' on the appropriate axis.
df3 = df2.divide(df2.Control, axis=0)
>>> df3
          0.01um  0.1um  Control
a 0hr  0       3      2        1
       1       5      3        1
  24hr 0       3      2        1
       1       5      3        1
b 0hr  0       3      2        1
       1       5      3        1
  24hr 0       3      2        1
       1       5      3        1
Then you need to reshape the DataFrame back into its original layout.
# Shape DataFrame back to original order.
result = df3.T.unstack().reorder_levels([1, 3, 2, 0]).unstack()
>>> result
                a  b
0hr  0.01um  0  3  3
             1  5  5
     0.1um   0  2  2
             1  3  3
     Control 0  1  1
             1  1  1
24hr 0.01um  0  3  3
             1  5  5
     0.1um   0  2  2
             1  3  3
     Control 0  1  1
             1  1  1
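As a possible alternative to the reshape round trip, the sketch below looks up the matching control row for every row of the frame and divides positionally. It is not part of the original answer; the names `keys` and `denominator` are illustrative, and the frame is rebuilt compactly with `MultiIndex.from_product` using the question's values.

```python
import pandas as pd

# Rebuild the same 12-row frame compactly (values as in the question).
index = pd.MultiIndex.from_product(
    [["0hr", "24hr"], ["0.01um", "0.1um", "Control"], [0, 1]]
)
df = pd.DataFrame(
    {"a": [12, 10, 8, 6, 4, 2, 18, 15, 12, 9, 6, 3],
     "b": [42, 35, 28, 21, 14, 7, 30, 25, 20, 15, 10, 5]},
    index=index,
)

# With the default drop_level=True, xs leaves a two-level
# (time, replicate) index on the control rows.
control = df.xs("Control", level=1)

# For each row of df, look up the control row sharing its time and replicate.
keys = df.index.droplevel(1)
denominator = control.reindex(keys)

# Divide positionally; to_numpy() sidesteps index alignment entirely.
result = df / denominator.to_numpy()
print(result)
```

This relies on there being exactly one control row per (time, replicate) pair; a missing control would show up as NaN in the corresponding rows.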
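Answer 1 (score: 1)
OK, here is what I came up with. It takes more steps than I would like, but it works. Hopefully someone comes up with something better.
Starting with your frame:
                 a   b
0hr  0.01um  0  12  42
             1  10  35
     0.1um   0   8  28
             1   6  21
     Control 0   4  14
             1   2   7
24hr 0.01um  0  18  30
             1  15  25
     0.1um   0  12  20
             1   9  15
     Control 0   6  10
             1   3   5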
First we reset the index. Note the column names given to the former index levels; yours may differ.
frame.reset_index(inplace=True)
frame
    level_0  level_1  level_2   a   b
0       0hr   0.01um        0  12  42
1       0hr   0.01um        1  10  35
2       0hr    0.1um        0   8  28
3       0hr    0.1um        1   6  21
4       0hr  Control        0   4  14
5       0hr  Control        1   2   7
6      24hr   0.01um        0  18  30
7      24hr   0.01um        1  15  25
8      24hr    0.1um        0  12  20
9      24hr    0.1um        1   9  15
10     24hr  Control        0   6  10
11     24hr  Control        1   3   5
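The `level_0`, `level_1`, `level_2` names appear because the index levels are unnamed. A small sketch of naming them first, so the reset columns are predictable; the names `time`, `dose`, and `replicate` are hypothetical, and the frame here is a cut-down stand-in for the real one.

```python
import pandas as pd

# Cut-down stand-in frame with an unnamed three-level index.
index = pd.MultiIndex.from_product([["0hr", "24hr"], ["0.1um", "Control"], [0, 1]])
frame = pd.DataFrame({"a": [8, 6, 4, 2, 12, 9, 6, 3]}, index=index)

# Unnamed levels become level_0, level_1, level_2 after reset_index().
print(frame.reset_index().columns.tolist())

# Hypothetical level names; choose whatever describes your data.
named = frame.rename_axis(index=["time", "dose", "replicate"])
flat = named.reset_index()
print(flat.columns.tolist())
```

If you name the levels this way, substitute those names for `level_0`, `level_1`, and `level_2` in the steps that follow.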
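Next, we use boolean indexing to select everything labeled Control. Then we merge that filtered version with our original frame on level_0 and level_2, so that each row gains the values of its matching control.
is_control = frame["level_1"] == "Control"
frame = pd.merge(frame, frame[is_control], on=["level_0", "level_2"], suffixes=["", "_control"])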
frame
    level_0  level_1  level_2   a   b  level_1_control  a_control  b_control
0       0hr   0.01um        0  12  42          Control          4         14
1       0hr    0.1um        0   8  28          Control          4         14
2       0hr  Control        0   4  14          Control          4         14
3       0hr   0.01um        1  10  35          Control          2          7
4       0hr    0.1um        1   6  21          Control          2          7
5       0hr  Control        1   2   7          Control          2          7
6      24hr   0.01um        0  18  30          Control          6         10
7      24hr    0.1um        0  12  20          Control          6         10
8      24hr  Control        0   6  10          Control          6         10
9      24hr   0.01um        1  10  25          Control          3          5
10     24hr    0.1um        1   9  15          Control          3          5
11     24hr  Control        1   3   5          Control          3          5
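The merge quietly assumes there is exactly one Control row per (level_0, level_2) pair; a duplicate would multiply rows. A hedged sketch of guarding against that with the `validate` argument of `pd.merge`, using a made-up four-row frame (`flat`, `controls`, and `a_ratio` are illustrative names):

```python
import pandas as pd

# Hypothetical flattened frame in the same shape as the one above.
flat = pd.DataFrame({
    "level_0": ["0hr", "0hr", "0hr", "0hr"],
    "level_1": ["0.1um", "0.1um", "Control", "Control"],
    "level_2": [0, 1, 0, 1],
    "a": [8, 6, 4, 2],
})
controls = flat[flat["level_1"] == "Control"]

# validate="many_to_one" asks pandas to check that the key columns are
# unique in the right-hand frame, i.e. one control row per key pair.
merged = pd.merge(flat, controls, on=["level_0", "level_2"],
                  suffixes=("", "_control"), validate="many_to_one")
merged["a_ratio"] = merged["a"] / merged["a_control"]

# A duplicated control row makes the same merge raise MergeError.
bad_controls = pd.concat([controls, controls.iloc[[0]]])
try:
    pd.merge(flat, bad_controls, on=["level_0", "level_2"],
             suffixes=("", "_control"), validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

Whether the extra check is worth it depends on how much you trust the imported Excel data.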
Now for the division, finally. The last line then trims the DataFrame back to the original columns, sorts it, and reapplies the index to match the original frame.
frame["a"] = frame["a"] / frame["a_control"]
frame["b"] = frame["b"] / frame["b_control"]
frame = frame[["level_0", "level_1", "level_2", "a", "b"]].sort_values(["level_0", "level_1", "level_2"]).set_index(["level_0", "level_1", "level_2"])
frame
                         a  b
level_0 level_1 level_2
0hr     0.01um  0        3  3
                1        5  5
        0.1um   0        2  2
                1        3  3
        Control 0        1  1
                1        1  1
24hr    0.01um  0        3  3
                1        5  5
        0.1um   0        2  2
                1        3  3
        Control 0        1  1
                1        1  1