If a pandas DataFrame df contains:
A B C D
a 1 2 3 4
b 2 NaN NaN 5
c NaN 7 NaN 2
d NaN 2 4 3
How can I add the first row to all of the remaining rows, only where they contain numbers, to get the resulting DataFrame:
A B C D
b 3 NaN NaN 9
c NaN 9 NaN 6
d NaN 4 7 7
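I plan to do this and then build a dictionary keyed by row name, whose value for each row is the product of that row's entries in the first table divided by the product of the same row's entries in the second table. I have working code that does this (below), but I'm worried it isn't "pandas" enough and that it is overly complicated for the simple task I want to perform. Do I have the optimal solution, or am I missing something obvious?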
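Written out, the per-row quantity described above is as follows. This is a reading of the cal_conduct loop below, with row 0 the first row, $F_r$ the columns where $x_{rj}$ is a number, and $G_r$ the columns where $x_{0j} + x_{rj}$ is a number (NaN entries are skipped). The second expression works it through for row b of the example:

```latex
R_r = \frac{\prod_{j \in F_r} x_{rj}}{\prod_{j \in G_r} \left( x_{0j} + x_{rj} \right)},
\qquad
R_b = \frac{2 \cdot 5}{3 \cdot 9} = \frac{10}{27}
```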
If a pandas version would still need to iterate over the rows, it isn't worth it, but I feel there should be a way to do this in place.
Code:
import numpy as np
import pandas as pd

dindex = [1, 2, 3]  # indices of drugs to select (set this)

def get_drugs():  # generates random "drug characteristics" as a pandas DataFrame
    cduct = ['dose', 'g1', 'g2', 'g3', 'g4', 'g5']
    drg = ['d1', 'd2', 'd3', 'd4']
    return pd.DataFrame(abs(np.random.randn(6, 4)), index=cduct, columns=drg)

def sel_drugs(dframe, selct):  # removes unwanted drugs from the DataFrame
    # Pass the DataFrame and dindex to this function.
    # Returns a tuple of (values, names); names skip the first ('dose') row.
    return dframe.iloc[:, selct].values, dframe.index[1:].tolist()

def cal_conduct(val, cnames):  # calculates conductance scaling
    # Pass values and names to this function
    cduct = {}  # initialize dict
    for ix, gname in enumerate(cnames):
        _top = val[ix + 1]
        _bot = val[0] + val[ix + 1]
        # product of the finite (non-NaN, non-inf) entries only
        cduct[gname] = (np.prod(_top[np.isfinite(_top)]) /
                        np.prod(_bot[np.isfinite(_bot)]))
    return cduct  # return a dictionary of scaling factors

def main():
    selection = sel_drugs(get_drugs(), dindex)
    print(cal_conduct(selection[0], selection[1]))

main()
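Because sel_drugs already hands cal_conduct a plain NumPy array, the loop can also be replaced by NumPy broadcasting alone. A minimal sketch follows; cal_conduct_vectorized is a hypothetical name, and np.nanprod is swapped in for the isfinite masking. The two agree when the only non-finite values are NaN; unlike the mask, np.nanprod would not skip infinities.

```python
import numpy as np

def cal_conduct_vectorized(val, cnames):
    """Hypothetical loop-free variant of the question's cal_conduct.

    val    -- 2-D array; row 0 is the 'dose' row, rows 1.. are g1, g2, ...
    cnames -- names for rows 1.., in the same order as val[1:]
    """
    top = val[1:]            # every row after the first
    bot = val[0] + val[1:]   # broadcasting adds row 0 to each of those rows
    # np.nanprod treats NaN as 1, i.e. missing entries are skipped
    ratios = np.nanprod(top, axis=1) / np.nanprod(bot, axis=1)
    return dict(zip(cnames, ratios))

# The small example frame from the question, as a plain array
val = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, np.nan, np.nan, 5.0],
                [np.nan, 7.0, np.nan, 2.0],
                [np.nan, 2.0, 4.0, 3.0]])
result = cal_conduct_vectorized(val, ["b", "c", "d"])
print(result)  # expected ratios: b = 10/27, c = 14/54, d = 24/196
```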
Answer 0 (score: 3)
Pandas automatically aligns/broadcasts, so this is simple:
In [8]: df
Out[8]:
A B C D
a 1 2 3 4
b 2 NaN NaN 5
c NaN 7 NaN 2
d NaN 2 4 3
In [11]: df.iloc[1:] + df.iloc[0]
Out[11]:
A B C D
b 3 NaN NaN 9
c NaN 9 NaN 6
d NaN 4 7 7
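The session does not show how df was built. A self-contained version, assuming the empty cells are real np.nan values (all columns are stored as floats here for simplicity):

```python
import numpy as np
import pandas as pd

# The question's example frame; empty cells are assumed to be np.nan
df = pd.DataFrame(
    {"A": [1.0, 2.0, np.nan, np.nan],
     "B": [2.0, np.nan, 7.0, 2.0],
     "C": [3.0, np.nan, np.nan, 4.0],
     "D": [4.0, 5.0, 2.0, 3.0]},
    index=["a", "b", "c", "d"],
)

# df.iloc[0] is a Series indexed by the column labels A..D, so pandas
# aligns it with the columns and adds it to every row of df.iloc[1:].
# Any NaN on either side gives NaN in the result.
result = df.iloc[1:] + df.iloc[0]
print(result)
```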
If I read it correctly, the second part is this:
In [12]: df2 = df.iloc[1:] + df.iloc[0]
In [13]: df.prod()
Out[13]:
A 2
B 28
C 12
D 120
dtype: float64
In [14]: df2/df.prod()
Out[14]:
A B C D
b 1.5 NaN NaN 0.075000
c NaN 0.321429 NaN 0.050000
d NaN 0.142857 0.583333 0.058333
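Note that df.prod() with no arguments reduces down each column (axis=0) and includes row a, so In [14] divides each cell by a column total. The question's cal_conduct instead multiplies across each row. If that per-row ratio is the goal, a minimal sketch using axis=1 (the same idea as prod(1) in the other answer), again assuming the empty cells are np.nan:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0, np.nan, np.nan],
     "B": [2.0, np.nan, 7.0, 2.0],
     "C": [3.0, np.nan, np.nan, 4.0],
     "D": [4.0, 5.0, 2.0, 3.0]},
    index=["a", "b", "c", "d"],
)

df2 = df.iloc[1:] + df.iloc[0]
# axis=1 multiplies across each row; skipna=True (the default) ignores NaN
ratio = df.iloc[1:].prod(axis=1) / df2.prod(axis=1)
print(ratio)
```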
Answer 1 (score: 0)
Here is some code based on @Jeff's answer. It is about 40% slower, at least on the small test data, but it is simpler.
import numpy as np
import pandas as pd

dindex = [1, 2, 3]  # indices of drugs to select (set this)

def get_drugs():  # generates random "drug characteristics" as a pandas DataFrame
    cduct = ['dose', 'g1', 'g2', 'g3', 'g4', 'g5']
    drg = ['d1', 'd2', 'd3', 'd4']
    return pd.DataFrame(abs(np.random.randn(6, 4)), index=cduct, columns=drg)

def cal_conduct(frame, selct):  # calculates conductance scaling
    # Pass the DataFrame and the column selection
    s = frame.iloc[:, selct]
    # s.iloc[0] is broadcast against every remaining row;
    # prod(1) multiplies across each row, skipping NaN
    cduct = s.iloc[1:].prod(1) / (s.iloc[0] + s.iloc[1:]).prod(1)
    return cduct.to_dict()  # return a dictionary of scaling factors

def main():
    scaling = cal_conduct(get_drugs(), dindex)
    print(scaling)

main()
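To confirm that the simpler version computes the same scaling factors as the question's loop, here is a minimal sketch that runs both on one fixed random frame and times them with timeit. The seed, the frame, and the helper names loop_version and vector_version are illustrative assumptions; timings depend on the machine, the data size, and the pandas/NumPy versions, so the printed numbers are only indicative.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility
frame = pd.DataFrame(np.abs(rng.standard_normal((6, 4))),
                     index=["dose", "g1", "g2", "g3", "g4", "g5"],
                     columns=["d1", "d2", "d3", "d4"])
selct = [1, 2, 3]

def loop_version(frame, selct):
    # Same arithmetic as the question's sel_drugs + cal_conduct
    val = frame.iloc[:, selct].values
    out = {}
    for ix, gname in enumerate(frame.index[1:]):
        top = val[ix + 1]
        bot = val[0] + val[ix + 1]
        out[gname] = np.prod(top[np.isfinite(top)]) / np.prod(bot[np.isfinite(bot)])
    return out

def vector_version(frame, selct):
    # Same arithmetic as cal_conduct in this answer
    s = frame.iloc[:, selct]
    return (s.iloc[1:].prod(axis=1) / (s.iloc[0] + s.iloc[1:]).prod(axis=1)).to_dict()

loop_res = loop_version(frame, selct)
vec_res = vector_version(frame, selct)
print(all(np.isclose(loop_res[k], vec_res[k]) for k in loop_res))

# Rough timing of each version (seconds for 200 calls)
print(timeit.timeit(lambda: loop_version(frame, selct), number=200))
print(timeit.timeit(lambda: vector_version(frame, selct), number=200))
```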