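I'm new to Python and pandas. I need to do some simple parsing of a pandas DataFrame to get a new DataFrame, and the parsing involves several functions. Here is a toy example: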
import pandas as pd
df = pd.DataFrame({'A' : pd.Series(["T100", "T100", "M100", "M100"]), 'B' : pd.Series(["520", "620", "720", "820"]), 'C' : pd.Series(["10/50", "20/50", "30/50", "50/50"])})
>>> df
      A    B      C
0  T100  520  10/50
1  T100  620  20/50
2  M100  720  30/50
3  M100  820  50/50
This is what I have tried. Of course it doesn't work (it raises AttributeError: 'DataFrame' object has no attribute 'agg'), but it shows what I am trying to do:
import re

def get_pat_ID(row):
    # extract the numeric patient ID from a sample name such as "T100"
    sample = row['A']
    patID = re.match(r"[TM](\d+)", sample).group(1)
    return patID

def get_funcB(row):
    # join columns B and C for T100 samples; every other sample gets "NA"
    sample, b, c = row['A'], row['B'], row['C']
    if sample == "T100":
        output = b + "_" + c
    else:
        output = "NA"
    return output

def cust(dataset, funcname):
    f = dataset.apply(funcname, axis=1)  # I want the function to be performed on each row of my dataframe
    return f

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}  # contains all the functions that I want to pass to my dataframe
funcs = {'PatID': cust(df, funcdict["pat_ID"]), 'AnotherFunc': cust(df, funcdict["funcB"])}  # creates one column for output of each function
newdf = pd.DataFrame()
newdf = df.agg(funcs)  # this is the line that raises the AttributeError
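I know my approach is not the most efficient, because apply iterates over the same rows again every time a function is evaluated. Can anyone help me?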
Answer 0 (score: 0)
>>> ndf = df.copy()
>>> for k, v in sorted(funcdict.items()):  # items() works on Python 2 and 3; sorted() fixes the column order
...     ndf[k] = ndf.apply(v, axis=1)
...
>>> ndf
      A    B      C      funcB pat_ID
0  T100  520  10/50  520_10/50    100
1  T100  620  20/50  620_20/50    100
2  M100  720  30/50         NA    100
3  M100  820  50/50         NA    100
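
On the efficiency concern raised in the question (each apply call walks over the rows again), one possible variation is to evaluate every function inside a single row-wise pass. The following is only a sketch under assumptions: the helper name all_funcs is hypothetical and not part of the original post, and it relies on DataFrame.apply with axis=1 building a DataFrame when the applied function returns a Series.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})


def get_pat_ID(row):
    # numeric part of the sample name, e.g. "T100" -> "100"
    return re.match(r"[TM](\d+)", row["A"]).group(1)


def get_funcB(row):
    # join B and C for T100 samples, otherwise the string "NA"
    return row["B"] + "_" + row["C"] if row["A"] == "T100" else "NA"


funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}


def all_funcs(row):
    # hypothetical helper: run every function in funcdict on the same row,
    # so the DataFrame is traversed only once
    return pd.Series({name: func(row) for name, func in funcdict.items()})


# one output column per key in funcdict
newdf = df.apply(all_funcs, axis=1)
print(newdf)
```

If the original columns are also needed, pd.concat([df, newdf], axis=1) should place them side by side. For large frames, row-wise apply tends to be slow either way, so vectorized string methods may be worth trying instead.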