Pandas: apply multiple custom functions

Date: 2017-03-06 07:46:37

Tags: python pandas

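I'm new to Python and pandas. I need to do some simple parsing of a pandas DataFrame to get a new DataFrame, and the parsing involves several functions. Here is a toy example: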

import pandas as pd
df = pd.DataFrame({'A': pd.Series(["T100", "T100", "M100", "M100"]), 'B': pd.Series(["520", "620", "720", "820"]), 'C': pd.Series(["10/50", "20/50", "30/50", "50/50"])})

>>> df
      A       B      C
0  T100     520  10/50
1  T100     620  20/50
2  M100     720  30/50
3  M100     820  50/50

This is what I tried. It doesn't work, of course (it raises AttributeError: 'DataFrame' object has no attribute 'agg'), but it shows what I'm trying to do:

import re

def get_pat_ID(row):
    sample = row['A']
    patID = re.match(r"[TM](\d+)", sample).group(1)
    return patID

def get_funcB(row):
    sample, b, c = row['A'], row['B'], row['C']
    if sample == "T100":
        output = b + "_" + c
    else:
        output = "NA"
    return output

def cust(dataset, funcname):
    f = dataset.apply(funcname, axis=1)  # I want the function to be performed on each row of my dataframe
    return f

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}  # contains all the functions that I want to pass to my dataframe
funcs = {'PatID': cust(df, funcdict["pat_ID"]), 'AnotherFunc': cust(df, funcdict["funcB"])}  # creates one column for output of each function
newdf = pd.DataFrame()
newdf = df.agg(funcs)  # fails here with the AttributeError above (DataFrame.agg only exists from pandas 0.20 on)

I know my approach isn't the most efficient, because apply passes over the same rows again every time a function is evaluated. Can anyone help?
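A minimal sketch of a single-pass alternative, assuming pandas 0.23 or later on Python 3.6+ (so dict insertion order is kept): wrap all row functions in one helper that returns a pd.Series, so df.apply(..., axis=1) visits each row once and builds every output column from that visit. The helper name apply_all is hypothetical; the two row functions restate the question's logic so the block runs on its own.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})


def get_pat_ID(row):
    # digits following the leading T or M, e.g. "T100" -> "100"
    return re.match(r"[TM](\d+)", row["A"]).group(1)


def get_funcB(row):
    # "B_C" for T100 rows, the string "NA" otherwise
    if row["A"] == "T100":
        return row["B"] + "_" + row["C"]
    return "NA"


funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}


def apply_all(row):
    # hypothetical helper: run every function on the same row and
    # return the results as a Series keyed by output column name
    return pd.Series({name: func(row) for name, func in funcdict.items()})


# one pass over the rows; each returned Series is expanded into columns
newdf = df.apply(apply_all, axis=1)
print(newdf)
```

Because newdf shares df's index, df.join(newdf) attaches the new columns next to the original ones if both are wanted in one frame.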

1 answer:

Answer 0 (score: 0)

>>> ndf = df.copy()
>>> for k, v in funcdict.items():  # add one column per function
...     ndf[k] = ndf.apply(v, axis=1)
... 
>>> ndf
      A    B      C pat_ID      funcB
0  T100  520  10/50    100  520_10/50
1  T100  620  20/50    100  620_20/50
2  M100  720  30/50    100         NA
3  M100  820  50/50    100         NA
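If the aim is to avoid row-wise apply altogether, both functions in this toy example can also be written as vectorized column operations. This is a swapped-in technique, not the answerer's method: a sketch assuming the same df as in the question, with Series.str.extract standing in for get_pat_ID and numpy.where standing in for get_funcB. On large frames this is usually faster than apply(axis=1), but it only works when the logic can be phrased column by column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

newdf = df.assign(
    # same regex as get_pat_ID, applied to the whole column at once;
    # expand=False returns a Series because there is a single group
    pat_ID=df["A"].str.extract(r"[TM](\d+)", expand=False),
    # "B_C" where A is "T100", the string "NA" elsewhere (mirrors get_funcB)
    funcB=np.where(df["A"] == "T100", df["B"] + "_" + df["C"], "NA"),
)
print(newdf)
```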

Or even a simple loop:
