Pandas: iterating a function through a DataFrame

Time: 2017-12-16 19:47:26

Tags: python python-3.x

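I am relatively new to Python and to writing functions. I am trying to apply the following function to every row of a DataFrame and append each row's result to a new column: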

def manhattan_distance(x,y):

  return sum(abs(a-b) for a,b in zip(x,y))

For reference, this is the DataFrame I am testing with:

import pandas as pd

entries = [
{'age1':'2', 'age2':'2'},
{'age1':'12', 'age2': '12'},
{'age1':'5', 'age2': '50'}
]

df=pd.DataFrame(entries)

df['age1'] = df['age1'].astype(str).astype(int)
df['age2'] = df['age2'].astype(str).astype(int)

I have seen this answer, "How to iterate over rows in a DataFrame in Pandas?", and have gotten this far:

import itertools
for index, row in df.iterrows():

    df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)

This returns the following:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-aa6a21cd1de9> in <module>()
      4 #    print (manhattan_distance(row['age1'],row['age2']))
      5 
----> 6     df['distance']=df.apply(lambda row:    manhattan_distance(row['age1'], row['age2']), axis=1)

/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in   apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4852                         f, axis,
   4853                         reduce=reduce,
-> 4854                         ignore_failures=ignore_failures)
   4855             else:
   4856                 return self._apply_broadcast(f, axis)

/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4948             try:
   4949                 for i, v in enumerate(series_gen):
-> 4950                     results[i] = func(v)
   4951                     keys.append(v.name)
   4952             except Exception as e:

<ipython-input-42-aa6a21cd1de9> in <lambda>(row)
      4 #    print (manhattan_distance(row['age1'],row['age2']))
      5 
----> 6     df['distance']=df.apply(lambda row:     manhattan_distance(row['age1'], row['age2']), axis=1)

<ipython-input-36-74da75398c4c> in manhattan_distance(x, y)
      1 def manhattan_distance(x,y):
      2 
----> 3   return sum(abs(a-b) for a,b in zip(x,y))
      4  #   return sum(abs(a-b) for a,b in map(lambda x: zip(a,b)))

TypeError: ('zip argument #1 must support iteration', 'occurred at index 0')
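The TypeError above can be reproduced without pandas. With axis=1, row['age1'] and row['age2'] are single numbers, while zip() needs iterables. A minimal sketch, reusing the question's function with hypothetical sample values:

```python
def manhattan_distance(x, y):
    # Sum of absolute coordinate differences; x and y must be iterables.
    return sum(abs(a - b) for a, b in zip(x, y))

# Two sequences of coordinates work as intended: |5-50| + |1-3|
ok = manhattan_distance([5, 1], [50, 3])

# Two plain numbers (what row['age1'] and row['age2'] are) do not.
try:
    manhattan_distance(5, 50)
    failed = False
except TypeError:
    failed = True

print(ok, failed)
```

The exact wording of the TypeError message differs between Python versions, so the sketch only checks that the exception is raised.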

Based on other answers to the question I mentioned above, I tried to change the zip statement in my function:

def manhattan_distance(x,y):

#  return sum(abs(a-b) for a,b in zip(x,y))
   return sum(abs(a-b) for a,b in map(lambda x: zip(a,b)))

This returns:

--------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-44-aa6a21cd1de9> in <module>()
      4 #    print (manhattan_distance(row['age1'],row['age2']))
      5 
----> 6     df['distance']=df.apply(lambda row:   manhattan_distance(row['age1'], row['age2']), axis=1)

/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4852                         f, axis,
   4853                         reduce=reduce,
-> 4854                         ignore_failures=ignore_failures)
   4855             else:
   4856                 return self._apply_broadcast(f, axis)

/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4948             try:
   4949                 for i, v in enumerate(series_gen):
-> 4950                     results[i] = func(v)
   4951                     keys.append(v.name)
   4952             except Exception as e:

<ipython-input-44-aa6a21cd1de9> in <lambda>(row)
      4 #    print (manhattan_distance(row['age1'],row['age2']))
      5 
----> 6     df['distance']=df.apply(lambda row:  manhattan_distance(row['age1'], row['age2']), axis=1)

<ipython-input-43-5daf167baf5f> in manhattan_distance(x, y)
      2 
      3 #  return sum(abs(a-b) for a,b in zip(x,y))
----> 4    return sum(abs(a-b) for a,b in map(lambda x: zip(a,b)))

TypeError: ('map() must have at least two arguments.', 'occurred at index 0')

If this is the right approach, it is not clear to me what arguments my map() call needs for the function to work.
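On the map() question: map() takes a function followed by at least one iterable, so map(lambda x: zip(a,b)) fails before the lambda is ever called. One possible way to keep the original two-argument function is to pass it sequences rather than scalars. The sketch below assumes each point is described by a list of columns; the names cols1 and cols2 are hypothetical and not part of the question:

```python
import pandas as pd

def manhattan_distance(x, y):
    # Sum of absolute coordinate differences; x and y must be iterables.
    return sum(abs(a - b) for a, b in zip(x, y))

df = pd.DataFrame({'age1': [2, 12, 5], 'age2': [2, 12, 50]})

# map() needs a function AND an iterable: map(func, iterable).
doubled = list(map(lambda v: v * 2, [1, 2, 3]))

# Hypothetical column groups: one list of columns per point.
cols1, cols2 = ['age1'], ['age2']

# row[cols1] is a Series (iterable), so zip() inside the function works.
df['distance'] = df.apply(
    lambda row: manhattan_distance(row[cols1], row[cols2]), axis=1
)
print(df)
```

Indexing the row with a list of labels is what makes the difference: row['age1'] is a scalar, whereas row[['age1']] is a one-element Series. The same call would extend to more coordinates by adding column names to both lists.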

1 Answer:

Answer 0: (score: 0)

import numpy as np
import pandas as pd

entries = [
{'age1':'2', 'age2':'2'},
{'age1':'12', 'age2': '12'},
{'age1':'5', 'age2': '50'}
]

df = pd.DataFrame(entries)
df['age1'] = df['age1'].astype(str).astype(int)
df['age2'] = df['age2'].astype(str).astype(int)

def manhattan_distance(row):
    # https://en.wikipedia.org/wiki/Taxicab_geometry#Formal_definition
    return np.sum(abs(row['age1']-row['age2']))

df['distance'] = df.apply(manhattan_distance, axis=1)
print(df)
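As an alternative to apply, which calls the function once per row, the same column can be computed with vectorized column arithmetic. This is a sketch and not part of the original answer; it assumes one coordinate column per point, as in the sample data:

```python
import pandas as pd

df = pd.DataFrame({'age1': [2, 12, 5], 'age2': [2, 12, 50]})

# Subtract the columns element-wise, then take the absolute value.
df['distance'] = (df['age1'] - df['age2']).abs()
print(df)
```

For a frame this small the difference is negligible, but column arithmetic avoids a Python-level function call per row and tends to scale better on larger data.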