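I'm relatively new to Python and to functions. I'm trying to apply the following function to each row of a DataFrame and append the result for each row to a new column: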
def manhattan_distance(x,y):
    return sum(abs(a-b) for a,b in zip(x,y))
For reference, this is the DataFrame I'm testing with:
import pandas as pd

entries = [
    {'age1':'2', 'age2':'2'},
    {'age1':'12', 'age2': '12'},
    {'age1':'5', 'age2': '50'}
]
df = pd.DataFrame(entries)
# The ages are stored as strings, so convert both columns to integers.
df['age1'] = df['age1'].astype(str).astype(int)
df['age2'] = df['age2'].astype(str).astype(int)
I've seen this answer, "How to iterate over rows in a DataFrame in Pandas?", and have got this far:
import itertools
for index, row in df.iterrows():
    df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
This returns the following:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-aa6a21cd1de9> in <module>()
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4852 f, axis,
4853 reduce=reduce,
-> 4854 ignore_failures=ignore_failures)
4855 else:
4856 return self._apply_broadcast(f, axis)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4948 try:
4949 for i, v in enumerate(series_gen):
-> 4950 results[i] = func(v)
4951 keys.append(v.name)
4952 except Exception as e:
<ipython-input-42-aa6a21cd1de9> in <lambda>(row)
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
<ipython-input-36-74da75398c4c> in manhattan_distance(x, y)
1 def manhattan_distance(x,y):
2
----> 3 return sum(abs(a-b) for a,b in zip(x,y))
4 # return sum(abs(a-b) for a,b in map(lambda x: zip(a,b)))
TypeError: ('zip argument #1 must support iteration', 'occurred at index 0')
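Based on other answers to the question I mentioned above, I tried modifying the zip statement in my function (the modified line is visible in the traceback below) and ran the same loop again: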
import itertools
for index, row in df.iterrows():
    df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
This returns:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-aa6a21cd1de9> in <module>()
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4852 f, axis,
4853 reduce=reduce,
-> 4854 ignore_failures=ignore_failures)
4855 else:
4856 return self._apply_broadcast(f, axis)
/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4948 try:
4949 for i, v in enumerate(series_gen):
-> 4950 results[i] = func(v)
4951 keys.append(v.name)
4952 except Exception as e:
<ipython-input-44-aa6a21cd1de9> in <lambda>(row)
4 # print (manhattan_distance(row['age1'],row['age2']))
5
----> 6 df['distance']=df.apply(lambda row: manhattan_distance(row['age1'], row['age2']), axis=1)
<ipython-input-43-5daf167baf5f> in manhattan_distance(x, y)
2
3 # return sum(abs(a-b) for a,b in zip(x,y))
----> 4 return sum(abs(a-b) for a,b in map(lambda x: zip(a,b)))
TypeError: ('map() must have at least two arguments.', 'occurred at index 0')
If this is the right approach, I'm not clear what arguments my map() call needs to make the function work.
Answer 0 (score: 0)
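Some context on the two errors in the question. With `axis=1`, `row['age1']` and `row['age2']` are single numbers rather than sequences, so `zip(x, y)` has nothing to iterate over. Separately, `map` needs a function plus at least one iterable, which is why `map(lambda x: zip(a,b))` fails. Below is a minimal sketch that keeps the two-argument `manhattan_distance(x, y)` signature from the question. Passing the inputs through `np.atleast_1d` is an assumption added here for illustration, not something from the original post, so that the function accepts both scalars and equal-length sequences:

```python
import numpy as np
import pandas as pd

def manhattan_distance(x, y):
    # np.atleast_1d turns a scalar into a 1-element array, so the same
    # code handles single numbers and equal-length sequences alike.
    x = np.atleast_1d(x)
    y = np.atleast_1d(y)
    return np.abs(x - y).sum()

df = pd.DataFrame({'age1': [2, 12, 5], 'age2': [2, 12, 50]})

# apply(..., axis=1) already visits every row, so the outer
# iterrows() loop from the question is not needed.
df['distance'] = df.apply(
    lambda row: manhattan_distance(row['age1'], row['age2']), axis=1
)
print(df)
```

The `iterrows()` loop in the question recomputes the whole column once per row. It does no harm to the result, but it can probably be dropped.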
import numpy as np
import pandas as pd

entries = [
    {'age1':'2', 'age2':'2'},
    {'age1':'12', 'age2': '12'},
    {'age1':'5', 'age2': '50'}
]
df = pd.DataFrame(entries)
df['age1'] = df['age1'].astype(str).astype(int)
df['age2'] = df['age2'].astype(str).astype(int)

def manhattan_distance(row):
    # https://en.wikipedia.org/wiki/Taxicab_geometry#Formal_definition
    # With axis=1, row holds one row's values, so each lookup is a single number.
    return np.sum(abs(row['age1']-row['age2']))

# Pass the function itself; apply calls it once per row.
df['distance'] = df.apply(manhattan_distance, axis=1)
print(df)
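Because the distance here is just the absolute difference of two columns, the same result can likely be computed without `apply` at all. The sketch below assumes the column names from the question. The lists `cols_a` and `cols_b` and the `distance_multi` column are hypothetical names, introduced only to illustrate how this might extend to several coordinate pairs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age1': [2, 12, 5], 'age2': [2, 12, 50]})

# Column arithmetic is element-wise, so this covers every row at once.
df['distance'] = (df['age1'] - df['age2']).abs()

# Hypothetical generalisation: pair up several columns and sum the
# absolute differences across each row (axis=1).
cols_a = ['age1']
cols_b = ['age2']
diffs = df[cols_a].to_numpy() - df[cols_b].to_numpy()
df['distance_multi'] = np.abs(diffs).sum(axis=1)

print(df)
```

On larger frames the column-wise form tends to be noticeably faster than a row-wise `apply`, since the work is done in NumPy rather than in a Python-level loop.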