Create a new column of arrays from two other columns, and test it, in python pandas

Time: 2014-03-05 00:47:27

Tags: python pandas

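I am trying to create a column whose entries are coordinates stored as numpy arrays. I have Easting and Northing data, and I simply want to shrink the large numbers by shifting them down (subtracting the minimum of each column). I am trying to test this with unittest.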

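I tried using .apply(lambda), following other questions, but could not resolve my error. (I am working with pandas 0.9 and cannot upgrade.) Below is example code; the function I am struggling with is adjustCoordSystem().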

import unittest
import pandas as pd
from pandas.util.testing import assert_frame_equal

def exampleDf():
    df = pd.DataFrame({'Easting':{0:11,1:12,2:13,3:14},
                  'Northing':{0:5,1:7,2:9,3:11}})
    return df

def exampWithCoord():
    df = exampleDf()
    df['Sample']=[[0,0,0],[1,2,0],[2,4,0],[3,6,0]]
    return df

class dfProccesedFull():

    def adjustCoordSystem(self, df):
        ''' change coordinate system to get from 0 to max'''
        df['Sample'] = \
        [df['Easting'].apply(lambda x: x - min(df['Easting'])),
         df['Northing'].apply(lambda x: x - min(df['Northing'])),
         df['Northing'].apply(lambda x: 0.0)]

#         [(df['Easting'] - min(df['Easting'])), (df['Northing'] - min(df['Northing'])),\
#          df['Northing'].apply(lambda x: 0.0)]

        return df

class TestDfProccesedDataFull(unittest.TestCase):

    def test_adjustCoordSystem(self):
        df = exampleDf()
        dfModel = exampWithCoord()
        tData =  dfProccesedFull()
        dfTested=tData.adjustCoordSystem(df)
        assert_frame_equal(dfTested, dfModel)

if __name__ == "__main__":
    unittest.main()

I get an error: an AssertionError raised at the line df['Northing'].apply(lambda x: 0.0)]

How should I change my function so that each row of the "Sample" column holds a list, without iterating over every row?

The output I am looking for is a new DataFrame, for example:

   Easting  Northing     Sample
0       11         5  [0, 0, 0]
1       12         7  [1, 2, 0]
2       13         9  [2, 4, 0]
3       14        11  [3, 6, 0]

where the "Sample" column is [x coordinate from Easting, y coordinate from Northing, z coordinate = 0]

1 Answer:

Answer 0: (score: 2)

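I'm not sure what the point of this is... you are trying to assign a list of three Series to a single column, which will fail unless df has length 3: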

df['Sample'] = [df['Easting'].apply(lambda x: x - min(df['Easting'])),
                df['Northing'].apply(lambda x: x - min(df['Northing'])),
                df['Northing'].apply(lambda x: 0.0)]
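
To make the mismatch concrete, here is a small check, assuming the four-row frame from the question. It only compares lengths and does not attempt the assignment, since the exception raised differs between pandas versions. The names values, n_values and n_rows are introduced here for illustration.

```python
import pandas as pd

# Same data as exampleDf() in the question.
df = pd.DataFrame({'Easting': [11, 12, 13, 14],
                   'Northing': [5, 7, 9, 11]})

# The right-hand side of the failing assignment: a list of three Series,
# one per coordinate axis, rather than one entry per row.
values = [df['Easting'] - df['Easting'].min(),
          df['Northing'] - df['Northing'].min(),
          df['Northing'] * 0.0]

n_values = len(values)  # number of items offered to the column
n_rows = len(df)        # number of items the column needs
print(n_values, n_rows)
```

A column assignment needs one item per row, so a list with one item per axis only fits when the frame happens to have three rows.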

See for example:

In [21]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [22]: df['C'] = [df.copy(), df.copy()]  # use copy to avoid max recursion error...

In [23]: df['C'] = [1, 2, 3]
ValueError: Length of values does not match length of index
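
One possible way to get one [x, y, z] list per row without an explicit Python loop is to stack the shifted columns side by side with NumPy and convert the rows to lists. This is only a sketch: it was written against recent pandas and NumPy and has not been checked on pandas 0.9, and the function name adjust_coord_system is a hypothetical stand-in for the method in the question.

```python
import numpy as np
import pandas as pd


def adjust_coord_system(df):
    """Return a copy of df with a 'Sample' column of [x, y, z] lists.

    Hypothetical rewrite of adjustCoordSystem() from the question.
    """
    # Shift each axis so that its minimum becomes 0.
    x = df['Easting'].values - df['Easting'].min()
    y = df['Northing'].values - df['Northing'].min()
    # z is 0 everywhere; match the dtype of x so integers stay integers.
    z = np.zeros(len(df), dtype=x.dtype)

    # One row per DataFrame row, one column per axis: shape (n_rows, 3).
    coords = np.column_stack([x, y, z])

    out = df.copy()
    # A Series of lists has exactly one item per row, so lengths match.
    out['Sample'] = pd.Series(coords.tolist(), index=df.index)
    return out


df = pd.DataFrame({'Easting': [11, 12, 13, 14],
                   'Northing': [5, 7, 9, 11]})
result = adjust_coord_system(df)
print(result)
```

The sketch returns a copy instead of modifying the input, so a test can still compare the untouched original with the result. If numpy arrays are wanted in each cell instead of lists, list(coords) could replace coords.tolist().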