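Using an if statement on two columns in pandas

Date: 2017-08-31 07:18:03

Tags: python python-2.7 pandas if-statement dataframe

I am trying to calculate the distance between shot points in a seismic navigation file that contains several lines. My current code is as follows: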

def Delimiter(Filename, a, b, c, d, e, f):
    data = pd.read_fwf(Filename, names=[a, b ,c ,d ,e ,f ], header=None)
    data['lineshift'] = data['line'].shift(-1)
    data['bool'] = data['lineshift'] == data['line']
    for _, row in data.iterrows():
        data['SPDIF'] = np.abs(data['sp'].astype(float) - data['sp'].astype(float).shift(-1))
        data['XDIFF'] = data['X'] - data['X'].shift(-1)
        data['YDIFF'] = data['Y'] - data['Y'].shift(-1)
        data['XYDIFF'] = np.sqrt(data['XDIFF']**2 + data['YDIFF']**2)
        data['SPDIST'] = data['XYDIFF']/data['SPDIF']
        if row['line'] != row['lineshift']:

            data['SPDIF'] = data['SPDIF'].replace({0: np.nan})
            data['XDIFF'] = data['XDIFF'].replace({0: np.nan})
            data['YDIFF'] = data['YDIFF'].replace({0: np.nan})
            data['XYDIFF'] = data['XYDIFF'].replace({0: np.nan})
            data['SPDIST'] = data['SPDIST'].replace({0: np.nan})
    data.info()
    print data

Delimiter(os.path.splitext(x)[0] + ".csv", "line", "sp", "Xcoord", "Ycoord", "X", "Y")

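This code loads a CSV file of shot-point data into a pandas DataFrame. However, I want to make sure the code does not calculate the distance between two shot points that lie on different lines. If the 'line' column differs from the 'lineshift' column in the same row, I want it to show N/A. If they are the same, the 5 new columns should be calculated for that row.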

However, when I run this code, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
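For reference, a minimal sketch that reproduces this error. It assumes a three-row toy frame whose line numbers are taken from the sample data below; the helper name truth_of_series_comparison is hypothetical and exists only for this illustration:

```python
import pandas as pd

# Toy frame: two shot points on line 761298, then one on line 771022.
data = pd.DataFrame({"line": [761298, 761298, 771022]})
data["lineshift"] = data["line"].shift(-1)  # next row's line; NaN on the last row


def truth_of_series_comparison(df):
    """Hypothetical helper: return the message pandas raises when a
    whole-column comparison is used directly in an if statement."""
    try:
        # The comparison yields one boolean per row, not a single bool,
        # so `if` cannot decide and pandas raises ValueError.
        if df["line"] != df["lineshift"]:
            return "no error"
    except ValueError as exc:
        return str(exc)
    return "no error"


print(truth_of_series_comparison(data))
```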

If possible, what do I need to add so that this code runs and checks each row?

Sample data from the CSV file:

      line    sp    ycoord     xcoord    x       y          lineshift  bool
8     761298  1080  521754.1N  65132.6E  255355  479838     761298   True
9     761298  1090  5218 2.5N  65154.3E  255760  480107     761298   True
10    761298  1100  521812.1N  65216.0E  256165  480410     761298   True
11    761298  1110  521820.7N  65236.8E  256554  480685     771022  False
12    771022  1020  521835.8N  65238.3E  256573  481153     771022   True
13    771022  1030  521841.0N  65245.2E  256700  481315     771022   True
14    771022  1040  521845.8N  65252.2E  256830  481466     771022   True

1 answer:

Answer 0 (score: 0)

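This: data['lineshift'] == data['line'] is a Series holding one True/False value per row, not a single boolean, so if data['lineshift'] == data['line'] is ambiguous: pandas cannot tell whether you mean "all rows", "any row" or something else, so it raises the ValueError.

I think you meant to test the current row inside the loop, for example: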
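To see what the comparison actually returns, here is a small sketch on a four-row toy frame (the line numbers come from the sample data; the rows themselves are made up for illustration). The last row compares against NaN and therefore comes out False:

```python
import pandas as pd

# Toy frame: last shot point of line 761298 preceded by one more on the same
# line, followed by two shot points of line 771022.
data = pd.DataFrame({"line": [761298, 761298, 771022, 771022]})
data["lineshift"] = data["line"].shift(-1)  # [761298, 771022, 771022, NaN]

# Element-wise comparison: one boolean per row, not a single True/False.
same_line = data["lineshift"] == data["line"]
print(same_line.tolist())

# Reducing the Series explicitly gives a single answer for the whole column.
print(same_line.all())  # is every row followed by a row on the same line?
print(same_line.any())  # is at least one row followed by a row on the same line?
```

Calling .all() or .any(), as the error message suggests, only helps when one answer for the whole column is wanted. Here a per-row answer is needed, which is what the row-by-row test and the mask in this answer provide.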

    for _, row in data.iterrows():
        if row['lineshift'] == row['line']:
            # ...
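A runnable sketch of that row-by-row loop, filling in only SPDIST to keep it short. It assumes a default integer index (so the next row is data.loc[i + 1]) and uses the line, sp, X and Y values of rows 10 to 13 from the sample data:

```python
import numpy as np
import pandas as pd

# line/sp/X/Y values copied from rows 10-13 of the sample data.
data = pd.DataFrame({
    "line": [761298, 761298, 771022, 771022],
    "sp":   [1100, 1110, 1020, 1030],
    "X":    [256165, 256554, 256573, 256700],
    "Y":    [480410, 480685, 481153, 481315],
})
data["lineshift"] = data["line"].shift(-1)

spdist = []
for i, row in data.iterrows():
    # row["line"] and row["lineshift"] are scalars, so this is a plain bool.
    if row["lineshift"] == row["line"]:
        nxt = data.loc[i + 1]  # assumes a default 0..n-1 integer index
        dist = np.hypot(row["X"] - nxt["X"], row["Y"] - nxt["Y"])
        spdist.append(dist / abs(row["sp"] - nxt["sp"]))
    else:
        # Next shot point is on another line (or there is none): no distance.
        spdist.append(np.nan)

data["SPDIST"] = spdist
print(data)
```

This works, but iterrows builds a new Series for every row, which gets slow on large navigation files.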

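Edit: this fixes the error you reported, but you shouldn't use a loop here. The column arithmetic already operates on whole columns, so compute it once and then set the five new columns to NaN wherever the next row is on a different line: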

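import numpy as np
import pandas as pd

def Delimiter(Filename, a, b, c, d, e, f):
    # assumes the names passed in include 'line', 'sp', 'X' and 'Y'
    data = pd.read_fwf(Filename, names=[a, b, c, d, e, f], header=None)
    # True where the next row is on the same line; False at every line change
    # and on the last row (lineshift is NaN there)
    data['lineshift'] = data['line'].shift(-1)
    data['bool'] = data['lineshift'] == data['line']
    # calculate each column only once, for all rows at the same time
    data['SPDIF'] = np.abs(data['sp'].astype(float) - data['sp'].astype(float).shift(-1))
    data['XDIFF'] = data['X'] - data['X'].shift(-1)
    data['YDIFF'] = data['Y'] - data['Y'].shift(-1)
    data['XYDIFF'] = np.sqrt(data['XDIFF']**2 + data['YDIFF']**2)
    data['SPDIST'] = data['XYDIFF'] / data['SPDIF']

    # blank out the five columns wherever the next shot point is on another line
    data.loc[~data['bool'], ['SPDIF', 'XDIFF', 'YDIFF', 'XYDIFF', 'SPDIST']] = np.nan

    data.info()
    print(data)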
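The vectorized version can be checked on the sample data without a file. The sketch below wraps the same column arithmetic in a hypothetical add_spacing function that takes a DataFrame instead of a file name, using the line, sp, X and Y values from the sample. It also compares XDIFF against groupby('line')['X'].diff(-1), an alternative not used in the answer, which yields NaN at the end of each line without an explicit mask, assuming each line's shot points are stored in consecutive rows:

```python
import numpy as np
import pandas as pd

# line/sp/X/Y values from rows 8-14 of the sample data.
data = pd.DataFrame({
    "line": [761298] * 4 + [771022] * 3,
    "sp":   [1080, 1090, 1100, 1110, 1020, 1030, 1040],
    "X":    [255355, 255760, 256165, 256554, 256573, 256700, 256830],
    "Y":    [479838, 480107, 480410, 480685, 481153, 481315, 481466],
})


def add_spacing(df):
    """Hypothetical wrapper around the answer's vectorized arithmetic."""
    df = df.copy()
    same_line = df["line"].shift(-1) == df["line"]
    sp = df["sp"].astype(float)
    df["SPDIF"] = (sp - sp.shift(-1)).abs()
    df["XDIFF"] = df["X"] - df["X"].shift(-1)
    df["YDIFF"] = df["Y"] - df["Y"].shift(-1)
    df["XYDIFF"] = np.sqrt(df["XDIFF"] ** 2 + df["YDIFF"] ** 2)
    df["SPDIST"] = df["XYDIFF"] / df["SPDIF"]
    cols = ["SPDIF", "XDIFF", "YDIFF", "XYDIFF", "SPDIST"]
    df.loc[~same_line, cols] = np.nan  # no distance across a line change
    return df


result = add_spacing(data)
print(result[["line", "sp", "SPDIST"]])

# Alternative (not from the answer): diff(-1) within each line never
# crosses a line boundary, so the last row of every line is NaN by itself.
xdiff_grouped = data.groupby("line")["X"].diff(-1)
print(xdiff_grouped.equals(result["XDIFF"]))
```

One difference to keep in mind: if a line number reappears later in the file after another line, the shift-based mask treats the two blocks separately, whereas groupby would compute a difference across the gap between them.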