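Problem using the pandas apply() function with multiple columns

Time: 2017-08-28 12:08:44

Tags: python pandas dataframe

I'm having trouble applying a complex function across a pandas DataFrame instead of passing it single values.

I have written and tested the following function: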

import math

def calcTrueWind(cog, sog, appWindDir, appWindSpd, heading):
    dtor = math.pi / 180   # Degrees to radians conversion

    # Convert appWindSpd from m/s to knots
    #appWindSpd = appWindSpd*1.94384

    # Convert navigation coordinates to math angles
    mathCourse = 90 - float(cog)

    # Keep value between 0 and 360
    if mathCourse <= 0.0:
        mathCourse = mathCourse + 360

    # Calculate apparent wind direction
    appWindDir = float(heading) + float(appWindDir)

    # Keep value between 0 and 360
    if appWindDir >= 360:
        appWindDir = appWindDir - 360

    # Convert meteorological coordinates to math angles
    mathDirection = 270 - appWindDir

    # Ensure values are between 0 and 360
    if mathDirection <= 0:
        mathDirection = mathDirection + 360
    elif mathDirection > 360:
        mathDirection = mathDirection - 360

    # Compute East-West vector
    x = (float(appWindSpd) * math.cos(mathDirection * dtor)) + (float(sog) * math.cos(mathCourse * dtor))

    # Compute North-South vector
    y = (float(appWindSpd) * math.sin(mathDirection * dtor)) + (float(sog) * math.sin(mathCourse * dtor))

    # Use the two vector components to calculate the true wind speed
    trueWindSpeed = math.sqrt((x*x)+(y*y))

    calm_flag = 1.0

    # Determine true wind angle; atan2 is used only when the
    # East-West component is non-negligible
    if abs(x) > 0.00001:
        mathDirection = math.atan2(y, x) / dtor
    else:
        if abs(y) > 0.00001:
            mathDirection = 180 - (90*y)/abs(y)
        else:
            # Both components are negligible: treat the wind as calm
            mathDirection = 270.0
            calm_flag = 0.0

    trueWindDirection = 270 - mathDirection

    # 0 - 360 boundary check
    if trueWindDirection < 0.0:
        trueWindDirection = (trueWindDirection + 360)*calm_flag

    if trueWindDirection > 360:
        trueWindDirection = (trueWindDirection - 360)*calm_flag

    # Round before returning values
    trueWindSpeed = round(trueWindSpeed, 1)
    trueWindDirection = round(trueWindDirection, 1)

    return [trueWindSpeed, trueWindDirection]

I tested the function by passing it sample values, like this:

tws, twd = calcTrueWind(247.3, 10.5, 110.3, 21.6, 244.2)

print("trueWindSpeed: " + str(tws))
print("trueWindDirection: " + str(twd))

I am now trying to apply this function to a pandas DataFrame.

A sample of the DataFrame is shown below:

   | date_time_stamp     | fld_courseOverGround | fld_speedOverGround | fld_appWindDirection | fld_appWindSpeed | fld_heading | fld_trueWindSpeed | fld_trueWindDirection
---+---------------------+----------------------+---------------------+----------------------+------------------+-------------+-------------------+----------------------
 0 | 2017-04-05 07:35:09 |               308.05 |                0.00 |                  358 |              1.9 |       315.5 |                   |
 1 | 2017-04-05 07:35:12 |               333.06 |                0.00 |                  359 |              1.9 |       315.4 |                   |
 2 | 2017-04-05 07:35:17 |               254.68 |                0.01 |                  000 |              1.8 |       315.4 |                   |

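The first 5 columns should be passed to the function, and the last two columns of the DataFrame should be calculated using the apply function.

This is what I have tried: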

df_truewindtmp['fld_trueWindSpeed'], df_truewindtmp['fld_trueWindSpeed'] = df_truewindtmp.apply(
    lambda row: calcTrueWind(row['fld_courseOverGround'], 
                             row['fld_speedOverGround'], 
                             row['fld_appWindDirection'], 
                             row['fld_appWindSpeed'], 
                             row['fld_heading']
                            ),    axis=1)

This results in the following error: ValueError: Shape of passed values is (10, 2), indices imply (10, 8)

Any pointers would be greatly appreciated.

1 answer:

Answer 0: (score: 1)

I think you can change:

return[trueWindSpeed, trueWindDirection]

to:

return pd.Series([trueWindSpeed, trueWindDirection])
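To illustrate why the return type matters, here is a minimal sketch. It assumes a recent pandas release (0.23 or later); the toy frame and the helper names `as_list` and `as_series` are hypothetical and are not part of the question. With a list return value, `apply(..., axis=1)` yields a single Series whose elements are lists, whereas a `pd.Series` return value is expanded into a DataFrame with one column per element, which is what a two-column assignment needs.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [0, 0, 0]})

def as_list(row):
    # Returns a plain Python list, like the original calcTrueWind
    return [row['a'] + row['b'], row['b'] - row['a']]

def as_series(row):
    # Returns a pd.Series, as suggested in this answer
    return pd.Series([row['a'] + row['b'], row['b'] - row['a']])

list_result = df.apply(as_list, axis=1)
series_result = df.apply(as_series, axis=1)

# Compare the type and shape of the two results
print(type(list_result).__name__, list_result.shape)
print(type(series_result).__name__, series_result.shape)
```

Older pandas versions, such as the one in the question, may raise a ValueError in the list case instead of returning a Series of lists.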

and then:

df_truewindtmp[['fld_trueWindSpeed','fld_trueWindDirection']] = df_truewindtmp.apply(
    lambda row: calcTrueWind(row['fld_courseOverGround'],
                             row['fld_speedOverGround'],
                             row['fld_appWindDirection'],
                             row['fld_appWindSpeed'],
                             row['fld_heading']
                            ),    axis=1)

Sample:

import pandas as pd

df_truewindtmp = pd.DataFrame({'A':list('abcdef'),
                               'B':[4,5,4,5,5,4],
                               'C':[7,8,9,4,2,3],
                               'D':[1,3,5,7,1,0],
                               'E':[5,3,6,9,2,4],
                               'F':list('aaabbb')})

print (df_truewindtmp)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

# sample function
def calcTrueWind(a,b,c):
    trueWindSpeed = a + b
    trueWindDirection = c - b
    return pd.Series([trueWindSpeed, trueWindDirection])

df_truewindtmp[['G','H']] = df_truewindtmp.apply(
    lambda row: calcTrueWind(row['B'],
                             row['C'],
                             row['E']
                            ),    axis=1)

print (df_truewindtmp)
   A  B  C  D  E  F   G  H
0  a  4  7  1  5  a  11 -2
1  b  5  8  3  3  a  13 -5
2  c  4  9  5  6  a  13 -3
3  d  5  4  7  9  b   9  5
4  e  5  2  1  2  b   7  0
5  f  4  3  0  4  b   7  1
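As a possible alternative (a different technique from the one in this answer), the function can keep returning a plain list if `apply` is asked to expand list-like results into columns. The sketch below assumes pandas 0.23 or later, where `DataFrame.apply` accepts `result_type='expand'`; `calc_pair` is a hypothetical stand-in for a list-returning calcTrueWind, and the data is a subset of the sample above.

```python
import pandas as pd

# First three rows of the numeric sample columns used above
df = pd.DataFrame({'B': [4, 5, 4], 'C': [7, 8, 9], 'E': [5, 3, 6]})

def calc_pair(a, b, c):
    # Hypothetical stand-in that returns a plain list instead of a pd.Series
    return [a + b, c - b]

# result_type='expand' turns each returned list into one column per element
df[['G', 'H']] = df.apply(
    lambda row: calc_pair(row['B'], row['C'], row['E']),
    axis=1, result_type='expand')

print(df)
```

Either approach yields a two-column result; which one reads better probably depends on whether the function is also called outside of `apply`, where a plain list may be more convenient.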