Python Pandas if/else statement

Time: 2018-08-21 14:41:16

Tags: python pandas if-statement

I am trying to write a nested if/else statement using pandas, but I am not very comfortable with if statements in pandas. Below are the sample CSV data being processed and the code snippet I have written so far.

df

t1
8
1134
0
119
122
446
21
0
138
0

Current if/else statement logic:

import pandas as pd

df = pd.read_csv('file.csv', sep=';')

def get_cost(df):
    t_zone = 720
    max_rate = 5.5
    rate = 0.0208
    duration = df['t1']
    if duration < t_zone:
        if(duration * rate) >= max_rate:
            return max_rate
        else:
            return(duration * rate)
    else:
        if duration >= 720:
            x = int(duration/720)
            y = ((duration%720) * rate)
            if y >= max_rate:
                return((x * max_rate) + max_rate)
            else:
                return((x * max_rate) + y)

cost = get_cost(df)

This snippet raises an error. If anyone has a better solution, or can help translate this if/else statement into a more pandas-like form, that would be awesome!

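3 answers:

Answer 0 (score: 3)

Loops and if statements in pandas are inefficient and best avoided unless absolutely necessary. Here is a fully vectorized solution built on pandas and NumPy: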

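import numpy as np  # Needs numpy, too
import pandas as pd

df = pd.read_csv('file.csv', sep=';')
t_zone, max_rate, rate = 720, 5.5, 0.0208  # constants from the question

x = df['t1'] // 720 * max_rate  # Note the use of //! Cost of the full zones
y = df['t1'] %  720 * rate      # Uncapped cost of the remainder
df['cost'] = np.where(df['t1'] < t_zone,
                      np.minimum(df['t1'] * rate, max_rate),
                      np.minimum(y,               max_rate) + x)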
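The vectorized idea can be tried without file.csv. The following is a minimal sketch, assuming the ten t1 values from the question are built inline; the names full_zones and remainder are illustrative and not from the original answer. It also relies on the observation that for t1 below t_zone the floor division is 0 and the modulo is t1 itself, so one expression appears to cover both branches.

```python
import numpy as np
import pandas as pd

# Assumption: sample t1 values copied from the question, built inline
# instead of being read from file.csv.
df = pd.DataFrame({'t1': [8, 1134, 0, 119, 122, 446, 21, 0, 138, 0]})

t_zone = 720    # length of one zone
max_rate = 5.5  # cost cap per zone
rate = 0.0208   # cost per unit of t1

# Hypothetical helper names, chosen for this sketch only.
full_zones = df['t1'] // t_zone  # 0 whenever t1 < t_zone
remainder = df['t1'] % t_zone    # equals t1 whenever t1 < t_zone

# Each full zone costs max_rate; the remainder is charged by rate, capped at max_rate.
df['cost'] = full_zones * max_rate + np.minimum(remainder * rate, max_rate)
print(df)
```

Whether dropping np.where is worthwhile is a matter of taste; keeping it mirrors the original if/else more literally.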

Answer 1 (score: 2)

Try this solution.

import pandas as pd

df = pd.read_csv('file.csv')

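def get_cost(row):
    t_zone = 720
    max_rate = 5.5
    rate = 0.0208
    duration = row['t1']
    if duration < t_zone:
        # Less than one full zone: charge by rate, capped at max_rate
        if (duration * rate) >= max_rate:
            return max_rate
        else:
            return duration * rate
    else:
        # One or more full zones at max_rate each, plus the capped remainder
        full_zones = int(duration / t_zone)
        remainder_cost = (duration % t_zone) * rate
        if remainder_cost >= max_rate:
            return (full_zones * max_rate) + max_rate
        else:
            return (full_zones * max_rate) + remainder_cost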

df['cost'] = df.apply(get_cost, axis=1)

You can also assign the result back to the same column. In this case I assigned it to a new column named 'cost'.

Output:

     t1     cost
0     8   0.1664
1  1134  11.0000
2     0   0.0000
3   119   2.4752
4   122   2.5376
5   446   5.5000
6    21   0.4368
7     0   0.0000
8   138   2.8704
9     0   0.0000
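Because the function only reads the t1 column, a possible variant is to pass it single values through Series.map rather than whole rows through DataFrame.apply. This is a sketch of that alternative, not the answer's own method: the function name cost_for, the upper-case constants, and the inline DataFrame are assumptions made for illustration.

```python
import pandas as pd

T_ZONE = 720     # length of one zone
MAX_RATE = 5.5   # cost cap per zone
RATE = 0.0208    # cost per unit of t1

def cost_for(duration):
    """Cost for a single t1 value (a scalar, not a row or a column)."""
    # divmod splits the duration into whole zones and what is left over.
    full_zones, remainder = divmod(duration, T_ZONE)
    return full_zones * MAX_RATE + min(remainder * RATE, MAX_RATE)

# Assumption: sample values from the question, built inline instead of read from file.csv.
df = pd.DataFrame({'t1': [8, 1134, 0, 119, 122, 446, 21, 0, 138, 0]})
df['cost'] = df['t1'].map(cost_for)
print(df)
```

On the sample data this should reproduce the cost column shown above, while keeping the per-value logic in plain Python.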

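Answer 2 (score: 1)

duration is a whole column, so you should iterate over its values rather than compare it directly with a number. You can do it like this.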

import pandas as pd

df = pd.read_csv('file.csv', sep=';')

def get_cost(df):
    t_zone = 720
    max_rate = 5.5
    rate = 0.0208
    duration = df['t1']
    ratecol = []
    for i in duration:
        if i < t_zone:
            if(i * rate) >= max_rate:
                ratecol.append(max_rate)
            else:
                ratecol.append(i * rate)
        else:
            if i >= 720:
                x = int(i/720)
                y = ((i%720) * rate)
                if y >= max_rate:
                    ratecol.append((x * max_rate) + max_rate)
                else:
                    ratecol.append((x * max_rate) + y)
    return ratecol
df['cost'] = get_cost(df)

This code produces exactly the same result as the one posted earlier.
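To see why a direct comparison inside an if fails, here is a small sketch. It assumes the error in the question is the ValueError that pandas raises when a whole Series is used as a single truth value; the question itself does not name the error, and the three-value Series is illustrative.

```python
import pandas as pd

# Assumption: the first three sample values from the question.
duration = pd.Series([8, 1134, 0], name='t1')

raised = None
try:
    if duration < 720:  # asks for one truth value from a whole Series
        pass
except ValueError as exc:
    raised = exc
print(type(raised).__name__)

# The comparison itself is fine: it yields one boolean per element,
# which is what loops, apply, or np.where then work with.
mask = duration < 720
print(mask.tolist())
```

The comparison returns one result per row, which is why the answers either visit the values one at a time or operate on the whole column at once.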