Speeding up Pandas: finding all columns that satisfy a condition

Time: 2019-04-10 13:27:38

Tags: python pandas

I have data represented as a pandas DataFrame that looks, for example, like this:

| id | entity | name | value | location

Here id is an integer, entity is an integer, name is a string, value is an integer, and location is a string (for example, US, CA, UK, and so on).

Now I want to add a new column, "flag", to this DataFrame, with its values assigned as follows:

# iterrows() yields (index, row) pairs, so unpack both
for idx, d in df.iterrows():
    if d.entity == 10 and d.value != 1000 and d.location == "CA":
        df.loc[idx, "flag"] = "A"
    elif d.entity != 10 and d.entity != 0 and d.value == 1000 and d.location == "US":
        df.loc[idx, "flag"] = "C"
    elif d.entity == 0 and d.value == 1000 and d.location == "US":
        df.loc[idx, "flag"] = "B"
    else:
        print("Different case")

Is it possible to speed this up by using some built-in functions instead of the for loop?

3 answers:

Answer 0 (score: 3)

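Use np.select: you pass it a list of conditions and a matching list of choices, and you can specify a default value for the rows where none of the conditions is met.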
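
A minimal sketch of the np.select approach described above. The sample rows are hypothetical and only illustrate the column layout from the question; the conditions mirror the if/elif branches of the original loop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the columns described in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "entity": [10, 5, 0, 10],
    "name": ["a", "b", "c", "d"],
    "value": [500, 1000, 1000, 1000],
    "location": ["CA", "US", "US", "CA"],
})

# One boolean Series per case, in the same order as the choices below.
conditions = [
    (df["entity"] == 10) & (df["value"] != 1000) & (df["location"] == "CA"),
    (df["entity"] != 10) & (df["entity"] != 0) & (df["value"] == 1000) & (df["location"] == "US"),
    (df["entity"] == 0) & (df["value"] == 1000) & (df["location"] == "US"),
]
choices = ["A", "C", "B"]

# Rows that match none of the conditions receive the default value.
df["flag"] = np.select(conditions, choices, default="Different case")
print(df[["id", "flag"]])
```

Unlike the loop, which only prints a message in the else branch, this writes the default string into the flag column for unmatched rows.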

Answer 1 (score: 3)

Replace and with the bitwise operator &, wrap each comparison in parentheses (), and then use numpy.select:

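import numpy as np

# One boolean mask per case; each comparison is parenthesized
# because & binds more tightly than == and !=
m = [
    (df.entity == 10) & (df.value != 1000) & (df.location == 'CA'),
    (df.entity != 10) & (df.entity != 0) & (df.value == 1000) & (df.location == 'US'),
    (df.entity == 0) & (df.value == 1000) & (df.location == 'US')
]

# Rows that match no mask get the default value
df['flag'] = np.select(m, ["A", "C", "B"], default="Different case")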
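
One detail worth checking when translating an if/elif chain into numpy.select: when several conditions are true for the same element, the first one in the list is used, which matches if/elif semantics. The conditions in the question happen to be mutually exclusive, so order does not matter there. The small sketch below uses hypothetical values and labels purely to illustrate the ordering rule.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so that one element satisfies both conditions.
s = pd.Series([5, 50, 500])

# Overlapping conditions: 500 is both > 100 and > 10.
conditions = [s > 100, s > 10]
labels = np.select(conditions, ["large", "medium"], default="small")

# The element 500 gets the label of the first matching condition.
print(labels.tolist())
```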

Answer 2 (score: 0)

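You wrote "find all columns that satisfy a set of conditions", but your code shows that you are actually trying to add a new column whose value in each row is computed from the values of other columns in the same row.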

If that is the case, you can use df.apply, giving it a function that computes the value for a particular row:

def flag_value(row):
    # row is a Series holding one DataFrame row (because axis=1 below)
    if row.entity == 10 and row.value != 1000 and row.location == "CA":
        return "A"
    elif row.entity != 10 and row.entity != 0 and row.value == 1000 and row.location == "US":
        return "C"
    elif row.entity == 0 and row.value == 1000 and row.location == "US":
        return "B"
    else:
        return "Different case"

df['flag'] = df.apply(flag_value, axis=1)
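
Since the question is about speed, note that df.apply with axis=1 still calls the Python function once per row, so it is typically slower than a vectorized approach such as numpy.select. The sketch below is one way to check that both give the same flags and to time them on your own machine. The random data, the row count n, and the variable names are assumptions made for illustration, and the timings will vary.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical random data; the column names follow the question.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "entity": rng.choice([0, 5, 10], size=n),
    "value": rng.choice([500, 1000], size=n),
    "location": rng.choice(["CA", "US", "UK"], size=n),
})


def flag_value(row):
    if row["entity"] == 10 and row["value"] != 1000 and row["location"] == "CA":
        return "A"
    elif row["entity"] not in (0, 10) and row["value"] == 1000 and row["location"] == "US":
        return "C"
    elif row["entity"] == 0 and row["value"] == 1000 and row["location"] == "US":
        return "B"
    return "Different case"


# Row by row: flag_value is called once per row in Python.
start = time.perf_counter()
via_apply = df.apply(flag_value, axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: each condition is evaluated on whole columns at once.
start = time.perf_counter()
conditions = [
    (df["entity"] == 10) & (df["value"] != 1000) & (df["location"] == "CA"),
    (df["entity"] != 10) & (df["entity"] != 0) & (df["value"] == 1000) & (df["location"] == "US"),
    (df["entity"] == 0) & (df["value"] == 1000) & (df["location"] == "US"),
]
via_select = pd.Series(
    np.select(conditions, ["A", "C", "B"], default="Different case"),
    index=df.index,
)
select_seconds = time.perf_counter() - start

# Both approaches should produce identical flags.
print(via_apply.tolist() == via_select.tolist())
print(f"apply: {apply_seconds:.4f}s, np.select: {select_seconds:.4f}s")
```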

See this related question for more information.

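If you really do want to find all the rows that meet specified conditions, the usual way to do this with a Pandas DataFrame is to use df.loc with boolean indexing: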

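# Each comparison needs its own parentheses, because & binds more tightly than == and !=
only_a_cases = df.loc[(df.entity == 10) & (df.value != 1000) & (df.location == "CA")]
# or:
only_a_cases = df.loc[lambda df: (df.entity == 10) & (df.value != 1000) & (df.location == "CA")]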
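
A small self-contained sketch of the df.loc filtering above, using hypothetical sample rows. In Python, & has higher precedence than == and !=, so an unparenthesized expression such as df.entity == 10 & df.value != 1000 is grouped around 10 & df.value rather than around the comparisons; building the mask with explicit parentheses avoids that.

```python
import pandas as pd

# Hypothetical sample data with the columns described in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "entity": [10, 5, 0, 10],
    "name": ["a", "b", "c", "d"],
    "value": [500, 1000, 1000, 1000],
    "location": ["CA", "US", "US", "CA"],
})

# Build the boolean mask first; each comparison is parenthesized.
mask = (df["entity"] == 10) & (df["value"] != 1000) & (df["location"] == "CA")

# Keep only the rows where the mask is True.
only_a_cases = df.loc[mask]
print(only_a_cases)
```

Naming the mask separately also makes it easy to reuse, for example to count the matches with mask.sum().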