Example:

Question

I am trying to add a new column "energy_class" to a dataframe "df_energy". It should contain the string "high" if the "consumption_energy" value is > 400, "medium" if the "consumption_energy" value is between 200 and 400, and "low" if the "consumption_energy" value is below 200. I tried np.where from numpy, but numpy.where(condition[, x, y]) handles only two outcomes, not three as in my case.

Any ideas to help me, please?

Thanks in advance.

Answer 1

You can use a ternary, i.e. nest one np.where inside another:

np.where(consumption_energy > 400, 'high', 
         (np.where(consumption_energy < 200, 'low', 'medium')))
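To make the snippet above runnable end to end, here is a minimal sketch that applies the same nested np.where to the column from the question. The sample values are borrowed from the data shown in the other answers; the names df_energy and energy_class follow the question.

```python
import numpy as np
import pandas as pd

# Sample values taken from the data used elsewhere on this page
df_energy = pd.DataFrame({"consumption_energy": [459, 416, 186, 250, 411]})

consumption = df_energy["consumption_energy"]

# The outer np.where handles "high"; the inner one splits the rest into "low"/"medium"
df_energy["energy_class"] = np.where(
    consumption > 400, "high",
    np.where(consumption < 200, "low", "medium"),
)

print(df_energy)
```

Note that values of exactly 200 or 400 fall into "medium" with these comparisons.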

Answer 2

I would use the cut() method here, which produces a very efficient and memory-saving category dtype:

In [124]: df
Out[124]:
   consumption_energy
0                 459
1                 416
2                 186
3                 250
4                 411
5                 210
6                 343
7                 328
8                 208
9                 223

In [125]: pd.cut(df.consumption_energy, [0, 200, 400, np.inf], labels=['low','medium','high'])
Out[125]:
0      high
1      high
2       low
3    medium
4      high
5    medium
6    medium
7    medium
8    medium
9    medium
Name: consumption_energy, dtype: category
Categories (3, object): [low < medium < high]

Answer 3

Try this, using the setup from @Maxu:

col         = 'consumption_energy'
conditions  = [ df2[col] >= 400, (df2[col] < 400) & (df2[col]> 200), df2[col] <= 200 ]
choices     = [ "high", 'medium', 'low' ]

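# The default is used only where no condition matches (e.g. missing values);
# None avoids mixing a float NaN with the string choices
df2["energy_class"] = np.select(conditions, choices, default=None)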


   consumption_energy energy_class
0                 459         high
1                 416         high
2                 186          low
3                 250       medium
4                 411         high
5                 210       medium
6                 343       medium
7                 328       medium
8                 208       medium
9                 223       medium

Answer 4

I like to keep the code clean. That's why I prefer np.vectorize for such tasks: write an ordinary Python function that returns the label for a single value, wrap it with np.vectorize, and call the wrapped function on the column.

Then add the resulting numpy array as a column in your dataframe with a plain column assignment.

The advantage of this approach is that if you want to add more complicated constraints to a column, it can be done easily. Hope it helps.
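A minimal sketch of the np.vectorize approach described in this answer, assuming the thresholds from the question (above 400 is high, below 200 is low, otherwise medium). The function name classify_energy and the sample values are illustrative assumptions, not code from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column name from the question
df_energy = pd.DataFrame({"consumption_energy": [459, 186, 250, 400, 200]})

def classify_energy(value):
    """Map one consumption value to its label (hypothetical helper)."""
    if value > 400:
        return "high"
    if value < 200:
        return "low"
    return "medium"

# np.vectorize calls classify_energy once per element;
# otypes=[object] fixes the output dtype up front
func = np.vectorize(classify_energy, otypes=[object])
energy_class = func(df_energy["consumption_energy"])

# Add the resulting numpy array as a new column
df_energy["energy_class"] = energy_class
print(df_energy)
```

Adding a further rule is then just one more branch inside the function.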

Answer 5

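I second using np.vectorize. I found it much faster than np.where, and the code is cleaner too, though np.vectorize is essentially a Python-level loop, so it is worth timing both on your own data. You can use a dictionary to hold the conditions together with their outputs.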

# Vectorizing with numpy
import numpy as np

# Assumes df is an existing DataFrame with a column named 'Series'
row_dic = {'Condition1':'high',
           'Condition2':'medium',
           'Condition3':'low',
           'Condition4':'lowest'}

def Conditions(dfSeries_element, dictionary):
    '''
    dfSeries_element is an element from df_series
    dictionary: is the dictionary of your conditions with their outcome
    '''
    # Elements that are not keys of the dictionary map to None
    return dictionary.get(dfSeries_element)

def VectorizeConditions():
    # Exclude the dictionary (positional argument 1) from vectorization;
    # otypes=[object] lets the result hold both strings and None
    func = np.vectorize(Conditions, excluded=[1], otypes=[object])
    result_vector = func(df['Series'], row_dic)
    df['new_Series'] = result_vector

# running the function below applies the multi-conditional mapping to your df
VectorizeConditions()

Answer 6

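WARNING: be careful with missing values. If your data contains NaNs, np.where can be tricky to use and may silently give you the wrong result.

Consider this situation: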

df['cons_ener_cat'] = np.where(df.consumption_energy > 400, 'high', 
         (np.where(df.consumption_energy < 200, 'low', 'medium')))

# if we do not use this second line, then
#  if consumption energy is missing it would be shown medium, which is WRONG.
df.loc[df.consumption_energy.isnull(), 'cons_ener_cat'] = np.nan

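Alternatively, you can add one more nested np.where to separate medium from NaN, which gets ugly.

IMHO the best way to go is pd.cut. It handles NaNs and is easy to use.

Examples: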

import numpy as np
import pandas as pd
import seaborn as sns

df = sns.load_dataset('titanic')

# pd.cut
df['age_cat'] = pd.cut(df.age, [0, 20, 60, np.inf], labels=['child','medium','old'])


# manually add another line for nans
df['age_cat2'] = np.where(df.age > 60, 'old', (np.where(df.age <20, 'child', 'medium')))
df.loc[df.age.isnull(), 'age_cat2'] = np.nan

# multiple nested where (here the missing value ends up as the string 'nan')
df['age_cat3'] = np.where(df.age > 60, 'old',
                         (np.where(df.age <20, 'child',
                                   np.where(df.age.isnull(), np.nan, 'medium'))))

# outputs
print(df[['age','age_cat','age_cat2','age_cat3']].head(7))
    age age_cat age_cat2 age_cat3
0  22.0  medium   medium   medium
1  38.0  medium   medium   medium
2  26.0  medium   medium   medium
3  35.0  medium   medium   medium
4  35.0  medium   medium   medium
5   NaN     NaN      NaN      nan
6  54.0  medium   medium   medium

Answer 7

myassign["assign3"] = np.where(myassign["points"] > 90, "genius", np.where((myassign["points"] > 50) & (myassign["points"] < 90), "good", "bad"))

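Use this when you want to stick to the "where" method but have multiple conditions. You can add more conditions by nesting more np.where calls in the same way as above. Again, the last two values of the innermost call are the remaining outcomes.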
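As a sketch of adding one more condition this way, the example below nests a third np.where to produce four labels. The extra "average" band, the column name assign4, and the sample scores are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores, one per band
myassign = pd.DataFrame({"points": [95, 80, 60, 30]})
points = myassign["points"]

# Each nested np.where handles the values the previous condition rejected
myassign["assign4"] = np.where(
    points > 90, "genius",
    np.where(
        points > 70, "good",
        np.where(points > 50, "average", "bad"),
    ),
)
print(myassign)
```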

Numpy "where" with multiple conditions

7 answers:

Example: