Question

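I want to add a column to a pd.DataFrame in which I write values based on a check of an existing column.

I want to check against the values in a dictionary. Say I have the following dictionary: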

{"<=4":[0,4], "(4,10]":[4,10], ">10":[10,inf]}

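Now I want to check whether each value in a column of my DataFrame falls into any of the intervals in the dictionary. If it does, I want to write the matching dictionary key into a second column of the same DataFrame.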

So a DataFrame like:

     col_1
  a    3
  b    15
  c    8

should become:

     col_1   col_2
  a    3     "<=4"
  b    15    ">10"
  c    8     "(4,10]"

Answer 1

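The pd.cut() function converts a continuous variable into a categorical one. In this case the bin edges are [0 , 4 , 10 , np.inf], which means we have 3 categories: [0 , 4], [4 , 10] and [10 , inf]. Any value between 0 and 4 is assigned to category [0 , 4], any value between 4 and 10 is assigned to category [4 , 10], and so on.

You then give each category a name, in the same order, using the labels parameter. Here we have 3 categories, [0 , 4], [4 , 10] and [10 , inf], so we simply pass ['<=4' , '(4,10]' , '>10'] to the labels parameter. This means the [0 , 4] category will be named <=4, the [4 , 10] category will be named (4,10], and so on.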
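The description above corresponds to a single pd.cut() call. The following is a minimal sketch, assuming the sample frame from the question (values 3, 15, 8 indexed a, b, c); the frame construction is added here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample frame matching the question (values 3, 15, 8 indexed a, b, c).
df = pd.DataFrame({"col_1": [3, 15, 8]}, index=["a", "b", "c"])

# Bin edges 0, 4, 10, inf give three intervals. By default each interval
# is closed on the right: (0, 4], (4, 10], (10, inf].
df["col_2"] = pd.cut(
    df["col_1"],
    bins=[0, 4, 10, np.inf],
    labels=["<=4", "(4,10]", ">10"],
)

print(df)
```

Note that with the default settings a value of exactly 0 falls outside the first bin and becomes NaN; passing include_lowest=True makes the first interval include its left edge.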

Answer 2

You can use this approach:

dico = pd.DataFrame({"<=4":[0,4], "(4,10]":[4,10], ">10":[10,float('inf')]}).transpose()

foo = lambda x: dico.index[(dico[1]>x) & (dico[0]<=x)][0]

df['col_1'].map(foo)

#0       <=4
#1       >10
#2    (4,10]
#Name: col1, dtype: object
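One detail worth checking: as written, the lambda treats each range as [lower, upper), so a value of exactly 4 maps to "(4,10]", and a value outside every range raises an IndexError because [0] is taken from an empty result. The sketch below is a possible variant, not the answerer's code: it assumes the labels are meant literally (right-closed ranges) and returns None when nothing matches. The name lookup_right_closed is hypothetical.

```python
import pandas as pd

# Sample frame assumed from the question.
df = pd.DataFrame({"col_1": [3, 15, 8]}, index=["a", "b", "c"])

# One row per label; column 0 holds the lower bound, column 1 the upper bound.
dico = pd.DataFrame(
    {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, float("inf")]}
).transpose()


def lookup_right_closed(x):
    """Return the label whose range (lower, upper] contains x, else None."""
    matches = dico.index[(dico[0] < x) & (x <= dico[1])]
    return matches[0] if len(matches) else None


df["col_2"] = df["col_1"].map(lookup_right_closed)
print(df)
```

With this variant 4 maps to "<=4" and 10 maps to "(4,10]", while 0 and negative values map to None instead of raising.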

Answer 3

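This solution creates a function named extract_str and applies it to col_1. It uses a conditional list comprehension to iterate over the keys and values of the dictionary, checking whether the value is greater than or equal to the lower bound and less than the upper bound. A check ensures that the resulting list does not contain more than one result. If the list contains a value, it is returned; otherwise None is returned by default.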

from numpy import inf

d = {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, inf]}

def extract_str(val):
    results = [key for key, value_range in d.items()
               if value_range[0] <= val < value_range[1]]
    if len(results) > 1:
        raise ValueError('Multiple ranges satisfied.')
    if results:
        return results[0]

df['col_2'] = df.col_1.apply(extract_str)

>>> df
   col_1   col_2
a      3     <=4
b     15     >10
c      8  (4,10]

On this small DataFrame, this solution is much faster than the one provided by @ColonelBeauvel.

%timeit df['col_2'] = df.col_1.apply(extract_str)
1000 loops, best of 3: 220 µs per loop

%timeit df['col_2'] = df['col_1'].map(foo)
1000 loops, best of 3: 1.46 ms per loop
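The timings above are for a three-row frame. To see how the two approaches compare on more data, something like the following sketch could be used outside IPython. The 1,000-row frame named big is a hypothetical example, and no timings are claimed here since they depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

d = {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, np.inf]}
dico = pd.DataFrame(d).transpose()


def extract_str(val):
    # Same [lower, upper) check as in the answer above.
    results = [key for key, (low, high) in d.items() if low <= val < high]
    return results[0] if results else None


# The lambda from the earlier answer, reproduced for comparison.
foo = lambda x: dico.index[(dico[1] > x) & (dico[0] <= x)][0]

# Hypothetical larger frame: 1,000 random integers in [0, 100).
rng = np.random.default_rng(0)
big = pd.DataFrame({"col_1": rng.integers(0, 100, size=1_000)})

# Both functions use the same interval semantics, so results should agree.
via_apply = big["col_1"].apply(extract_str)
via_map = big["col_1"].map(foo)

runs = 3
t_apply = timeit.timeit(lambda: big["col_1"].apply(extract_str), number=runs)
t_map = timeit.timeit(lambda: big["col_1"].map(foo), number=runs)
print(f"apply(extract_str): {t_apply / runs:.4f} s per run")
print(f"map(foo):           {t_map / runs:.4f} s per run")
```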

Answer 4

You can use a function for the mapping, as in this example. I hope it helps.

import pandas as pd
d = {'col_1':[3,15,8]}
from numpy import inf
test = pd.DataFrame(d,index=['a','b','c'])
newdict = {"<=4":[0,4], "(4,10]":[4,10], ">10":[10,inf]}

def mapDict(num):
    # 0 lies on the open lower edge of the first range, so handle it explicitly.
    if num == 0:
        return "<=4"
    # Return the key of the first range (low, high] that contains num.
    for key, (low, high) in newdict.items():
        if low < num <= high:
            return key

test['col_2']=test.col_1.map(mapDict)

test then becomes:

  col_1 col_2
a   3   <=4
b   15  >10
c   8   (4,10]
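The same mapping can also be driven directly from the dictionary without a Python-level loop per value. The sketch below is one possible way to do that, assuming the ranges in newdict are contiguous and non-overlapping (each upper bound equals the next lower bound); it derives the bin edges and labels from the dictionary and hands them to pd.cut().

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({"col_1": [3, 15, 8]}, index=["a", "b", "c"])
newdict = {"<=4": [0, 4], "(4,10]": [4, 10], ">10": [10, np.inf]}

# Sort the ranges by lower bound so the bin edges are increasing.
ordered = sorted(newdict.items(), key=lambda item: item[1][0])
labels = [key for key, _ in ordered]
# Edges: every lower bound, followed by the last upper bound.
bins = [bounds[0] for _, bounds in ordered] + [ordered[-1][1][1]]

# include_lowest=True makes the first bin include its left edge,
# which mirrors the special case for 0 in mapDict.
test["col_2"] = pd.cut(
    test["col_1"], bins=bins, labels=labels, include_lowest=True
)
print(test)
```

The resulting column has a categorical dtype; call .astype(str) on it if plain strings are needed.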

P.S. I would like to know how to format code quickly on Stack Overflow. Can anyone share some tips?

Pandas DataFrame: writing values to a column based on a check of an existing column's values

4 answers: