Question

I have a DataFrame like this, and I want to turn the lists of values into columns:

```

    uid   device
0   000 [1.0, 3.0]
1   001 [3.0]
2   003 [nan]
3   004 [2.0, 3.0]
4   005 [1.0]
5   006 [1.0]
6   006 [nan]
7   007 [2.0]
```

The result should be:

```

    uid  device      NA  just_1  just_2or3  Both
0   000 [1.0, 3.0]   0     0         0        1
1   001 [3.0]        0     0         1        0
2   003 [nan]        1     0         0        0
3   004 [2.0, 3.0]   0     0         1        0
4   005 [1.0]        0     1         0        0
5   006 [1.0]        0     1         0        0
6   006 [nan]        1     0         0        0
7   007 [2.0]        0     0         1        0
8   008 [1.0, 2.0]   0     0         0        1

```

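I want to convert this into dummy variables. If device is only [1.0], set the corresponding column (just_1) to 1; if it is [2.0], [3.0], or [2.0, 3.0], set just_2or3 = 1.

Only when 1.0 appears in the list together with 2.0 or 3.0, as in [1.0, 3.0] or [1.0, 2.0], should Both be set to 1.

How can I do this? Thanks.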

Answer 1

You can apply a custom function f that computes each flag as a boolean, then convert the booleans to int with astype at the end:

```
import numpy as np
import pandas as pd

df = pd.DataFrame({'uid':['000','001','002','003','004','005','006','007'],
                   'device':[[1.0,3.0],[3.0],[np.nan],[2.0,3.0],
                             [1.0],[1.0],[np.nan],[2.0]]})

print(df)
       device  uid
0  [1.0, 3.0]  000
1       [3.0]  001
2       [nan]  002
3  [2.0, 3.0]  003
4       [1.0]  004
5       [1.0]  005
6       [nan]  006
7       [2.0]  007
```

```
def f(x):
    # Work out which values are present in the list.
    has_1 = 1 in x
    has_2or3 = 2 in x or 3 in x
    # pd.isna checks by value, so any NaN object is detected.
    NA = any(pd.isna(v) for v in x)
    just_1 = has_1 and not has_2or3
    both = has_1 and has_2or3
    just_2or3 = not has_1 and has_2or3
    return pd.Series([NA, just_1, just_2or3, both],
                     index=['NA','just_1','just_2or3', 'both'])
```

```
print(df.set_index('uid').device.apply(f).astype(int).reset_index())
   uid  NA  just_1  just_2or3  both
0  000   0       0          0     1
1  001   0       0          1     0
2  002   1       0          0     0
3  003   0       0          1     0
4  004   0       1          0     0
5  005   0       1          0     0
6  006   1       0          0     0
7  007   0       0          1     0
```
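
One caveat on the NA check: a test such as `np.nan in x` only succeeds because Python's list membership test checks identity before equality, so it finds the very same `np.nan` object. A NaN that arrives as a different object (for example `float('nan')`) would be missed, because NaN never compares equal to itself. Below is a minimal sketch of a more defensive check; `has_nan` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def has_nan(values):
    """Return True if any element of the list is a missing value."""
    # pd.isna tests each element by value, so it does not depend on
    # the element being the very same object as np.nan.
    return any(pd.isna(v) for v in values)


same_object = [np.nan]           # the np.nan object itself
other_object = [float('nan')]    # a distinct NaN object

print(np.nan in same_object)     # found through the identity check
print(np.nan in other_object)    # not found: different object, and NaN != NaN
print(has_nan(same_object), has_nan(other_object))
```

Whether this matters depends on where the lists come from; for a frame built directly with `np.nan`, as in this answer, both checks agree.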

Answer 2

You can create these columns by expressing each condition as a boolean and converting it to int, all inside a list comprehension:

```
df['just_1'] = [int(1 in x and not(2 in x or 3 in x)) for x in df.device]
```

and

```
df['both'] = [int(1 in x and (2 in x or 3 in x)) for x in df.device]
```

and

```
df['just_2or3'] = [int(1 not in x and (2 in x or 3 in x)) for x in df.device]
```

and

```
df['NA'] = [int(any(pd.isna(v) for v in x)) for x in df.device]
```

and so on.
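
Each comprehension above walks `df.device` once. If that matters, all four flags can be built in a single pass. The sketch below is one possible way to do it, assuming a shortened version of the sample data; `flags` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Shortened sample data, for illustration only.
df = pd.DataFrame({'uid': ['000', '001', '002', '003'],
                   'device': [[1.0, 3.0], [3.0], [np.nan], [1.0]]})


def flags(x):
    """Return the four 0/1 flags for one device list as a dict."""
    has_1 = 1 in x
    has_2or3 = 2 in x or 3 in x
    return {'NA': int(any(pd.isna(v) for v in x)),
            'just_1': int(has_1 and not has_2or3),
            'just_2or3': int(has_2or3 and not has_1),
            'both': int(has_1 and has_2or3)}


# Build one row of flags per device list, reusing df's index so join aligns.
dummies = pd.DataFrame([flags(x) for x in df.device], index=df.index)
result = df.join(dummies)
print(result)
```

Passing `index=df.index` keeps the flags lined up with their rows even if `df` no longer has the default 0..n-1 index.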

Answer 3

You can use a custom function with pandas.Series.apply, together with pandas.get_dummies:

```
def worker(x):
    ch1 = 1 in x
    ch23 = any(i in x for i in [2,3])
    if ch1 and ch23:
        return 'both'
    elif ch1:
        return 'just_1'
    elif ch23:
        return 'just_2or3'
    else:
        return 'NA'

>>> # astype(int) keeps the 0/1 output on pandas 2.0+, where get_dummies returns booleans
>>> res = pd.get_dummies(df.device.apply(worker)).astype(int)
>>> res
   NA  both  just_1  just_2or3
0   0     1       0          0
1   0     0       0          1
2   1     0       0          0
3   0     0       0          1
4   0     0       1          0
5   0     0       1          0
6   1     0       0          0
7   0     0       0          1
```
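
One thing to watch with get_dummies: it only creates columns for labels that actually occur, so data with no 'both' rows would come back without a both column. Below is a minimal sketch of one way to pin the column set, using reindex. The `labels` Series is made-up illustration data, not the output of `worker` on the sample df.

```python
import pandas as pd

# Hypothetical labels in which 'both' and 'NA' never occur.
labels = pd.Series(['just_1', 'just_2or3', 'just_1'])

columns = ['NA', 'just_1', 'just_2or3', 'both']

# reindex adds any missing label as an all-zero column and fixes the order.
res = (pd.get_dummies(labels)
         .reindex(columns=columns, fill_value=0)
         .astype(int))
print(res)
```

This also gives a stable column order, which helps when the result is concatenated with frames produced from other batches of data.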

Old answer

```
def worker(x):
    ch1 = 1 in x
    ch23 = any(i in x for i in [2,3])
    if ch1 and ch23:
        return {'both':1}
    elif ch1:
        return {'just_1':1}
    elif ch23:
        return {'just_2or3':1}
    else:
        return {'NA':1}

>>> res = df.device.apply(worker).apply(pd.Series).fillna(0).astype(int)
>>> res
   NA  both  just_1  just_2or3
0   0     1       0          0
1   0     0       0          1
2   1     0       0          0
3   0     0       0          1
4   0     0       1          0
5   0     0       1          0
6   1     0       0          0
7   0     0       0          1
```

If you need to combine the result with the original DataFrame:

```
>>> pd.concat([df, res], axis=1)
       device  uid  NA  both  just_1  just_2or3
0  [1.0, 3.0]  000   0     1       0          0
1       [3.0]  001   0     0       0          1
2       [nan]  002   1     0       0          0
3  [2.0, 3.0]  003   0     0       0          1
4       [1.0]  004   0     0       1          0
5       [1.0]  005   0     0       1          0
6       [nan]  006   1     0       0          0
7       [2.0]  007   0     0       0          1
```
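
Note that pd.concat with axis=1 matches rows by index label, not by position. That is harmless when both frames share the default index, but it matters once the original frame has been filtered or re-indexed. Below is a small sketch of the effect, using made-up frames (`left`, `misaligned` and `aligned` are illustrative names):

```python
import pandas as pd

# A frame whose index is no longer 0..n-1, e.g. after filtering rows.
left = pd.DataFrame({'uid': ['001', '003']}, index=[1, 3])

# Flags built with a fresh default index (0, 1) no longer line up with left.
misaligned = pd.DataFrame({'just_1': [1, 0]})
print(pd.concat([left, misaligned], axis=1))   # three index labels, with NaN gaps

# Reusing left's index keeps each flag on its own row.
aligned = pd.DataFrame({'just_1': [1, 0]}, index=left.index)
combined = pd.concat([left, aligned], axis=1)
print(combined)
```

If the flags were computed from the same frame with apply, they already carry that frame's index and no extra step is needed.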
