I have a df like this, and I want to turn the lists of values into columns:
```
uid device
0 000 [1.0, 3.0]
1 001 [3.0]
2 003 [nan]
3 004 [2.0, 3.0]
4 005 [1.0]
5 006 [1.0]
6 006 [nan]
7 007 [2.0]
```
It should become:
```
uid device NA just_1 just_2or3 Both
0 000 [1.0, 3.0] 0 0 0 1
1 001 [3.0] 0 0 1 0
2 003 [nan] 1 0 0 0
3 004 [2.0, 3.0] 0 0 1 0
4 005 [1.0] 0 1 0 0
5 006 [1.0] 0 1 0 0
6 006 [nan] 1 0 0 0
7 007 [2.0] 0 0 1 0
8 008 [1.0, 2.0] 0 0 0 1
```
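I want to convert this to dummy variables. If device contains only 1.0, set just_1 = 1; if it is 2.0, 3.0, or [2.0, 3.0], set just_2or3 = 1.
Only when 1.0 appears together with another value, as in [1.0, 3.0] or [1.0, 2.0], set Both = 1.
How can I do this? Thanks.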
Answer 0 (score: 1)
You can use a custom function f with apply, then cast the resulting booleans to int with astype:
import numpy as np
import pandas as pd

df = pd.DataFrame({'uid': ['000', '001', '002', '003', '004', '005', '006', '007'],
                   'device': [[1.0, 3.0], [3.0], [np.nan], [2.0, 3.0],
                              [1.0], [1.0], [np.nan], [2.0]]})
print(df)
device uid
0 [1.0, 3.0] 000
1 [3.0] 001
2 [nan] 002
3 [2.0, 3.0] 003
4 [1.0] 004
5 [1.0] 005
6 [nan] 006
7 [2.0] 007
def f(x):
    # Each flag is a plain boolean; they are cast to int after apply
    NA = np.nan in x
    has_1 = 1 in x
    has_2or3 = 2 in x or 3 in x
    just_1 = has_1 and not has_2or3
    just_2or3 = has_2or3 and not has_1
    both = has_1 and has_2or3
    return pd.Series([NA, just_1, just_2or3, both],
                     index=['NA', 'just_1', 'just_2or3', 'both'])

print(df.set_index('uid').device.apply(f).astype(int).reset_index())
uid NA just_1 just_2or3 both
0 000 0 0 0 1
1 001 0 0 1 0
2 002 1 0 0 0
3 003 0 0 1 0
4 004 0 1 0 0
5 005 0 1 0 0
6 006 1 0 0 0
7 007 0 0 1 0
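The output of this answer drops the device column, while the question keeps it next to the new flags. A minimal sketch of one way to keep it, assuming the default integer index and restating the answer's f in compact form; the names dummies and result are illustrative only and do not come from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "uid": ["000", "001", "002", "003", "004", "005", "006", "007"],
    "device": [[1.0, 3.0], [3.0], [np.nan], [2.0, 3.0],
               [1.0], [1.0], [np.nan], [2.0]],
})


def f(x):
    # Same four conditions as in the answer, written as plain booleans
    has_1 = 1 in x
    has_2or3 = 2 in x or 3 in x
    return pd.Series(
        [np.nan in x, has_1 and not has_2or3,
         has_2or3 and not has_1, has_1 and has_2or3],
        index=["NA", "just_1", "just_2or3", "both"],
    )


# apply returns one row of flags per list; cast the booleans to 0/1
dummies = df["device"].apply(f).astype(int)

# join aligns on the shared index, so the original columns are kept
result = df.join(dummies)
print(result)
```

Because join aligns on the index, this only lines up correctly when dummies was computed from the same, unmodified df.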
Answer 1 (score: 0)
You can create such columns by expressing each condition as a boolean and converting it to int, all inside a list comprehension:
df['just_1'] = [int(1 in x and not(2 in x or 3 in x)) for x in df.device]
and
df['both'] = [int(1 in x and (2 in x or 3 in x)) for x in df.device]
and
df['just_2or3'] = [int(1 not in x and (2 in x or 3 in x)) for x in df.device]
and
df['NA'] = [int(np.nan in x) for x in df.device]
and so on.
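One caveat about the np.nan in x test used in these answers: Python's in operator checks identity before equality, so it finds NaN only when the list holds the very same np.nan object. That is the case for the sample data here, but it may not be for values that arrive from parsing or arithmetic. A minimal sketch of a more defensive check follows; has_nan is a hypothetical helper name, not something from the answers:

```python
import numpy as np
import pandas as pd


def has_nan(values):
    """Return True if any element of the list is a missing value."""
    # pd.isna treats np.nan, float('nan') and None as missing
    return any(pd.isna(v) for v in values)


same_object = [np.nan]          # holds the np.nan object itself
other_object = [float("nan")]   # a different NaN object

print(np.nan in same_object)    # found through the identity check
print(np.nan in other_object)   # NaN != NaN, so membership fails
print(has_nan(same_object), has_nan(other_object))
```

With such a helper, the NA column could be built as [int(has_nan(x)) for x in df.device].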
Answer 2 (score: 0)
You can use a custom function with pandas.Series.apply together with the pandas.get_dummies function:
def worker(x):
    ch1 = 1 in x
    ch23 = any(i in x for i in [2, 3])
    if ch1 and ch23:
        return 'both'
    elif ch1:
        return 'just_1'
    elif ch23:
        return 'just_2or3'
    else:
        return 'NA'
>>> res = pd.get_dummies(df.device.apply(worker), dtype=int)
>>> res
NA both just_1 just_2or3
0 0 1 0 0
1 0 0 0 1
2 1 0 0 0
3 0 0 0 1
4 0 0 1 0
5 0 0 1 0
6 1 0 0 0
7 0 0 0 1
Old answer
def worker(x):
    ch1 = 1 in x
    ch23 = any(i in x for i in [2, 3])
    if ch1 and ch23:
        return {'both': 1}
    elif ch1:
        return {'just_1': 1}
    elif ch23:
        return {'just_2or3': 1}
    else:
        return {'NA': 1}
>>> res = df.device.apply(worker).apply(pd.Series).fillna(0).astype(int)
>>> res
NA both just_1 just_2or3
0 0 1 0 0
1 0 0 0 1
2 1 0 0 0
3 0 0 0 1
4 0 0 1 0
5 0 0 1 0
6 1 0 0 0
7 0 0 0 1
If you need to combine the result with the original data:
>>> pd.concat([df, res], axis=1)
device uid NA both just_1 just_2or3
0 [1.0, 3.0] 000 0 1 0 0
1 [3.0] 001 0 0 0 1
2 [nan] 002 1 0 0 0
3 [2.0, 3.0] 003 0 0 0 1
4 [1.0] 004 0 0 1 0
5 [1.0] 005 0 0 1 0
6 [nan] 006 1 0 0 0
7 [2.0] 007 0 0 0 1
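A note on the pd.get_dummies approach: it only creates columns for labels that actually occur, so a dataset with no 'both' rows would come back without a both column. A minimal sketch of one way to fix the column set up front, assuming a categorical dtype is acceptable; CATEGORIES and the two-row frame small are hypothetical and used only for illustration:

```python
import pandas as pd

# Fixed label set, in the column order wanted in the result (assumption)
CATEGORIES = ["NA", "just_1", "just_2or3", "both"]


def worker(x):
    ch1 = 1 in x
    ch23 = any(i in x for i in [2, 3])
    if ch1 and ch23:
        return "both"
    elif ch1:
        return "just_1"
    elif ch23:
        return "just_2or3"
    return "NA"


# Hypothetical subset that contains neither "both" nor "NA" rows
small = pd.DataFrame({"uid": ["001", "004"], "device": [[3.0], [1.0]]})

# A categorical dtype carries every category, including unused ones
labels = small["device"].apply(worker).astype(pd.CategoricalDtype(CATEGORIES))

# get_dummies then emits one column per category, present or not
res = pd.get_dummies(labels, dtype=int)
print(res)
```

The same idea applies to the full df: the columns then come out in the order given by CATEGORIES rather than alphabetically.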