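I need to create a dummy variable for each combination of choice and city.
The choice set is a list of integers, [10, 20, 30, 40, 50], and the city set is a list of strings, ['XX', 'YY', 'ZZ'].
This is the dataframe: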
choice city
10 XX
20 YY
20 YY
30 XX
10 XX
20 YY
40 ZZ
40 ZZ
50 YY
Expected result:
choice city 10_XX 10_YY 10_ZZ 20_XX 20_YY 20_ZZ 30_XX 30_YY 30_ZZ 40_XX 40_YY 40_ZZ 50_XX 50_YY 50_ZZ
10 XX 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 YY 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
20 YY 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
30 XX 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
10 XX 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 YY 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
40 ZZ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
40 ZZ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
50 YY 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
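For anyone who wants to run the snippets in the answers, here is a minimal sketch that builds the sample frame. The values are copied from the table in the question; the variable name `df` is simply what the answers assume.

```python
import pandas as pd

# Sample data copied from the question; `df` is the name the answers rely on.
df = pd.DataFrame({
    "choice": [10, 20, 20, 30, 10, 20, 40, 40, 50],
    "city": ["XX", "YY", "YY", "XX", "XX", "YY", "ZZ", "ZZ", "YY"],
})
print(df)
```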
Answer 0 (score: 1)
You can try:
import numpy as np
_choice=[10, 20, 30, 40, 50]
_city=["XX", "YY", "ZZ"]
# One 0/1 column per (choice, city) pair
for ch in _choice:
    for ci in _city:
        df[f"{ch}_{ci}"]=np.where((df["choice"]==ch)&(df["city"]==ci), 1,0)
And without an explicit for loop:
import numpy as np
import pandas as pd
import itertools
_choice=[10, 20, 30, 40, 50]
_city=["XX", "YY", "ZZ"]
# All (choice, city) pairs, in the order of the expected columns
opts=list(itertools.product(_choice, _city))
# For each row, build a Series of 0/1 flags keyed by "<choice>_<city>"
df[list(map(lambda x: f"{x[0]}_{x[1]}", opts))]=df.apply(lambda x: pd.Series({f"{el[0]}_{el[1]}": 1 if (x["choice"]==el[0]) & (x["city"]==el[1]) else 0 for el in opts}) , axis=1).reset_index(drop=True)
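A vectorized alternative, not part of the answer above, is to join the two columns into one key and let `pd.get_dummies` do the encoding. This is only a sketch under the assumption that the frame looks like the one in the question; the names `key`, `all_cols`, `dummies` and `result` are made up for illustration.

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    "choice": [10, 20, 20, 30, 10, 20, 40, 40, 50],
    "city": ["XX", "YY", "YY", "XX", "XX", "YY", "ZZ", "ZZ", "YY"],
})
choices = [10, 20, 30, 40, 50]
cities = ["XX", "YY", "ZZ"]

# Combined key per row, e.g. "10_XX"
key = df["choice"].astype(str) + "_" + df["city"]

# Every possible column name, in the order of the expected result
all_cols = [f"{ch}_{ci}" for ch, ci in itertools.product(choices, cities)]

# get_dummies only creates columns for keys that occur in the data;
# reindex adds the missing combinations as all-zero columns.
dummies = (
    pd.get_dummies(key)
    .reindex(columns=all_cols, fill_value=0)
    .astype(int)
)

result = pd.concat([df, dummies], axis=1)
print(result)
```

The `reindex` step is what guarantees all 15 columns even though only five combinations appear in the sample data.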
Answer 1 (score: 1)
You can use an outer comparison.
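# u[i, j] is 1 when rows i and j are identical; to_numpy() avoids passing the DataFrame itself to ufunc.outer
u = np.equal.outer(df.to_numpy(), df.to_numpy()).any(1).all(-1).view('i1')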
array([[1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1]], dtype=int8)
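To see why that chain of reductions yields a row-equality matrix, here is a small sketch on a three-row object array. The array `a` and the names `outer` and `same_row` are illustrative, not from the answer. It relies on the fact that an integer choice never compares equal to a string city, so the `any(1)` step only ever matches a column against itself.

```python
import numpy as np

# Three rows of (choice, city); rows 0 and 2 are identical.
a = np.array([[10, "XX"], [20, "YY"], [10, "XX"]], dtype=object)

# outer[i, k, j, l] is True when a[i, k] == a[j, l]
outer = np.equal.outer(a, a)
print(outer.shape)  # (3, 2, 3, 2)

# any(1): does any cell of row i equal a[j, l]?  -> shape (3, 3, 2)
# all(-1): is that true for every column l of row j? -> shape (3, 3)
same_row = outer.any(1).all(-1).view("i1")
print(same_row)
```

Note that the intermediate array has one entry per pair of cells, so memory grows quadratically with the number of rows.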
Now build the desired DataFrame:
index = pd.MultiIndex.from_frame(df)
columns = index.map("{0[0]}_{0[1]}".format)
allc = set(
    f'{i}_{j}' for i in df['choice'] for j in df['city'])
res = pd.DataFrame(u, index, columns).T.drop_duplicates().T
res.reindex(allc, axis=1, fill_value=0)
             40_ZZ  50_ZZ  20_YY  50_XX  40_XX  20_ZZ  20_XX  10_YY  30_ZZ  30_YY  10_XX  30_XX  50_YY  40_YY  10_ZZ
choice city
10     XX        0      0      0      0      0      0      0      0      0      0      1      0      0      0      0
20     YY        0      0      1      0      0      0      0      0      0      0      0      0      0      0      0
       YY        0      0      1      0      0      0      0      0      0      0      0      0      0      0      0
30     XX        0      0      0      0      0      0      0      0      0      0      0      1      0      0      0
10     XX        0      0      0      0      0      0      0      0      0      0      1      0      0      0      0
20     YY        0      0      1      0      0      0      0      0      0      0      0      0      0      0      0
40     ZZ        1      0      0      0      0      0      0      0      0      0      0      0      0      0      0
       ZZ        1      0      0      0      0      0      0      0      0      0      0      0      0      0      0
50     YY        0      0      0      0      0      0      0      0      0      0      0      0      1      0      0