Question

I have the following DataFrame:

import pandas as pd

df1 = pd.DataFrame()
df1['TG'] = [0, 2, 1, 3, 5, 7]
df1['Value'] = [0.2, 0.5, 0.015, 0.6, 0.11, 0.12]

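I want to create new columns based on the values in the TG column (i.e. <1, <2, <3, <4 and >0, >1, >2, >3, etc.). The column names will be U0.5, U1.5, U2.5, U3.5, O0.5, O1.5, O2.5, O3.5, so I will end up with 8 new columns with those names. The value of each cell will come from the Value column. My expected output is as follows: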

I can create one new column at a time with np.where.
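
For reference, this is a minimal sketch of the one-column-at-a-time np.where approach. It assumes the df1 defined above, and it covers only one of the eight cases: the threshold 0.5 and the column name U0.5.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'TG': [0, 2, 1, 3, 5, 7],
                    'Value': [0.2, 0.5, 0.015, 0.6, 0.11, 0.12]})

# Keep Value where TG is below the threshold, otherwise put 0
df1['U0.5'] = np.where(df1['TG'] < 0.5, df1['Value'], 0)
print(df1['U0.5'].tolist())
```

Repeating this line by hand for every threshold and direction is what a loop (or broadcasting) avoids.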

Can someone suggest how to do this in a loop?

Thanks.

Zep

Answer 1

Use NumPy broadcasting, so no loop is needed:

import numpy as np
import pandas as pd

# create array of thresholds
arr = np.arange(1, 5) - .5
print (arr)
[0.5 1.5 2.5 3.5]

#create Mx1 arrays from Series
vals = df1['Value'].values[:, None]
tg = df1['TG'].values[:, None]

# compare arrays and multiply, then use the DataFrame constructor
df2 = pd.DataFrame((arr > tg) * vals, columns=arr).add_prefix('U')
df3 = pd.DataFrame((arr < tg) * vals, columns=arr).add_prefix('O')

#join all together
df = pd.concat([df1, df2, df3], axis=1)
print (df)  
   TG  Value  U0.5   U1.5   U2.5   U3.5   O0.5  O1.5  O2.5  O3.5
0   0  0.200   0.2  0.200  0.200  0.200  0.000  0.00  0.00  0.00
1   2  0.500   0.0  0.000  0.500  0.500  0.500  0.50  0.00  0.00
2   1  0.015   0.0  0.015  0.015  0.015  0.015  0.00  0.00  0.00
3   3  0.600   0.0  0.000  0.000  0.600  0.600  0.60  0.60  0.00
4   5  0.110   0.0  0.000  0.000  0.000  0.110  0.11  0.11  0.11
5   7  0.120   0.0  0.000  0.000  0.000  0.120  0.12  0.12  0.12
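
This is why no loop is needed. tg has shape (6, 1) and arr has shape (4,), so NumPy broadcasts the comparison to a (6, 4) boolean matrix with one row per input row and one column per threshold. The check below rebuilds the same arrays. The name mask is only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'TG': [0, 2, 1, 3, 5, 7],
                    'Value': [0.2, 0.5, 0.015, 0.6, 0.11, 0.12]})

arr = np.arange(1, 5) - .5            # thresholds, shape (4,)
tg = df1['TG'].values[:, None]        # shape (6, 1)
vals = df1['Value'].values[:, None]   # shape (6, 1)

mask = arr > tg                       # broadcast to shape (6, 4)
print(mask.shape)
print(mask[1])                        # row for TG=2: True only for thresholds 2.5 and 3.5
print((mask * vals).shape)            # multiplying by Value keeps shape (6, 4)
```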

Loop solution:

arr = np.arange(1, 5) - .5
for x in arr:
    df1[f"U{x}"] = df1["Value"] * (df1["TG"] < x)
for x in arr:
    df1[f"O{x}"] = df1["Value"] * (df1["TG"] > x)

print (df1)
   TG  Value  U0.5   U1.5   U2.5   U3.5   O0.5  O1.5  O2.5  O3.5
0   0  0.200   0.2  0.200  0.200  0.200  0.000  0.00  0.00  0.00
1   2  0.500   0.0  0.000  0.500  0.500  0.500  0.50  0.00  0.00
2   1  0.015   0.0  0.015  0.015  0.015  0.015  0.00  0.00  0.00
3   3  0.600   0.0  0.000  0.000  0.600  0.600  0.60  0.60  0.00
4   5  0.110   0.0  0.000  0.000  0.000  0.110  0.11  0.11  0.11
5   7  0.120   0.0  0.000  0.000  0.000  0.120  0.12  0.12  0.12

Answer 2

If you still want a loop, there is a simple and elegant way to do it:

l = [0.5, 1.5, 2.5, 3.5]
for item in l:
    df1["U" + str(item)] = df1["Value"] * (df1["TG"] < item)
    df1["O" + str(item)] = df1["Value"] * (df1["TG"] > item)

The output is:

   TG  Value  U0.5   O0.5   U1.5  O1.5   U2.5  O2.5   U3.5  O3.5
0   0  0.200   0.2  0.000  0.200  0.00  0.200  0.00  0.200  0.00
1   2  0.500   0.0  0.500  0.000  0.50  0.500  0.00  0.500  0.00
2   1  0.015   0.0  0.015  0.015  0.00  0.015  0.00  0.015  0.00
3   3  0.600   0.0  0.600  0.000  0.60  0.000  0.60  0.600  0.00
4   5  0.110   0.0  0.110  0.000  0.11  0.000  0.11  0.000  0.11
5   7  0.120   0.0  0.120  0.000  0.12  0.000  0.12  0.000  0.12

The U and O columns come out interleaved, so at this point you need to rearrange the column order.
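
One way to reorder the columns is to build the desired order as a list and index the DataFrame with it. This sketch assumes you want all U columns before all O columns, as in the question. The loop is repeated so the snippet runs on its own, and the name order is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'TG': [0, 2, 1, 3, 5, 7],
                    'Value': [0.2, 0.5, 0.015, 0.6, 0.11, 0.12]})

l = [0.5, 1.5, 2.5, 3.5]
for item in l:
    df1["U" + str(item)] = df1["Value"] * (df1["TG"] < item)
    df1["O" + str(item)] = df1["Value"] * (df1["TG"] > item)

# Original columns first, then all U columns, then all O columns
order = (['TG', 'Value']
         + ["U" + str(item) for item in l]
         + ["O" + str(item) for item in l])
df1 = df1[order]
print(df1.columns.tolist())
```

Generating the list from l keeps the order in sync if you change the thresholds.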

Creating new columns with a for loop based on logical expressions in Python
