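I have a pandas column whose values range from 0.0 to 1.0.
I want to convert this column to a binary column (0 or 1) based on a threshold: a value becomes 0 if it is <= threshold, and 1 otherwise.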
Answer 0: (score: 4)
Create a boolean mask with gt (greater than), then cast it to ints with astype(int).
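The code that originally accompanied this answer did not survive, so what follows is only a minimal sketch of the approach it describes, not the answerer's exact code. The column name "col", the new column "binary", the sample values, and the threshold of 0.5 are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: the column name "col" and the values are assumptions.
df = pd.DataFrame({"col": [0.0, 0.25, 0.5, 0.75, 1.0]})
threshold = 0.5  # assumed threshold

# gt() is True where the value is strictly greater than the threshold;
# astype(int) then maps True/False to 1/0.
df["binary"] = df["col"].gt(threshold).astype(int)

print(df["binary"].tolist())  # [0, 0, 0, 1, 1]
```

Because gt is a strict comparison, a value exactly equal to the threshold maps to 0, which matches the "<= threshold becomes 0" rule in the question.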
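Answer 1: (score: 2)
# Compare against the threshold, cast the boolean result to int, and assign it back.
# (astype returns a new Series, so the result has to be stored.)
df["column"] = (df["column"] > threshold).astype(int)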
Answer 2: (score: -1)
I would create a helper column, then iterate over the rows and set the value for each cell. Like this:
import pandas as pd
import numpy as np
a = np.random.random_sample(5)
df = pd.DataFrame({"A": a})
df["Helper"] = ""
for i in range(len(df)):
    if df.loc[i, "A"] <= 0.5:
        df.loc[i, "Helper"] = 0
    else:
        df.loc[i, "Helper"] = 1
This results in:
          A  Helper
0  0.114089       0
1  0.309759       0
2  0.158169       0
3  0.444199       0
4  0.645443       1
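For comparison, the same helper column can probably be built without an explicit row loop. This is a sketch using numpy.where, which is a swapped-in technique and not part of the answer above. It reuses the five values from the output shown and the same 0.5 threshold.

```python
import numpy as np
import pandas as pd

# Values copied from the output above, so the result can be compared directly.
df = pd.DataFrame({"A": [0.114089, 0.309759, 0.158169, 0.444199, 0.645443]})

# np.where picks 0 where the condition holds and 1 elsewhere,
# evaluating the whole column at once instead of row by row.
df["Helper"] = np.where(df["A"] <= 0.5, 0, 1)

print(df["Helper"].tolist())  # [0, 0, 0, 0, 1]
```

This also leaves "Helper" as an integer column, whereas initialising it with empty strings gives it the object dtype.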