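Hello, I want to iterate over the CPB% rows and add the computed result to a corresponding column named "Proba". My dataframe looks like this:
Here is what I have tried so far: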
bins = np.linspace(0, 1, num=100)
dCPB = df['CPB%']
df['binnedB'] = pd.cut(dCPB, bins)
dfnew = pd.DataFrame(pd.cut(df['CPB%'], bins=bins).value_counts()).sort_index(ascending = True)
dfnew['binned'] = dfnew.index
total = dfnew['CPB%'].sum()
idx = total
for index, row in dfnew.iterrows():
    idx = idx - row['CPB%']
    row['Proba'] = float(idx) / float(total)
But my loop does not update my empty Proba column. Why? Thanks!
Answer 0 (score: 2)
I think the problem is that you are assigning the result back to row, which is not stored anywhere. Instead, you can do this:
proba = []
for index, row in dfnew.iterrows():
    idx = idx - row['CPB%']
    proba.append(float(idx) / float(total))
dfnew['Proba'] = proba
However, this is not the best approach; you can use .apply with axis=1 to perform row-wise calculations on a DataFrame.
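To make the point about row concrete, here is a minimal sketch. It assumes a small toy frame (the index labels and counts are made up, not the asker's data). With mixed column dtypes, as here, iterrows() hands back a copy of each row, so writing to it leaves the DataFrame unchanged, whereas writing through .at on the frame itself does stick.

```python
import pandas as pd

# Hypothetical toy frame standing in for the binned counts in the question.
dfnew = pd.DataFrame({'CPB%': [3, 2, 1]}, index=['a', 'b', 'c'])
dfnew['Proba'] = float('nan')

# Writing to `row` only changes the temporary Series yielded by iterrows().
for index, row in dfnew.iterrows():
    row['Proba'] = 0.5
untouched = dfnew['Proba'].isna().all()  # the frame still holds only NaN

# Writing through .at targets the DataFrame itself.
total = dfnew['CPB%'].sum()
remaining = total
for index, row in dfnew.iterrows():
    remaining -= row['CPB%']
    dfnew.at[index, 'Proba'] = remaining / total
```

Collecting the values in a list and assigning the whole column once, as in the answer, avoids per-cell writes and is usually the simpler choice of the two.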
Answer 1 (score: 2)
You can use pd.Series.cumsum to replace the iterative subtraction:
total = dfnew['CPB%'].sum()
dfnew['Proba'] = 1 - dfnew['CPB%'].cumsum() / total
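With pandas, you should aim for vectorised algorithms, which usually involve column-wise operations rather than row-wise for loops. Here is a complete demonstration: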
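import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list(range(1, 7))})

def jpp(df):
    # vectorised: one cumulative sum over the whole column
    total = df['A'].sum()
    df['Proba'] = 1 - df['A'].cumsum() / total
    return df

def yolo(df):
    # row-wise loop that keeps a running remainder
    total = df['A'].sum()
    idx = total
    proba = []
    for index, row in df.iterrows():
        idx = idx - row['A']
        proba.append(float(idx) / float(total))
    df['Proba'] = proba
    return df

# check results are the same (on copies, since both functions modify df in place)
assert np.allclose(df.copy().pipe(jpp)['Proba'], df.copy().pipe(yolo)['Proba'])

# IPython timings
%timeit df.pipe(jpp)   # 691 µs
%timeit df.pipe(yolo)  # 840 µs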
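Applied to binned counts like those in the question, the same cumulative-sum idea might look like the sketch below. The sample values and the choice of 10 bins are assumptions made for illustration, since the asker's data is not shown. The counts are kept as a Series so the code does not depend on how a particular pandas version names the value_counts column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values in (0, 1]; the question's real data is not shown.
df = pd.DataFrame({'CPB%': [0.05, 0.12, 0.12, 0.45, 0.75, 0.95]})

bins = np.linspace(0, 1, num=11)  # 10 equal-width bins, chosen for brevity

# Count observations per bin, ordered by bin interval (empty bins count as 0).
counts = pd.cut(df['CPB%'], bins=bins).value_counts().sort_index()

# Share of observations that fall in bins above the current one.
proba = 1 - counts.cumsum() / counts.sum()

dfnew = pd.DataFrame({'count': counts, 'Proba': proba})
```

Each row of dfnew then holds a bin, its count, and the fraction of observations lying in higher bins, which is what the loop in the question computes one row at a time.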