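I am trying to count the number of rows it takes for the value in the 'neg' column to change from its default of 0 to 1, and to store that count in a new column named 'dsf' on the row where neg = 1. I tried the snippet below. I am not sure why, but it sets every 'dsf' value to 0.
Why is this wrong?
Code: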
full_data['dsf'] = 0
counter = 0
for i, r in full_data.iterrows():
    if r['neg'] == 0:
        counter += 1
        r['dsf'] = 0
    else:
        r['dsf'] = counter
        counter = 0
full_data
Current output:
              datehour      pft       rev       mgn  neg  dsf
0  2018-04-01 00:00:00  53.1783  110.8514  0.479726    0    0
1  2018-04-01 00:30:00  51.1496  105.9060  0.482972    0    0
2  2018-04-01 01:00:00  42.9360  120.7555  0.355561    1    0
3  2018-04-01 01:30:00  37.8455  114.5514  0.330380    0    0
4  2018-04-01 02:00:00  43.9254   99.1340  0.443091    1    0
Desired output:
              datehour      pft       rev       mgn  neg  dsf
0  2018-04-01 00:00:00  53.1783  110.8514  0.479726    0    0
1  2018-04-01 00:30:00  51.1496  105.9060  0.482972    0    0
2  2018-04-01 01:00:00  42.9360  120.7555  0.355561    1    3
3  2018-04-01 01:30:00  37.8455  114.5514  0.330380    0    0
4  2018-04-01 02:00:00  43.9254   99.1340  0.443091    1    2
Answer 0 (score: 1)
You should initialize the counter outside the for loop. Here is an example:
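import pandas as pd

df = pd.DataFrame({'neg': [0, 0, 1, 0, 1]})
df['dsf'] = 0
counter = 1  # start at 1 so the row where neg == 1 is itself counted
for i, j in df.iterrows():
    if j['neg'] == 0:
        counter += 1
    else:
        # Assign through df.loc: the row yielded by iterrows may be a copy
        df.loc[i, 'dsf'] = counter
        counter = 1
print(df)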
Output:
   neg  dsf
0    0    0
1    0    0
2    1    3
3    0    0
4    1    2
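Note that this matches your desired output exactly. If you want to count only the rows where neg is 0, initialize the counter to 0 outside the for loop and reset it to 0 at the end of the else branch. The result would then be:
   neg  dsf
0    0    0
1    0    0
2    1    2
3    0    0
4    1    1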
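The two counting conventions described above can be compared side by side. The following is a minimal sketch in plain Python rather than pandas; the function name `rows_until_one` and its `start` parameter are hypothetical and only serve to illustrate the difference.

```python
def rows_until_one(neg_values, start=1):
    """Return a list with, at each 1, the count accumulated since the previous 1.

    start=1 counts the row holding the 1 itself.
    start=0 counts only the 0 rows that precede it.
    Rows holding 0 always get 0.
    """
    result = []
    counter = start
    for value in neg_values:
        if value == 0:
            counter += 1
            result.append(0)
        else:
            result.append(counter)
            counter = start  # reset for the next run of zeros
    return result


neg = [0, 0, 1, 0, 1]
inclusive = rows_until_one(neg, start=1)   # same convention as the desired output
zeros_only = rows_until_one(neg, start=0)  # counts only the 0 rows
print(inclusive)
print(zeros_only)
```

Only the initial and reset value of the counter differs between the two variants; the loop itself is unchanged.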
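Answer 1 (score: 0)
From the iterrows docs:
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
So in your case, inside the for loop you are not modifying the original DataFrame, because iterrows returns a copy. For more details on views versus copies, see http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy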
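To make the copy behavior concrete, here is a minimal sketch. It assumes a small made-up frame with mixed column dtypes (float and int), which is the situation in the question; the variable names `untouched` and `updated` are illustrative only.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes, similar in shape to the question's data.
df = pd.DataFrame({'mgn': [0.48, 0.36], 'neg': [0, 1]})
df['dsf'] = 0

# Writing to the row object only changes the temporary Series for that row.
for i, row in df.iterrows():
    row['dsf'] = 99
untouched = df['dsf'].tolist()

# Writing through df.loc changes the DataFrame itself.
for i, row in df.iterrows():
    df.loc[i, 'dsf'] = 99
updated = df['dsf'].tolist()

print(untouched, updated)
```

The first list should still contain only zeros, while the second reflects the assignments made through `df.loc`.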
Here is a fixed version of your code:
import pandas as pd

df = pd.DataFrame([
    ['2018-04-01 00:00:00', 53.1783, 110.8514, 0.479726, 0],
    ['2018-04-01 00:30:00', 51.1496, 105.9060, 0.482972, 0],
    ['2018-04-01 01:00:00', 42.9360, 120.7555, 0.355561, 1],
    ['2018-04-01 01:30:00', 37.8455, 114.5514, 0.330380, 0],
    ['2018-04-01 02:00:00', 43.9254, 99.1340, 0.443091, 1]],
    columns=['datehour', 'pft', 'rev', 'mgn', 'neg'])
df['dsf'] = 0
counter = 0
for i, r in df.iterrows():
    counter += 1
    if r['neg'] != 0:
        # Write to the DataFrame itself, not to the row copy
        df.loc[i, 'dsf'] = counter
        counter = 0
print(df)
#               datehour      pft       rev       mgn  neg  dsf
# 0  2018-04-01 00:00:00  53.1783  110.8514  0.479726    0    0
# 1  2018-04-01 00:30:00  51.1496  105.9060  0.482972    0    0
# 2  2018-04-01 01:00:00  42.9360  120.7555  0.355561    1    3
# 3  2018-04-01 01:30:00  37.8455  114.5514  0.330380    0    0
# 4  2018-04-01 02:00:00  43.9254   99.1340  0.443091    1    2
Answer 2 (score: 0)
Here is a different solution to your problem that should be much faster than using iterrows. With pandas, you should try to vectorize as much as possible.
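import pandas as pd

# Assumes the default 0..n-1 index, so index labels equal row positions
df = pd.DataFrame({'neg': [0, 0, 1, 0, 1, 0, 0, 1]})
indexes = df[df['neg'] == 1].index                # rows where neg == 1
shifted = (indexes + 1).to_series(index=indexes)  # 1-based positions, keyed by row label
vals = shifted - shifted.shift().fillna(0)        # distance from the previous 1 (or from the start)
print(df.assign(dsf=vals).fillna(0))
   neg  dsf
0    0  0.0
1    0  0.0
2    1  3.0
3    0  0.0
4    1  2.0
5    0  0.0
6    0  0.0
7    1  3.0
If you wish, you can convert the dsf column to int with .astype(int).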
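For frames whose index is not the default 0..n-1, a position-based variant may be safer. The following is a minimal sketch and not part of the answer above: the helper name `rows_since_previous_one` is hypothetical, and it relies on `numpy.flatnonzero` and `numpy.diff` to work on row positions and to return an integer column directly.

```python
import numpy as np
import pandas as pd


def rows_since_previous_one(neg):
    """For each row where neg == 1, count the rows since the previous 1
    (or since the start of the Series), including the row itself."""
    positions = np.flatnonzero(neg.to_numpy() == 1)  # row positions of the 1s
    gaps = np.diff(positions, prepend=-1)            # distance to the previous 1
    out = pd.Series(0, index=neg.index, dtype='int64')
    out.iloc[positions] = gaps                       # all other rows stay 0
    return out


# Non-default index labels, to show that only positions matter here.
df = pd.DataFrame({'neg': [0, 0, 1, 0, 1, 0, 0, 1]}, index=list('abcdefgh'))
df['dsf'] = rows_since_previous_one(df['neg'])
print(df)
```

Because the counts are computed from positions rather than index labels, the result does not depend on how the frame is indexed, and the column stays integer without a later conversion.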