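A key step in my project is tracking the absolute difference between consecutive values within subsamples of one column of a pandas DataFrame.
I managed to write a for loop that creates the subsamples: I select each individual and go through every year in which they are observed. I can also access the index of the first element of each group, and even compare it with the second element.
Here is my MWE data: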
import pandas as pd

df = pd.DataFrame({'year': ['2001', '2004', '2005', '2006', '2007', '2008', '2009',
                            '2003', '2004', '2005', '2006', '2007', '2008', '2009',
                            '2003', '2004', '2005', '2006', '2007', '2008', '2009'],
                   'id': ['1', '1', '1', '1', '1', '1', '1',
                          '2', '2', '2', '2', '2', '2', '2',
                          '5', '5', '5', '5', '5', '5', '5'],
                   'money': ['15', '15', '15', '21', '21', '21', '21',
                             '17', '17', '17', '20', '17', '17', '17',
                             '25', '30', '22', '25', '8', '7', '12']}).astype(int)
Here is my code:
# do it for all IDs in my dataframe
for i in df.id.unique():
    # now check every given year for that particular ID
    for j in df[df['id']==i].year:
        # access the index of the first element of that ID, as integer
        index = df[df['id']==i].index.values.astype(int)[0]
        # use that index to calculate absolute difference of the first and second element
        abs_diff = abs( df['money'].iloc[index] - df['money'].iloc[index+1] )
        # print all the changes, before further calculations
        index =+1
        print(abs_diff)
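My index is not updating. The loop produces 0000000 0000000 5555555 (3 x 7 changes), but it should show 0,0,0,6,0,0,0 0,0,0,3,-3,0,0 0,5,-8,3,-17,-1,5 (3 x 7 changes). Since there is no change to compute for the first (or last) element, I put a 0 at the front of each group.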
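One likely reason the index never moves, sketched under the assumption that `index =+1` in the loop was meant as an increment: Python has no `=+` operator, so the statement parses as `index = (+1)`. The variable names below are illustrative only.

```python
# `index =+1` is an assignment of the value +1, not an increment.
index = 5
index =+1          # same as: index = +1
assigned = index

counter = 5
counter += 1       # augmented assignment: counter = counter + 1
incremented = counter

print(assigned, incremented)
```

Even with `+=`, the loop in the question recomputes `index` from the first row of the group at the top of every inner iteration, so the increment would be overwritten before it is used.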
Solution: I changed the second loop from for to while:
for i in df.id.unique():
    first = df[df['id']==i].index.values.astype(int)[0]  # ID1 = 0
    last = df[df['id']==i].index.values.astype(int)[-1]  # ID1 = 6
    while first < last:
        abs_diff = abs( df['money'][first] - df['money'][first+1] )
        print(abs_diff)
        first +=1
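For comparison, the same per-id changes can probably be computed without an explicit loop. This is only a sketch, assuming the rows are already sorted by year within each id as in the MWE; the column names `change` and `abs_change` are my own and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2001, 2004, 2005, 2006, 2007, 2008, 2009,
             2003, 2004, 2005, 2006, 2007, 2008, 2009,
             2003, 2004, 2005, 2006, 2007, 2008, 2009],
    'id': [1] * 7 + [2] * 7 + [5] * 7,
    'money': [15, 15, 15, 21, 21, 21, 21,
              17, 17, 17, 20, 17, 17, 17,
              25, 30, 22, 25, 8, 7, 12],
})

# diff() is computed inside each id, so no value is compared across ids.
# The first row of every group has no predecessor (NaN), filled with 0 here.
df['change'] = df.groupby('id')['money'].diff().fillna(0).astype(int)
df['abs_change'] = df['change'].abs()

# One list of seven absolute changes per id
for key, group in df.groupby('id'):
    print(key, group['abs_change'].tolist())
```

The signed `change` column corresponds to the 3 x 7 values listed in the question as the desired result; `abs_change` drops the sign.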
Answer 0 (score: 2)
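for i in df.id.unique():
    for j in df[df['id']==i].year:
        # row position of the observation for this id and year
        index = df[(df['id']==i)&(df['year']==j)].index.values[0].astype(int)
        try:
            abs_diff = abs(df['money'].iloc[index] - df['money'].iloc[index+1])
        except IndexError:
            # the very last row has no successor
            continue
        print(abs_diff)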
Output: 0 0 6 0 0 0 4 0 0 3 3 0 0 8 5 8 3 17 1 5
Answer 1 (score: 1)
You are currently always checking the first value of each batch, so you need to do the following:
# do it for all IDs in my dataframe
for i in df.id.unique():
    # now check every given year for that particular ID
    for idx, j in enumerate(df[df['id']==i].year):
        # access the index of the idx-th element of that ID, as integer
        index = df[df['id']==i].index.values.astype(int)[idx]
        # use that index to calculate absolute difference of this element and the next one
        try:
            abs_diff = abs( df['money'][index] - df['money'][index+1] )
        except KeyError:
            # the very last row has no successor
            continue
        # print all the changes, before further calculations
        print(abs_diff)
Which outputs:
0
0
6
0
0
0
4
0
0
3
3
0
0
8
5
8
3
17
1
5
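Note that these 20 values are differences between consecutive rows of the whole column, so two of them (the 4 and the 8) compare the last row of one id with the first row of the next. A minimal sketch for flagging those entries, assuming the MWE data from the question; `flat` and `crosses_id` are names I made up for illustration.

```python
import pandas as pd

money = pd.Series([15, 15, 15, 21, 21, 21, 21,
                   17, 17, 17, 20, 17, 17, 17,
                   25, 30, 22, 25, 8, 7, 12])
ids = pd.Series([1] * 7 + [2] * 7 + [5] * 7)

# Absolute difference between each row and the next one, ignoring ids.
# The last row has no successor, so it is dropped.
flat = (money.shift(-1) - money).abs().iloc[:-1].astype(int)

# True where the current row and the next row belong to different ids
crosses_id = (ids != ids.shift(-1)).iloc[:-1]

print(flat.tolist())               # all 20 consecutive differences
print(flat[crosses_id].tolist())   # differences that span two ids
print(flat[~crosses_id].tolist())  # differences within a single id
```

Whether the cross-id values should be kept depends on what the changes are used for; the question's desired output suggests they should not be.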