I have a DataFrame with 3 columns (including the index):
name age
0 satya 24
1 abc 26
2 xyz 29
3 def 32
I need to add a new column, detail, to store the detail file name. The values in that column should look like file_<index number>:
name age detail
0 satya 24 file_0
1 abc 26 file_1
2 xyz 29 file_2
3 def 32 file_3
To achieve this, I tried the following:
df['detail']= str('file_'+df.index) #not working shows error
df['detail'] = str('file'+'_'+str(df.index)) #worked but not what i want
df['detail'] = str(s+'_'+df.index[0].astype(str)) #error
I also tried a loop with iterrows:
for index, row in df.iterrows():
    df['detail'] = str('file'+'_'+row[index]) #IndexError: index out of bounds
for index, row in df.iterrows():
    df['idx'] = str(s+'_'+df.index[row].astype(str)) ###IndexError: arrays used as indices must be of integer (or boolean) type
Any suggestions would be appreciated.
Answer 0 (score: 1)
You can use astype on the index:
df['detail']= 'file_' + df.index.astype(str)
print(df)
name age detail
0 satya 24 file_0
1 abc 26 file_1
2 xyz 29 file_2
3 def 32 file_3
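For a self-contained run of the astype approach, here is a minimal sketch that rebuilds the question's frame. The `import pandas as pd` alias and the constructor call are assumptions, since the original does not show how df was created. It also illustrates why wrapping the expression in str(...) did not give the wanted result: str turns the whole index into one string, which would then be repeated in every row.

```python
import pandas as pd

# Rebuild the example frame from the question (this construction is assumed).
df = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"],
                   "age": [24, 26, 29, 32]})

# Convert each index label to a string, then prepend the prefix element-wise.
df["detail"] = "file_" + df.index.astype(str)

# For contrast: str() applied to the Index gives a single string (its repr),
# so assigning it to a column would put the same value in every row.
whole_index_as_text = "file_" + str(df.index)

print(df)
```

The key difference is that astype(str) works per element, while the built-in str works on the index object as a whole.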
Another solution is to use map:
df['detail'] = 'file_' + df.index.map(str)
# Python 3.6+ solution (f-strings)
df['detail'] = [f"file_{i}" for i in df.index]
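As a quick check that the map and f-string variants build the same labels, the following sketch runs both on the question's data. The frame construction and the names via_map and via_fstring are assumptions made for illustration only.

```python
import pandas as pd

# Rebuild the example frame from the question (this construction is assumed).
df = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"],
                   "age": [24, 26, 29, 32]})

# Index.map applies str to every label and returns a new Index of strings.
via_map = "file_" + df.index.map(str)

# The list comprehension formats each label directly with an f-string.
via_fstring = [f"file_{i}" for i in df.index]

# Either result can be assigned as the new column.
df["detail"] = via_fstring

# Compare the two results element by element.
same_labels = list(via_map) == via_fstring
print(same_labels)
```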
Comparison:
#[40000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)
In [153]: %timeit df['detail']= 'file_' + df.index.astype(str)
31.2 ms ± 423 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [154]: %timeit df['detail1'] = 'file_' + df.index.map(str)
16.9 ms ± 411 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [155]: %timeit df['detail'] = [f"file_{i}" for i in df.index]
2.95 ms ± 180 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
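%timeit is an IPython magic, so the lines above only run in IPython or Jupyter. Outside of it, a rough equivalent can be sketched with the standard-library timeit module. The names small, candidates and timings are hypothetical, and this sketch times only the expression that builds the labels, not the column assignment. Absolute numbers depend on the machine and the pandas version, so none are claimed here.

```python
import timeit

import pandas as pd

# Rebuild the example frame from the question (this construction is assumed).
small = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"],
                      "age": [24, 26, 29, 32]})

# Same enlargement as in the comparison: 4 rows repeated 10000 times.
df = pd.concat([small] * 10000, ignore_index=True)

# The three ways of building the labels, wrapped so timeit can call them.
candidates = {
    "astype": lambda: "file_" + df.index.astype(str),
    "map": lambda: "file_" + df.index.map(str),
    "f-string": lambda: [f"file_{i}" for i in df.index],
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats with 10 calls each, converted to milliseconds per call.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    timings[label] = best * 1000
    print(f"{label}: {timings[label]:.2f} ms per call")
```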