Question

I have a DataFrame with 3 columns (including the index):

   name   age
 0 satya   24
 1 abc     26
 2 xyz     29
 3 def     32

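I need to add a new column, detail, to store a detail file name. Each value should be "file_" followed by the row's index number, i.e. 'file_' + str(index):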

   name   age  detail
 0 satya   24  file_0
 1 abc     26  file_1
 2 xyz     29  file_2 
 3 def     32  file_3

To do this, I tried the following:

df['detail'] = str('file_' + df.index)   # not working, shows an error
df['detail'] = str('file' + '_' + str(df.index))  # runs, but not what I want
df['detail'] = str(s + '_' + df.index[0].astype(str))  # error
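
To see why the second attempt runs but gives the wrong result, here is a minimal sketch (assuming the default RangeIndex that pd.DataFrame creates; the sample values are copied from the table above). str() is applied to the Index object as a whole, so it produces one repr string, and pandas broadcasts that single string to every row.

```python
import pandas as pd

# Sample frame from the question; pandas gives it a default RangeIndex
df = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"], "age": [24, 26, 29, 32]})

# str() converts the whole Index object, not each label, so this builds
# ONE string such as "file_RangeIndex(start=0, stop=4, step=1)"
df["detail"] = str("file" + "_" + str(df.index))

# The same scalar string is broadcast to every row
print(df["detail"].unique())
```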

I also tried a loop with iterrows:

for index, row in df.iterrows():
    df['detail'] = str('file' + '_' + row[index])   # IndexError: index out of bounds

for index, row in df.iterrows():
    df['idx'] = str(s + '_' + df.index[row].astype(str))  # IndexError: arrays used as indices must be of integer (or boolean) type
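
Both loops go wrong in the same two ways: each pass assigns to the entire column, and the row label is used as a key into the row's own values (name, age) instead of being used as the label itself. If a loop is really wanted, a minimal sketch that writes one cell per row with DataFrame.at could look like this (sample data assumed from the table above). A row-by-row loop like this is typically much slower than the vectorized approaches, so it is shown only to clarify what went wrong.

```python
import pandas as pd

df = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"], "age": [24, 26, 29, 32]})

# Create the column once, then fill it one cell at a time
df["detail"] = ""
for index, _row in df.iterrows():
    # `index` is the row label itself (0, 1, 2, ...); write only this row's cell
    df.at[index, "detail"] = "file_" + str(index)

print(df)
```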

Please suggest a solution.

Answer 1

You can use astype on the index:

df['detail'] = 'file_' + df.index.astype(str)
print(df)
    name  age  detail
0  satya   24  file_0
1    abc   26  file_1
2    xyz   29  file_2
3    def   32  file_3
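
A self-contained version of the astype approach, rebuilding the sample frame from the question (the names and ages are copied from it), might look like this. astype(str) turns each index label into a string, and adding the 'file_' prefix is then done element-wise.

```python
import pandas as pd

df = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"], "age": [24, 26, 29, 32]})

# Convert every index label to a string, then prepend the prefix element-wise
df["detail"] = "file_" + df.index.astype(str)
print(df)
```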

Another solution is to use map:

df['detail'] = 'file_' + df.index.map(str)

# Python 3.6+ solution (f-strings)
df['detail'] = [f"file_{i}" for i in df.index]
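
As a quick sanity check, a sketch like the following (same assumed sample data) confirms that the astype, map, and f-string variants build identical labels. The first two return a pandas Index and the last a plain list, so they are compared as lists.

```python
import pandas as pd

df = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"], "age": [24, 26, 29, 32]})

# Three ways to build the same labels
via_astype = "file_" + df.index.astype(str)    # pandas Index of strings
via_map = "file_" + df.index.map(str)          # pandas Index of strings
via_fstring = [f"file_{i}" for i in df.index]  # plain Python list

# Compare as plain lists so the container type does not matter
same = list(via_astype) == list(via_map) == via_fstring
print(same)
```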

Comparison:

#[40000 rows x 2 columns]
df = pd.concat([df] * 10000, ignore_index=True)

In [153]: %timeit df['detail']= 'file_' + df.index.astype(str)
31.2 ms ± 423 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [154]: %timeit df['detail1'] = 'file_' + df.index.map(str)
16.9 ms ± 411 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [155]: %timeit df['detail'] = [f"file_{i}" for i in df.index]
2.95 ms ± 180 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
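
The timings above come from IPython's %timeit magic. A plain-Python sketch of the same comparison using the standard timeit module could look like this. It times only building the labels, not the column assignment, and the absolute numbers (and possibly the ranking) may differ across pandas and NumPy versions.

```python
import timeit

import pandas as pd

# Same 4-row sample frame, replicated to 40000 rows as in the comparison above
base = pd.DataFrame({"name": ["satya", "abc", "xyz", "def"], "age": [24, 26, 29, 32]})
df = pd.concat([base] * 10000, ignore_index=True)

candidates = {
    "astype": lambda: "file_" + df.index.astype(str),
    "map": lambda: "file_" + df.index.map(str),
    "list comprehension": lambda: [f"file_{i}" for i in df.index],
}

results = {}
for label, func in candidates.items():
    # Best of 3 repeats, 10 calls each; store seconds per single call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    results[label] = best
    print(f"{label:>18}: {best * 1e3:.2f} ms per call")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes; %timeit reports mean and standard deviation instead.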

Concatenate index and string into a new column