Adding a blank column to a DataFrame that continues a name/number convention

Asked: 2019-01-17 21:42:14

Tags: python-3.x pandas

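I have a DataFrame whose column names start with "node_" and end with a number. For example, suppose the node columns run up to "node_15" and are followed by more columns.

In that case, how can I add one more column, "node_16", and place it right after "node_15"?

For example, suppose the column headers look like this: (image not available)

The end result I want is: (image not available)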

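2 Answers:

Answer 0: (score: 1)

Not the prettiest, but you can use split to extract the numeric suffixes, take the max, find the position of that column, and insert the new column right after it.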

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['node_1', 'node_2', 'node_3','node_4','node_5','node_6','node_7','node_8','node_9','node_10','node_11','B'])

# Largest numeric suffix among the "node_" columns
num = max(map(int, df.filter(like = 'node_').columns.str.split('_').str[1]))
# Position immediately after the highest-numbered node column
loc = df.columns.get_loc('node' + '_' + str(num)) + 1
column = 'node'+ '_'+str(num + 1)
df.insert(loc, column, np.nan)

print(df.columns)

Index(['node_1', 'node_2', 'node_3', 'node_4', 'node_5', 'node_6', 'node_7', 'node_8', 'node_9', 'node_10', 'node_11', 'node_12', 'B'],
  dtype='object')
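
The same steps can be wrapped in a small helper. This is only a sketch: the function name `append_next_numbered_column` and its parameters are made up for illustration, and it assumes column labels are unique. It uses a regular expression instead of `split`, so a column such as `node_type` (hypothetical) is skipped rather than causing `int()` to fail.

```python
import re

import numpy as np
import pandas as pd


def append_next_numbered_column(df, prefix="node_", fill_value=np.nan):
    """Insert `<prefix><max+1>` right after `<prefix><max>`, modifying df in place."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    # Keep only columns of the exact form "<prefix><digits>"
    matches = map(pattern.fullmatch, df.columns.astype(str))
    numbers = [int(m.group(1)) for m in matches if m]
    if not numbers:
        raise ValueError(f"no columns named like {prefix}<number>")
    last = max(numbers)
    # get_loc returns an integer position when column labels are unique
    position = df.columns.get_loc(f"{prefix}{last}") + 1
    new_name = f"{prefix}{last + 1}"
    df.insert(position, new_name, fill_value)
    return new_name


# Hypothetical frame with columns before, between, and after the node columns
df = pd.DataFrame(columns=["A", "node_1", "node_2", "node_10", "node_type", "B"])
new_name = append_next_numbered_column(df)
print(new_name)
print(list(df.columns))
```

Converting the suffixes to integers before taking the maximum matters: compared as strings, "9" would sort after "10".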

Answer 1: (score: 0)

Implemented on a sample frame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4,12), columns=['node_1', 'node_2', 'node_3', 'node_4','node_5','node_6','node_7','node_8','node_9','node_10','node_11','B'])

+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
|   |  node_1  |  node_2  |  node_3  |  node_4  |  node_5  |  node_6  |  node_7  |  node_8  |  node_9  | node_10  | node_11  |    B     |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| 0 | 0.626458 | 0.798481 | 0.316018 | 0.159890 | 0.507015 | 0.027955 | 0.020401 | 0.743001 | 0.914910 | 0.238461 | 0.541814 | 0.134738 |
| 1 | 0.927695 | 0.115338 | 0.378937 | 0.090682 | 0.644118 | 0.715846 | 0.049830 | 0.713174 | 0.403888 | 0.825648 | 0.376064 | 0.594877 |
| 2 | 0.592890 | 0.634705 | 0.711854 | 0.772723 | 0.451578 | 0.831289 | 0.009033 | 0.100541 | 0.114469 | 0.873390 | 0.807368 | 0.550358 |
| 3 | 0.467856 | 0.915798 | 0.889654 | 0.529412 | 0.525272 | 0.546177 | 0.724698 | 0.539031 | 0.587709 | 0.402088 | 0.464548 | 0.533932 |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+


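# Numeric suffix of every "node_" column, as integers
cols = np.array(list(zip(*df.filter(like='node_').columns.str.split('_')))[1], dtype=int)
cols.sort()
# Largest suffix; it doubles as the insert position only because
# node_1 is the first column and the numbering has no gaps
idx = cols[-1]
df.insert(loc=int(idx), column='node_'+str(idx+1), value='')
df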


+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+---------+----------+
|   |  node_1  |  node_2  |  node_3  |  node_4  |  node_5  |  node_6  |  node_7  |  node_8  |  node_9  | node_10  | node_11  | node_12 |    B     |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+---------+----------+
| 0 | 0.626458 | 0.798481 | 0.316018 | 0.159890 | 0.507015 | 0.027955 | 0.020401 | 0.743001 | 0.914910 | 0.238461 | 0.541814 |         | 0.134738 |
| 1 | 0.927695 | 0.115338 | 0.378937 | 0.090682 | 0.644118 | 0.715846 | 0.049830 | 0.713174 | 0.403888 | 0.825648 | 0.376064 |         | 0.594877 |
| 2 | 0.592890 | 0.634705 | 0.711854 | 0.772723 | 0.451578 | 0.831289 | 0.009033 | 0.100541 | 0.114469 | 0.873390 | 0.807368 |         | 0.550358 |
| 3 | 0.467856 | 0.915798 | 0.889654 | 0.529412 | 0.525272 | 0.546177 | 0.724698 | 0.539031 | 0.587709 | 0.402088 | 0.464548 |         | 0.533932 |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+---------+----------+
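
Note that using the largest suffix as the insert position works here only because `node_1` is the first column and the numbering has no gaps. A minimal sketch of a position-independent variant follows; the leading `id` column is a hypothetical addition used to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an extra "id" column precedes the node columns
df = pd.DataFrame(np.random.rand(2, 4), columns=["id", "node_1", "node_2", "B"])

# Numeric suffixes of the node columns, converted to integers
suffixes = df.filter(like="node_").columns.str.split("_").str[1].astype(int)
idx = int(suffixes.max())

# Using idx itself as the position would place the new column between
# node_1 and node_2 here, so look up where the last node column really is.
loc = df.columns.get_loc(f"node_{idx}") + 1
df.insert(loc=loc, column=f"node_{idx + 1}", value="")
print(list(df.columns))
```

Filling with `''` as in the answer above gives the new column object dtype; `np.nan` may be a better fill value if numeric data will go there later.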