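I have a DataFrame whose column names start with "node_" and end with a number. For example, suppose the columns run up to "node_15" and are followed by several other columns.
In that case, how can I add one more column, "node_16", to the DataFrame and place it right after "node_15"?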
Answer 0 (score: 1)
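Not the prettiest, but you can use split to find the largest number, look up the position of that column, and insert the new column right after it.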
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['node_1', 'node_2', 'node_3','node_4','node_5','node_6','node_7','node_8','node_9','node_10','node_11','B'])
# Largest numeric suffix among the node_ columns
num = max(map(int, df.filter(like='node_').columns.str.split('_').str[1]))
# Position immediately after the highest-numbered node_ column
loc = df.columns.get_loc('node_' + str(num)) + 1
column = 'node_' + str(num + 1)
df.insert(loc, column, np.nan)
print(df.columns)
Index(['node_1', 'node_2', 'node_3', 'node_4', 'node_5', 'node_6', 'node_7', 'node_8', 'node_9', 'node_10', 'node_11', 'node_12', 'B'],
dtype='object')
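The same steps can be wrapped in a small helper. This is only a sketch: the function name `append_next_node` and its parameters are made up for illustration, and it assumes column names are unique. It matches names of the exact form prefix plus digits, so a column such as `node_type` is ignored instead of breaking the int conversion.

```python
import re

import numpy as np
import pandas as pd


def append_next_node(df, prefix="node_", fill=np.nan):
    """Insert prefix + (highest suffix + 1) right after the highest-numbered column (in place)."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    numbered = []
    for col in df.columns:
        m = pattern.fullmatch(str(col))
        if m:
            numbered.append((int(m.group(1)), col))
    if not numbered:
        raise ValueError(f"no columns of the form {prefix}<number>")
    # Pick the column with the largest numeric suffix
    highest, name = max(numbered, key=lambda pair: pair[0])
    loc = df.columns.get_loc(name) + 1  # assumes unique column names
    df.insert(loc, f"{prefix}{highest + 1}", fill)
    return df


# Hypothetical example frame
df = pd.DataFrame(columns=["node_1", "node_2", "node_10", "node_type", "B"])
append_next_node(df)
print(list(df.columns))
```

Matching on the numeric suffix rather than sorting the names avoids the string-ordering trap where "node_9" sorts after "node_10".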
Answer 1 (score: 0)
Implemented on a sample frame:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4,12), columns=['node_1', 'node_2', 'node_3', 'node_4','node_5','node_6','node_7','node_8','node_9','node_10','node_11','B'])
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| | node_1 | node_2 | node_3 | node_4 | node_5 | node_6 | node_7 | node_8 | node_9 | node_10 | node_11 | B |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
| 0 | 0.626458 | 0.798481 | 0.316018 | 0.159890 | 0.507015 | 0.027955 | 0.020401 | 0.743001 | 0.914910 | 0.238461 | 0.541814 | 0.134738 |
| 1 | 0.927695 | 0.115338 | 0.378937 | 0.090682 | 0.644118 | 0.715846 | 0.049830 | 0.713174 | 0.403888 | 0.825648 | 0.376064 | 0.594877 |
| 2 | 0.592890 | 0.634705 | 0.711854 | 0.772723 | 0.451578 | 0.831289 | 0.009033 | 0.100541 | 0.114469 | 0.873390 | 0.807368 | 0.550358 |
| 3 | 0.467856 | 0.915798 | 0.889654 | 0.529412 | 0.525272 | 0.546177 | 0.724698 | 0.539031 | 0.587709 | 0.402088 | 0.464548 | 0.533932 |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
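# Numeric suffixes of the node_ columns, as integers
cols = np.array(list(zip(*df.filter(like='node_').columns.str.split('_')))[1], dtype=int)
cols.sort()
idx = cols[-1] # highest node number; doubles as the insert position because the node_ columns start at position 0
df.insert(loc=int(idx), column='node_'+str(idx+1), value='')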
df
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+---------+----------+
| | node_1 | node_2 | node_3 | node_4 | node_5 | node_6 | node_7 | node_8 | node_9 | node_10 | node_11 | node_12 | B |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+---------+----------+
| 0 | 0.626458 | 0.798481 | 0.316018 | 0.159890 | 0.507015 | 0.027955 | 0.020401 | 0.743001 | 0.914910 | 0.238461 | 0.541814 | | 0.134738 |
| 1 | 0.927695 | 0.115338 | 0.378937 | 0.090682 | 0.644118 | 0.715846 | 0.049830 | 0.713174 | 0.403888 | 0.825648 | 0.376064 | | 0.594877 |
| 2 | 0.592890 | 0.634705 | 0.711854 | 0.772723 | 0.451578 | 0.831289 | 0.009033 | 0.100541 | 0.114469 | 0.873390 | 0.807368 | | 0.550358 |
| 3 | 0.467856 | 0.915798 | 0.889654 | 0.529412 | 0.525272 | 0.546177 | 0.724698 | 0.539031 | 0.587709 | 0.402088 | 0.464548 | | 0.533932 |
+---+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+----------+---------+----------+
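Note that `loc=int(idx)` lands in the right place here only because the node_ columns occupy positions 0 through 10, so the highest number happens to equal the insert position. A minimal sketch of a variant that looks the position up instead, using a hypothetical frame with a leading `id` column (not part of the original example):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an 'id' column precedes the node_ columns
df = pd.DataFrame(np.random.rand(2, 4), columns=["id", "node_1", "node_2", "B"])

nums = df.filter(like="node_").columns.str.split("_").str[1].astype(int)
highest = int(nums.max())

# Using the number itself as the position would insert at index 2,
# between node_1 and node_2. Looking up the column position avoids that.
loc = df.columns.get_loc(f"node_{highest}") + 1
df.insert(loc, f"node_{highest + 1}", "")
print(list(df.columns))
```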