Question

I'm looking for an efficient way to get the index of a value in a pandas.DataFrame or .Series object (if the value exists).

The use case is finding each node's parent, given a table that lists each node's child.

For example, I have a DataFrame node_df like this:

         child_node_id
node_id               
0                    1
1                    5
2                    3
3                   -1
4                    7
5                   -1
6                   -1
7                   -1
8                   -1

Node 0 has node 1 as its child.
Node 1 has node 5 as its child.
Node 2 has node 3 as its child.
Node 4 has node 7 as its child.

All other nodes have no child (child_node_id == -1).
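To reproduce the example, the frame can be built as shown below. This is a minimal sketch. It assumes the node ids are the default 0..8 row labels, named node_id, and that -1 is the only "no child" marker.

```python
import pandas as pd

# Row label = node_id; child_node_id = that node's child, or -1 for "no child".
node_df = pd.DataFrame(
    {"child_node_id": [1, 5, 3, -1, 7, -1, -1, -1, -1]},
    index=pd.RangeIndex(9, name="node_id"),
)
print(node_df)
```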

I want a parent_node_id column like this:

         child_node_id  parent_node_id
node_id               
0                    1              -1
1                    5               0
2                    3              -1
3                   -1               2
4                    7              -1
5                   -1               1
6                   -1              -1
7                   -1               4
8                   -1              -1

Currently I use a loop:

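node_df['parent_node_id'] = -1
for ix, elem in node_df['child_node_id'].items():  # .iteritems() was removed in pandas 2.0
    if elem >= 0:  # skip -1 ("no child"); .loc[-1, ...] would add a spurious row labelled -1
        node_df.loc[elem, 'parent_node_id'] = ix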

Is there a more idiomatic pandas way to do this without a loop, something like Python's list.index()?
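For a single value, boolean masking on the index is a close analogue of list.index(). The helper index_of below is hypothetical and not a pandas API. Unlike list.index(), it returns a default (-1 here, matching the -1 convention above) instead of raising ValueError when the value is absent.

```python
import pandas as pd

child = pd.Series(
    [1, 5, 3, -1, 7, -1, -1, -1, -1],
    index=pd.RangeIndex(9, name="node_id"),
    name="child_node_id",
)

def index_of(series, value, default=-1):
    """Hypothetical helper: first index label whose value equals `value`.

    Mirrors list.index(), except it returns `default` instead of
    raising ValueError when the value is not present.
    """
    matches = series.index[(series == value).to_numpy()]
    return matches[0] if len(matches) else default

print(index_of(child, 7))  # 4  (node 4 is the parent of node 7)
print(index_of(child, 2))  # -1 (no node has node 2 as its child)
```

Calling this once per node would still be a loop, so it suits one-off lookups rather than filling the whole column.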

Answer 1

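Figured it out. Use the child node ids (the values) as the row labels for the parent column, and the rows they came from (the index) as the parent ids.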

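node_df['parent_node_id'] = -1
# Only rows that actually have a child (-1 means "no child")
children = node_df.loc[node_df.child_node_id >= 0, 'child_node_id']
# child: (index, value) -> parent: (value, index)
# each child id becomes a row label, and the row it came from becomes its parent id
node_df.loc[children, 'parent_node_id'] = children.index
node_df['parent_node_id']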
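Below is a self-contained version of the answer's approach. It assumes each node has at most one parent, so every child id appears only once. The second half is an alternative of my own, not taken from the answer. It builds the inverted child-to-parent Series directly and reindexes it on all node ids, filling -1 where a node has no parent.

```python
import pandas as pd

node_df = pd.DataFrame(
    {"child_node_id": [1, 5, 3, -1, 7, -1, -1, -1, -1]},
    index=pd.RangeIndex(9, name="node_id"),
)

# Answer's approach: child ids become row labels, their parents become the values.
node_df["parent_node_id"] = -1
children = node_df.loc[node_df["child_node_id"] >= 0, "child_node_id"]
node_df.loc[children.to_numpy(), "parent_node_id"] = children.index.to_numpy()

# Alternative: invert the mapping (child id -> parent id), then reindex on every node id.
parent_of = pd.Series(children.index.to_numpy(), index=children.to_numpy())
parent_alt = parent_of.reindex(node_df.index, fill_value=-1)

print(node_df["parent_node_id"].tolist())  # [-1, 0, -1, 2, -1, 1, -1, 4, -1]
print(parent_alt.tolist())                 # [-1, 0, -1, 2, -1, 1, -1, 4, -1]
```

If a child id appeared twice (two parents, so not a tree), reindex would raise a ValueError about duplicate labels. The .loc version would silently keep the last assignment instead.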
