
Question

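I am trying to implement a step in a script where, for each row, I look up the "type" of values stored in the same DataFrame and update a count of how many values of each type appear in that row. To illustrate, here is a toy example: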

import pandas as pd

d = {0: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 1: [1, 1, 2, 2, 1, 1, 2, 1, 1, 2],
 2: [1, 1, 2, 2, 1, 1, 1, 1, 2, 2],
 3: [2, 1, 8, 3, 6, 5, 10, 3, 4, 7],
 4: [0, 0, 4, 9, 0, 0, 0, 0, 10, 9],
 5: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 6: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}

df = pd.DataFrame(d)
df.index += 1

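In df, df[0] holds each object's unique ID and df[1] holds its "type" (think of it as something like the object's color). df[3] and df[4] hold the neighboring objects of interest (0 is a placeholder, and any nonzero value is a neighbor's ID, so each object here has 1 or 2 neighbors). df[5] and df[6] store the number of neighbors of each type. There are only two types here, both integers, so the count of type-1 neighbors goes in df[5] and the count of type-2 neighbors goes in df[6].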

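I have working code that iterates over the rows and the neighbor columns, looks up each neighbor's type, and increments the appropriate column. However, it does not scale well: my real dataset has many more rows and object types, and this operation is called repeatedly inside a Monte Carlo-type simulation. I am not sure what I can do to speed it up. I tried a dictionary lookup of ID:Type, but that was actually slower. Here is the working code: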

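def countNeighbors(contactMap):  # in case of subgraph, still need to know the neighbors' type
    for index, row in contactMap.iterrows():
        # NOTE: range(3, 4) visits only column 3, which is what the expected
        # output below reflects; range(3, 5) would also visit column 4.
        for col in range(3, 4):
            cellID = row[col]
            if cellID == 0:  # 0 is the "no neighbor" placeholder
                continue
            # Look up the neighbor's type by its ID (IDs in column 0 are unique).
            cellType = int(contactMap.loc[contactMap[0] == cellID, 1].iloc[0])
            contactMap.at[index, 4 + cellType] += 1
    return contactMap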

df = countNeighbors(df)

Expected output:

output = {0: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1: [1, 1, 2, 2, 1, 1, 2, 1, 1, 2], 2: [1, 1, 2, 2, 1, 1, 1, 1, 2, 2], 3: [2, 1, 8, 3, 6, 5, 10, 3, 4, 7], 4: [0, 0, 4, 9, 0, 0, 0, 0, 10, 9], 5: [1, 1, 1, 0, 1, 1, 0, 0, 0, 0], 6: [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]}

out_df = pd.DataFrame(output)
out_df.index += 1

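To be clear, this output says that object 1 (row 1) has type 1 and its neighbor is object 2. We look up object 2 in df, see that its type is 1, and increment column 5. Is there a faster way to achieve the same result? I am open to redesigning the data structure if needed, but this format is convenient.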
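
The single lookup described above can be traced by hand for row 1. This is only an illustrative sketch on the toy data; the names `neighbor_id`, `neighbor_type`, and `target_col` are introduced here and are not part of the original script.

```python
import pandas as pd

# Same toy data as in the question, rebuilt so this snippet runs on its own.
d = {0: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     1: [1, 1, 2, 2, 1, 1, 2, 1, 1, 2],
     2: [1, 1, 2, 2, 1, 1, 1, 1, 2, 2],
     3: [2, 1, 8, 3, 6, 5, 10, 3, 4, 7],
     4: [0, 0, 4, 9, 0, 0, 0, 0, 10, 9],
     5: [0] * 10,
     6: [0] * 10}
df = pd.DataFrame(d)
df.index += 1

row = df.loc[1]                 # object 1
neighbor_id = row[3]            # first neighbor column: object 2

# Find the neighbor's type by matching its ID against column 0.
neighbor_type = int(df.loc[df[0] == neighbor_id, 1].iloc[0])

# Type t is counted in column 4 + t (type 1 -> column 5, type 2 -> column 6).
target_col = 4 + neighbor_type
print(neighbor_id, neighbor_type, target_col)
```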

Answer 1

Option 1:

type_dict = df.set_index(0)[1].to_dict()

for i in [3,4]:
    s = df[i].map(type_dict)
    df.loc[:,[5,6]] += pd.get_dummies(s)[[1,2]].values
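
For reference, here is a self-contained sketch of the same idea that runs on the toy data from the question. It differs from the snippet above in one respect, which is an addition of mine: the dummy columns are passed through `reindex` so that a type that happens not to occur in a neighbor column yields a column of zeros instead of a `KeyError`. The names `neighbor_cols`, `types`, and `count_cols` are illustrative, and the list of types is assumed to be known in advance.

```python
import pandas as pd

d = {0: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     1: [1, 1, 2, 2, 1, 1, 2, 1, 1, 2],
     2: [1, 1, 2, 2, 1, 1, 1, 1, 2, 2],
     3: [2, 1, 8, 3, 6, 5, 10, 3, 4, 7],
     4: [0, 0, 4, 9, 0, 0, 0, 0, 10, 9],
     5: [0] * 10,
     6: [0] * 10}
df = pd.DataFrame(d)
df.index += 1

type_dict = df.set_index(0)[1].to_dict()   # object ID -> type
neighbor_cols = [3, 4]                      # columns holding neighbor IDs
types = [1, 2]                              # assumed known in advance
count_cols = [4 + t for t in types]         # type t is counted in column 4 + t

for col in neighbor_cols:
    # The placeholder 0 is not a key of type_dict, so it maps to NaN
    # and produces an all-zero row of dummies.
    s = df[col].map(type_dict)
    dummies = pd.get_dummies(s).reindex(columns=types, fill_value=0)
    df.loc[:, count_cols] += dummies.to_numpy(dtype="int64")

print(df[count_cols])
```

Because the loop uses `+=`, it assumes the count columns start at zero; running it twice on the same DataFrame doubles the counts.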

Option 2:

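# Reuses type_dict from Option 1. DataFrame.sum(level=0) was removed in
# pandas 2.0, so the rows are grouped on the original index instead, and
# .to_numpy() assigns the counts by position rather than by column label.
df.loc[:, [5, 6]] = (pd.get_dummies(df[[3, 4]].stack().map(type_dict))
                       .groupby(level=0).sum()
                       .to_numpy())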

Output (both neighbor columns, 3 and 4, are counted):

    0   1   2   3   4   5   6
1   1   1   1   2   0   1   0
2   2   1   1   1   0   1   0
3   3   2   2   8   4   1   1
4   4   2   2   3   9   1   1
5   5   1   1   6   0   1   0
6   6   1   1   5   0   1   0
7   7   2   1   10  0   0   1
8   8   1   1   3   0   0   1
9   9   1   2   4   10  0   2
10  10  2   2   7   9   1   1
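
If pandas overhead still matters inside the simulation loop, one alternative that is not part of the options above is to do the lookup in plain NumPy with a lookup table indexed by object ID. The sketch below has not been benchmarked here, and it rests on assumptions I am adding: IDs are small non-negative integers, 0 is never a real ID, and 0 is never a real type. The names `lut` and `neighbor_types` are illustrative.

```python
import numpy as np
import pandas as pd

d = {0: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     1: [1, 1, 2, 2, 1, 1, 2, 1, 1, 2],
     2: [1, 1, 2, 2, 1, 1, 1, 1, 2, 2],
     3: [2, 1, 8, 3, 6, 5, 10, 3, 4, 7],
     4: [0, 0, 4, 9, 0, 0, 0, 0, 10, 9],
     5: [0] * 10,
     6: [0] * 10}
df = pd.DataFrame(d)
df.index += 1

ids = df[0].to_numpy()
kinds = df[1].to_numpy()
neighbors = df[[3, 4]].to_numpy()        # shape (n_rows, 2); 0 means "no neighbor"

# Lookup table indexed by object ID. Slot 0 stays 0, i.e. "no type",
# so placeholder neighbors never match a real type below.
lut = np.zeros(ids.max() + 1, dtype=np.int64)
lut[ids] = kinds

neighbor_types = lut[neighbors]          # same shape as neighbors

# Type t is counted in column 4 + t; this overwrites the count columns.
for t in (1, 2):
    df[4 + t] = (neighbor_types == t).sum(axis=1)

print(df[[5, 6]])
```

Like Option 2, this overwrites columns 5 and 6 rather than incrementing them, so it can be called repeatedly without resetting the counts first.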

A faster way to count values by "type" and update a DataFrame with those counts?
