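Create dummy variables for a column's values and fill them with row counts for the unique combinations of the remaining columns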

Time: 2019-07-18 07:49:49

Tags: python pandas data-analysis

Given four columns, we create dummy columns from the values of one of them, subject to the following condition:

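The value of each newly created (dummy) variable is the number of rows that share a unique combination of the remaining columns' values and that dummy variable's value.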

The attached tables (source and destination) should make the problem easier to understand.

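I tried the code below to generate the attached destination table from the attached source table (the actual example has been modified into a test case). It works, but because the real data has millions of records, the code runs seemingly forever. Is there a faster way to achieve this?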

# df_d[(df_d['Type1'] == "X11") & (df_d['Type2'] == "X1") & (df_d['Type3'] == "Y1") & (df_d["Action"] == action)].shape[0]

The statement above takes a long time to run. Any suggestions for a faster approach would be helpful.

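def find_number(k, action):
    # Count the rows that match one fixed Type1/Type2/Type3 combination and the given action
    return df_d[(df_d['Type1'] == "X11") & (df_d['Type2'] == "X1") & (df_d['Type3'] == "Y1") & (df_d["Action"] == action)].shape[0]

# Distinct values of the Action column
vals = df_d["Action"].value_counts().keys()

for i in vals:
    ### code to call with each action value
    pass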
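
The per-action filtering above rescans the whole frame for every combination. A minimal sketch of a single-pass alternative, assuming the goal is one count per (Type1, Type2, Type3, Action) combination: group once, take the group sizes, and pivot the Action level into columns. The small `df_d` here is a hypothetical stand-in for the real data, and `key_cols` and `counts` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for df_d; the real frame has millions of rows
df_d = pd.DataFrame({
    "Type1": ["X11", "X11", "X12", "X12", "X12"],
    "Type2": ["X1", "X1", "X1", "X1", "X2"],
    "Type3": ["Y1", "Y1", "Y2", "Y2", "Y1"],
    "Action": ["A", "B", "B", "B", "A"],
})

key_cols = ["Type1", "Type2", "Type3"]

# One pass over the data: count rows per (Type1, Type2, Type3, Action),
# then turn each Action value into its own column (0 where a pair never occurs)
counts = (
    df_d.groupby(key_cols + ["Action"])
    .size()
    .unstack("Action", fill_value=0)
)

# Each lookup is now an index access instead of a full boolean scan
print(counts.loc[("X11", "X1", "Y1"), "A"])
```

Because `counts` is indexed by the three key columns, every count that `find_number` would have computed is available from the one grouped result.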


Source table

Code to create the source table:

import pandas as pd

df1 = pd.DataFrame({
    "Type1": ["x11", "x11", "x11", "x12", "x12", "x12", "x12", "x12", "x12"],
    "Type2": ["x1", "x2", "x2", "x2", "x1", "x1", "x2", "x1", "x1"],
    "Type3": ["y1", "y2", "y3", "y1", "y2", "y2", "y2", "y2", "y1"],
    "action": ["A", "A", "A", "B", "B", "B", "A", "A", "A"],
})

Destination table
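
The destination table itself is not reproduced here, so the following is only a sketch of how such a table might be derived from the source data, assuming it holds one row per unique (Type1, Type2, Type3) combination and one count column per `action` value. It uses `pd.get_dummies` to build the indicator columns and a grouped sum to turn them into counts; `key_cols`, `dummies`, and `result` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Type1": ["x11", "x11", "x11", "x12", "x12", "x12", "x12", "x12", "x12"],
    "Type2": ["x1", "x2", "x2", "x2", "x1", "x1", "x2", "x1", "x1"],
    "Type3": ["y1", "y2", "y3", "y1", "y2", "y2", "y2", "y2", "y1"],
    "action": ["A", "A", "A", "B", "B", "B", "A", "A", "A"],
})

key_cols = ["Type1", "Type2", "Type3"]

# One 0/1 indicator (dummy) column per distinct action value
dummies = pd.get_dummies(df1["action"], dtype=int)

# Summing the indicators within each key combination yields the row counts
result = (
    pd.concat([df1[key_cols], dummies], axis=1)
    .groupby(key_cols, as_index=False)
    .sum()
)
print(result)
```

With this data, the combination (x12, x1, y2) ends up with A = 1 and B = 2, since it appears once with action A and twice with action B.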

0 answers:

No answers