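I need to read data from a Postgres server and put it into an array/DataFrame. Each row has a source field and a destination field, and I need to add them cumulatively to an array. As I iterate over the DataFrame, if a row's source and destination values are not already in the accounts column, they need to be added to it.
This is what my code currently looks like (the Postgres part is left out for brevity):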
# Load the data
data = pd.read_sql(sql_command, conn)
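# Take a subset of the data until the algorithm is perfected.
np.random.seed(42)  # seeds the global RNG; it returns None, so the result is not assigned
n = data.shape[0]
ix = np.random.choice(n, 10000)
df_tmp = data.iloc[ix].copy()  # copy so that new columns are not written to a slice of data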
# Taking the source and destination and combining it into a list in another column
df_tmp['accounts'] = df_tmp.apply(lambda x: [x['source'], x['destination']], axis=1)
# Attempt at cumulatively adding accounts to the column
for index, row in df_tmp.iterrows():
    if 'accounts' not in df_tmp:
        df_tmp['accounts'] = df_tmp.apply(lambda x: [x['accounts'], x['source'], x['destination']], axis=1)
    else:
        df_tmp['accounts'] = df_tmp['accounts']
Question:
Answer 0 (score: 1)
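You can use cumsum on the accounts column to create a cumulative concatenation of the account values; this works because adding two lists concatenates them. Then convert each cumulative list to a set to keep only the unique values.
There is a similar question here: Cumulative Set in PANDAS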
df_tmp['accounts_acc'] = df_tmp['accounts'].cumsum().apply(set)
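
For comparison, the same cumulative set can be built without concatenating ever-longer lists. The following is only a minimal sketch using the standard library's itertools.accumulate with set union; the function name cumulative_accounts and the toy values are made up for illustration and are not part of the original question or answer.

```python
from itertools import accumulate


def cumulative_accounts(sources, destinations):
    """For each row, return the set of all accounts seen up to and including it.

    Hypothetical helper for illustration; it accepts any two equally long
    iterables, such as plain lists or DataFrame columns.
    """
    # One small set per row holding that row's source and destination.
    pairs = ({s, d} for s, d in zip(sources, destinations))
    # accumulate applies the union left to right. `seen | new` builds a new
    # set at every step, so earlier results are never mutated by later rows.
    return list(accumulate(pairs, lambda seen, new: seen | new))


# Toy data standing in for the 'source' and 'destination' columns.
sources = ["a", "b", "a", "d"]
destinations = ["b", "c", "c", "a"]

result = cumulative_accounts(sources, destinations)
for row in result:
    print(sorted(row))
```

Assuming df_tmp has the source and destination columns from the question, the result could be assigned with df_tmp['accounts_acc'] = cumulative_accounts(df_tmp['source'], df_tmp['destination']). Each row still stores its own set, so memory use grows with the number of distinct accounts per row either way.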