Question

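I have 2 dataframes, each with a "count" column of dtype int64 and a "product_id" index. I want to apply a hand-written formula to the "count" columns of the two dataframes wherever their indices match. I know I can do things like dataframe "subtraction", but I couldn't find how to apply a custom function between columns of two dataframes. By the way, the number of rows and the indices don't match exactly. I only need to apply the function where the indices are the same.

Here are samples of the two dataframes: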

df2_count[['count']].head()

            count
product_id
9014           41
8458           11
55522           9
6969            8
8840            7


df1_count[['count']].head()

            count
product_id
7545           12
8866           10
8867           10
47196           6
9014            5

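Here is what I tried. Since I couldn't find how to do what I need, I created a NaN-filled sample df whose rows and columns are the two dataframes' indices. Then I iterate over every row of every column and fill the NaN sample dataframe with the function's result. It looks messy, though, with lots of NaN values that I don't know how to handle or make readable.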

import pandas as pd

# NaN-filled grid: rows are df2_count's index, columns are df1_count's index
data_ibs = pd.DataFrame(index=df2_count.index, columns=df1_count.index)

def formula(a, b):
    # Percent change, measured against the smaller of the two values
    if a > b:
        return (a - b) / b * 100
    else:
        return (a - b) / a * 100

for i in range(len(df2_count.index)):
    for j in range(len(df1_count.index)):
        # Only fill cells where both frames share the same product_id
        if df2_count.index[i] == df1_count.index[j]:
            a = df2_count.at[df2_count.index[i], 'count']
            b = df1_count.at[df1_count.index[j], 'count']
            data_ibs.iloc[i, j] = formula(a, b)

data_ibs.to_csv('output.csv')

Could someone help me do this more easily and in a more "pandas-like" way? Thanks for your help.

Answer 1

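I ended up doing it in a more elegant (pandas) way. The idea is not to apply the function across different dataframes, but to merge them into one and then compute what you need with pandas' built-in apply function.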

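# Outer merge on the index keeps every product_id; fillna(1) replaces a count missing from one side with 1
dff = pd.merge(df2_count, df1_count, how='outer',
               right_index=True, left_index=True, suffixes=('_x', '_y')).fillna(1)
dff['mean'] = dff[['count_x', 'count_y']].mean(axis=1)
dff['sum'] = dff[['count_x', 'count_y']].sum(axis=1)
# change_percents is not defined in this post; presumably it is the question's formula(a, b)
dff['count_percents'] = dff.apply(lambda row: change_percents(row['count_x'], row['count_y']), axis=1)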
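For anyone who wants to run this end to end, here is a minimal sketch using the sample rows from the question. It makes two assumptions that are not in the post: `change_percents` is taken to be the question's `formula(a, b)`, and an inner join is swapped in for the outer merge so that only product_ids present in both frames are kept (no `fillna` needed). The `np.where` line is an optional vectorized variant of the same formula.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's head() output
df2_count = pd.DataFrame(
    {"count": [41, 11, 9, 8, 7]},
    index=pd.Index([9014, 8458, 55522, 6969, 8840], name="product_id"),
)
df1_count = pd.DataFrame(
    {"count": [12, 10, 10, 6, 5]},
    index=pd.Index([7545, 8866, 8867, 47196, 9014], name="product_id"),
)

def change_percents(a, b):
    """Assumed to be the same as the question's formula(a, b)."""
    if a > b:
        return (a - b) / b * 100
    return (a - b) / a * 100

# Inner join on the index: keeps only product_ids found in both frames
dff = df2_count.join(df1_count, how="inner", lsuffix="_x", rsuffix="_y")

# Row-wise apply, as in the answer
dff["count_percents"] = dff.apply(
    lambda row: change_percents(row["count_x"], row["count_y"]), axis=1
)

# Vectorized variant: np.where picks the denominator element-wise
a, b = dff["count_x"], dff["count_y"]
dff["count_percents_vec"] = np.where(a > b, (a - b) / b, (a - b) / a) * 100

print(dff)
```

With the sample data only product_id 9014 appears in both frames, so the result has a single row. Whether inner or outer is the right join depends on whether unmatched products should show up in the output at all.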

By the way, you can create a single dataframe from a list of dataframes. I've attached the code I used, which may also help.

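import logging
import os

import pandas as pd

# path: directory containing the CSV files (not defined in this post)
frames = []
for filename in os.listdir(path):
    if not filename.endswith('csv'):
        continue
    logging.debug(filename)
    # names=['wallets'] labels the column, so the files are treated as having no header row
    df = pd.read_csv(os.path.join(path, filename), index_col=None, names=['wallets'])
    frames.append(df)
    logging.debug(frames)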
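The loop above only collects the frames in a list; the step that turns the list into one dataframe is not shown in the post. A minimal sketch of that step, assuming `pd.concat` is what was intended, with two small hypothetical frames standing in for the CSV files:

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from the CSV files
frames = [
    pd.DataFrame({"wallets": ["a", "b"]}),
    pd.DataFrame({"wallets": ["c"]}),
]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's own index
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Concatenating once after the loop is generally cheaper than growing a dataframe inside it, since each concatenation copies the data.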

Hope it helps someone :)
