Question

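I collected some data (42 features) from people over several months (at most 6; the number varies from entry to entry), and each month's values are stored in their own row: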

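The df has 9267 unique ID values (set as the index) and up to 50,000 rows. I want to convert it into one 42 * 6 feature vector per ID (even though some people will have many NaN values), so that I can train on them. The result should look like this: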

Here is my solution:

import pandas as pd

def flatten_features(f_matrix, ID):
    '''Construct a 1x(6*n) vector from a 6xn matrix.'''
    # Check whether it is a Series, not a DataFrame
    if(len(f_matrix.shape) == 1): 
        f_matrix['ID'] = ID
        return f_matrix

    flattened_vector = f_matrix.iloc[0]

    for i in range(1, f_matrix.shape[0]):
        vector_append = f_matrix.iloc[i]
        vector_append.index = (lambda month, series_names : series_names.map(lambda name : name + '_' + str(month)))\
                                (i, vector_append.index)
        flattened_vector = flattened_vector.append(vector_append)

    flattened_vector['ID'] = ID
    return flattened_vector


#construct dataframe of flattened vectors for numerical features
new_indices = flatten_features(numerical_f.iloc[:6], 1).index
new_indices

flattened_num_f = pd.DataFrame(columns=new_indices)
flattened_num_f

for label in numerical_f.index.unique():

    matr = numerical_f.loc[label]
    flattened_num_f = flattened_num_f.append(flatten_features(matr, label))

It produces the desired result, but it runs very slowly. Is there a more elegant and faster solution?

Answer 1

If you want to transpose the df, you can use the CAM function. I assume you have the IDs stored in the unique_id variable.

var object1 = { color: 'red', length: 1, width: 6 },
    object2 = { color: 'blue', length: 4, width: 2 },
    object3 = { color: 'green', length: 4, width: 5 };

['color', 'length', 'width'].forEach(function (k) {
    [object1, object2, object3].forEach(function (o) {
        console.log(o[k]);
    });
});
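For the pandas frame in the question, one possible way to avoid the row-by-row `append` loop is to number each ID's rows and pivot that counter out to columns. The sketch below is only an illustration under stated assumptions: the helper name `flatten_by_id` and the toy frame are hypothetical, the index is assumed to hold the ID, and each ID's rows are assumed to already be in chronological order. It does not need a separate unique_id variable. Note that `DataFrame.append` and `Series.append` were removed in pandas 2.0, so the original loop will not run there at all.

```python
import pandas as pd


def flatten_by_id(df):
    """Return one row per index value, with columns named feature_month.

    Hypothetical helper: assumes the ID is the index of `df` and that
    each ID's rows are already sorted by month.
    """
    # Number each ID's rows 0, 1, 2, ... in their existing order.
    month = df.groupby(level=0).cumcount().rename('month')
    # Add the counter as a second index level, then pivot it to columns.
    # IDs with fewer rows than the maximum are padded with NaN.
    wide = df.set_index(month, append=True).unstack('month')
    # Flatten the (feature, month) column MultiIndex. Month 0 keeps the
    # bare feature name, mirroring the naming in the question's code.
    wide.columns = [name if m == 0 else f'{name}_{m}'
                    for name, m in wide.columns]
    return wide


# Toy data (made up for illustration): ID 7 has two months, ID 9 has one.
toy = pd.DataFrame(
    {'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]},
    index=pd.Index([7, 7, 9], name='ID'),
)
flat = flatten_by_id(toy)
print(flat)
```

The columns come out grouped by feature (a, a_1, b, b_1) rather than by month as in the question's loop; reorder them afterwards if the order matters. The main saving is that nothing is appended inside a Python loop, where every `append` call copies the whole accumulated frame.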
