Question

I have two DataFrames, DFa and DFb. DFa has 4 columns: Date, macro_A, macro_B, macro_C. DFb has 3 columns: Name, Region, Transformation.

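What I want to do is check whether each column name of DFa is contained in DFb.Name; if it is, I look up the corresponding transformation method in DFb.Transformation. Depending on that method, I then transform the DFa column accordingly.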

import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])

DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

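For example, I check whether the macro_A column of DFa exists in the DFb.Name column. I then look at the value of DFb.Transformation, which is STD, meaning that I need to transform (standardize) DFa.macro_A.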

On the other hand, when I check whether macro_C of DFa is in DFb.Name, the value of DFb.Transformation is non, so I leave DFa.macro_C as it is.

I have written this code:

for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb['Name']):
        if x == k:
            if DFb.ix[i, 'Transformation'] == 'STD':
                DFa.iloc[:, j] = preprocessing.scale(DFa.iloc[: j])

How can I make my code more efficient?

Answer 1

The corrected code follows:

from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()
for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb.Name):
        if x == k and DFb.iloc[i]['Transformation'] == 'STD':
            # Select the column as a 2-D slice ([j]); recent scikit-learn
            # releases reject 1-D input to fit_transform
            DFa.iloc[:, j] = min_max_scaler.fit_transform(DFa.iloc[:, [j]]).ravel()

print(DFa)

Output:

   Date  macro_A  macro_B  macro_C
1  2010      1.0      1.0     0.23
2  2011      0.7      0.7     0.20
3  2012      0.0      0.0     0.13
4  2013      0.6      0.6     0.19

macro_A and macro_B have been scaled, but macro_C has not.
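The inner loop over DFb.Name can also be swapped for a dictionary lookup, so that each DFa column is checked only once. This is a minimal sketch, assuming the names in DFb.Name are unique; `min_max` and `transform_for` are made-up names for illustration, and the min-max formula is written out in plain pandas instead of calling MinMaxScaler:

```python
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])
DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])


def min_max(col):
    # Rescale a Series to [0, 1] (hypothetical helper, same formula as
    # MinMaxScaler with its default feature range)
    return (col - col.min()) / (col.max() - col.min())


# Name -> Transformation lookup table (assumes unique names in DFb.Name)
transform_for = dict(zip(DFb['Name'], DFb['Transformation']))

for name in DFa.columns:
    # Columns missing from DFb (such as Date) return None and stay untouched
    if transform_for.get(name) == 'STD':
        DFa[name] = min_max(DFa[name])
```

Only macro_A and macro_B are rescaled here; Date and macro_C keep their original values.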

Answer 2

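I think you can use the column names to avoid enumerate and iloc. I would also suggest storing the operations in a string -> lambda mapping and applying them with the apply function. This helps when you have several operation strings.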

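from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()

# Operations map: each function receives a whole column (a Series)
operations = {'STD': lambda x: min_max_scaler.fit_transform(x.to_frame()).ravel(),
              'non': lambda x: x}

for colName in DFa.columns.values:
    # Get the transform string by matching the column name with the Name column
    transformStr = DFb.Transformation[DFb.Name == colName]

    if transformStr.shape[0] == 0:  # Column not listed in DFb (e.g. Date): leave it as is
        continue
    if transformStr.shape[0] > 1:  # Make sure that only one operation is selected
        raise Exception('Invalid transform string %s' % transformStr.tolist())

    # Call the function on the whole column; Series.apply would pass one value at a time
    DFa[colName] = operations[transformStr.iloc[0]](DFa[colName])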
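If STD is the only transformation that changes anything, the per-column loop could be dropped by selecting every STD column at once. This is a rough sketch, assuming min-max scaling is what STD should do; the formula is written in plain pandas rather than with MinMaxScaler, and `std_names` and `std_cols` are illustrative names:

```python
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])
DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

# Names flagged STD in DFb (may include names that DFa does not have)
std_names = DFb.loc[DFb['Transformation'] == 'STD', 'Name']

# Keep only the DFa columns that appear among those names
std_cols = DFa.columns[DFa.columns.isin(std_names)]

# Min-max scale all selected columns in one vectorized expression
sub = DFa[std_cols]
DFa[std_cols] = (sub - sub.min()) / (sub.max() - sub.min())
```

A constant column would give a zero denominator in this formula, so that case would need separate handling.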

Python: if a column contains a string, extract the value of another column
