Question

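I am currently writing a bunch of table transformation statements in Postgres, and I would like to write a function in Python to reduce the amount of repeated code. Suppose I have a table loaded into Pandas that looks like this: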

import pandas as pd
df = pd.DataFrame({'state' : ['NJ', 'NJ', 'NY', 'NY'],
                   'county' : ['AAA', 'BBB', 'CCC', 'DDD'],
                   'population' : [100, 200, 300, 400],
                   'other' : [11, 12, 13, 14],
                   'row_number': [1, 2, 3, 4]
                  })


   county   other   population  row_number  state
0   AAA      11         100         1       NJ
1   BBB      12         200         2       NJ
2   CCC      13         300         3       NY
3   DDD      14         400         4       NY

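I want to keep the state and county columns. The other and population fields hold the actual data. Ultimately, I want to map these values to Excel spreadsheet columns and rows. The row_number field gives the spreadsheet row number that corresponds to each state and county.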

Now suppose I have a dictionary that maps each of the two data fields to a spreadsheet column. Let's say it looks like this:

column_mapping = {'other': 'A',
                  'population': 'B'
                 }

I would like to produce a DataFrame that looks like this:

   county   state         value         row 
0   AAA       NJ          11            A1         
1   AAA       NJ          100           B1
2   BBB       NJ          12            A2
3   BBB       NJ          200           B2          
4   CCC       NY          13            A3     
5   CCC       NY          300           B3     
6   DDD       NY          14            A4
7   DDD       NY          400           B4

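Secondarily, I am trying to do this in the most general way possible, because I want to pass several different tables with a similar structure to this function, possibly with different column names (state, row_number and county will always be the same, but the actual data fields may differ).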

Answer 1

You can use melt to reshape the data, map the variable column through column_mapping, concatenate the result with the row_number column (cast to string with astype), and finally drop the columns that are no longer needed:

column_mapping = {'other': 'A', 'population': 'B'}

df = pd.melt(df, id_vars=['county', 'state', 'row_number'], value_vars=['other', 'population'])
df['variable'] = df['variable'].map(column_mapping)
df['row'] = df['variable'] + df['row_number'].astype(str)
df = df.drop(['variable', 'row_number'], axis=1)
# if you need to sort by the county column and reset the index
df = df.sort_values('county').reset_index(drop=True)
print(df)

  county state  value row
0    AAA    NJ     11  A1
1    AAA    NJ    100  B1
2    BBB    NJ     12  A2
3    BBB    NJ    200  B2
4    CCC    NY     13  A3
5    CCC    NY    300  B3
6    DDD    NY     14  A4
7    DDD    NY    400  B4

EDIT:

If you need a more general solution, omit value_vars in melt. Every column that is not listed in id_vars is then treated as a data field and melted.
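Since the question asks for a reusable function, here is a minimal sketch of how the melt approach without value_vars might be wrapped up. The function name melt_to_cells and the id_vars default are hypothetical and chosen only for illustration. The sketch also sorts on row_number and the mapped column letter, which is a small change from sorting on county alone, so that the order within each county is deterministic.

```python
import pandas as pd


def melt_to_cells(df, column_mapping, id_vars=("county", "state", "row_number")):
    """Reshape a wide table to long form and build spreadsheet cell labels.

    Hypothetical helper: the name and the id_vars default are assumptions
    for illustration, not part of the original answer.
    """
    # Without value_vars, melt unpivots every column that is not in id_vars.
    long_df = pd.melt(df, id_vars=list(id_vars))
    # Translate each data-field name into its spreadsheet column letter.
    # Fields missing from column_mapping become NaN here.
    long_df["variable"] = long_df["variable"].map(column_mapping)
    # Column letter + row number gives the cell label, e.g. "A1".
    long_df["row"] = long_df["variable"] + long_df["row_number"].astype(str)
    # Sort on explicit keys so rows for the same county stay in letter order.
    long_df = long_df.sort_values(["row_number", "variable"])
    return long_df.drop(["variable", "row_number"], axis=1).reset_index(drop=True)


# Usage with the sample table from the question.
df = pd.DataFrame({"state": ["NJ", "NJ", "NY", "NY"],
                   "county": ["AAA", "BBB", "CCC", "DDD"],
                   "population": [100, 200, 300, 400],
                   "other": [11, 12, 13, 14],
                   "row_number": [1, 2, 3, 4]})
result = melt_to_cells(df, {"other": "A", "population": "B"})
print(result)
```

Because the data fields are discovered from whatever columns remain after id_vars, a table with different data columns only needs a different column_mapping dictionary.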

Transform a DataFrame from wide to long and apply a mapping (Python 3.5.1, Pandas)
