Sort alphanumeric pandas columns in ascending order

Time: 2020-07-14 09:20:45

Tags: python python-3.x pandas dataframe sorting

I have a DataFrame with alphanumeric column names, and I want to sort the columns in ascending order:

    Answer-1    Answer0     Answer1     Answer10    Answer100   Answer101   Answer102   Answer103   Answer104   Answer105   ...     Answer98    Answer99    Answers     QID     QType   Questions   Section     Theme   Topics  URL
2649    10+     NaN     1   10  NaN     NaN     NaN     NaN     NaN     NaN     ...     NaN     NaN     ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']    1048    Likert Scale    How many times do you usually travel via airplane in a year     What changes would you like to see on your flight in the future? If any.    Airline XYZ     ['time', 'usual', 'travel', 'airplan', 'year']  https://docs.google.com/forms/d/1qQ28JBZE-8Mk-4wfCNfejz-_2AGKLWPUIBuzhsFE-kg/edit?usp=sharing
4155    5 or more   NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     ...     NaN     NaN     ['012345']  2906    Likert Scale    How many flights were cancelled/affected by the global lockdown?    Media consumption   Airline XYZ     ['flight', 'cancel', 'affect', 'global', 'lockdown']    https://docs.google.com/forms/d/1yPWGOPVpk2HEj7M-2XbJDdm3EvmRozos-upH7wI9VvY/edit?usp=sharing
...

I have already tried a few approaches, which is how I got the output above, but many of them seem to have problems:

df_merged[df_merged['QType'] == 'Likert Scale'].sort_values(by='Answer0', ascending=True)

Update

I tried YOLO's answer:

# create mapping of name to index
idx = dict(enumerate(df_merged.columns))

# extract digits
idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}

idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())

df_merged = df_merged.iloc[:,idx]

But I got:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-874-9a2435876ebe> in <module>
      7 idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}
      8 
----> 9 idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())
     10 
     11 df_merged = df_merged.iloc[:,idx]

<ipython-input-874-9a2435876ebe> in <lambda>(x)
      7 idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}
      8 
----> 9 idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())
     10 
     11 df_merged = df_merged.iloc[:,idx]

ValueError: invalid literal for int() with base 10: ''

I am also concerned that the last few columns do not follow the AnswerXYZ pattern (where XYZ is a number).

I am also testing this idea:

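import re

def convertir(nombre):
  # "Answer<number>" columns sort by their numeric suffix
  m = re.match(r"Answer(-?\d+)", nombre)
  if m:
    return ("Answer", int(m.group(1)))
  # every other column sorts by its full name
  return (nombre, 0)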

nombres = ['Answer-1', 'Answer0', 'Answer1', 'Answer10', 'Answer100',
     'Answer101', 'Answer102', 'Answer103', 'Answer104', 'Answer105',
     'Answer98', 'Answer99', 'Answers', 'QID', 'QType', 'Questions',
     'Section', 'Theme', 'Topics', 'URL']

print(sorted(nombres, key=convertir))

This seems to do the job, but I don't know how to use the output of sorted(nombres, key=convertir) to reorder the columns of my DataFrame.
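One possible way to apply that sorted list, sketched here under the assumption that the column names are unique: index the DataFrame with the sorted list of names. The small `df_demo` frame is a hypothetical stand-in for `df_merged`, and `convertir` is repeated so the snippet runs on its own.

```python
import re
import pandas as pd

def convertir(nombre):
    # "Answer<number>" columns sort by their numeric suffix;
    # every other column sorts by its full name.
    m = re.match(r"Answer(-?\d+)", nombre)
    if m:
        return ("Answer", int(m.group(1)))
    return (nombre, 0)

# Hypothetical miniature stand-in for df_merged (not the real data)
df_demo = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["Answer10", "QID", "Answer-1", "Answers", "Answer2", "Answer0"],
)

# Sort the column names, then select the columns in that order
ordered = sorted(df_demo.columns, key=convertir)
df_demo = df_demo[ordered]

print(list(df_demo.columns))
# ['Answer-1', 'Answer0', 'Answer2', 'Answer10', 'Answers', 'QID']
```

`df_demo.reindex(columns=ordered)` should give the same result when the names are unique; selecting with a list of names simply returns the columns in the order listed.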

1 Answer:

Answer 0 (score: 1)

We need a mapping between each column's position in the DataFrame and its name. You can do it like this:

import re

# map each column's positional index to its name
idx = dict(enumerate(df.columns))

# keep only the digits of each name
idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}

# sort by the numeric value and keep the corresponding positional indices
idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())

df = df.iloc[:,idx]
 
print(df.head())

    Answer0   Answer1   Answer2   Answer3   Answer4   Answer5   Answer6
0  0.652074  0.334795  0.309215  0.489695  0.011754  0.908632  0.395250   
1  0.281704  0.169817  0.683343  0.891602  0.208878  0.029028  0.519839   
2  0.983723  0.067707  0.053501  0.712321  0.224386  0.609682  0.323190   
3  0.557681  0.484641  0.053048  0.134786  0.609206  0.378064  0.540113   
4  0.031538  0.675454  0.556284  0.384275  0.731091  0.298495  0.952463
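Note that this approach expects every column name to contain digits. For a name such as `QID`, `re.sub(r'\D', '', 'QID')` returns an empty string, and `int('')` raises the `ValueError` shown in the question. Stripping every non-digit also drops the hyphen, so `Answer-1` would be treated as 1. A minimal sketch of a more tolerant variant follows; it assumes a trailing `-` is meant as a minus sign (as in `Answer-1`) and that columns without a numeric suffix should go last, sorted by name. The names `sort_key` and `df_demo2` are hypothetical.

```python
import re
import numpy as np
import pandas as pd

def sort_key(item):
    # item is a (position, column name) pair produced by enumerate
    _, name = item
    m = re.search(r"-?\d+$", name)  # optional minus sign plus trailing digits
    if m:
        return (0, int(m.group()), name)  # numbered columns first, by number
    return (1, 0, name)                   # the rest afterwards, by name

# Hypothetical demo frame mixing numbered and non-numbered columns
df_demo2 = pd.DataFrame(
    np.zeros((2, 5)),
    columns=["Answer10", "URL", "Answer-1", "Answer2", "QID"],
)

# Sort the (position, name) pairs and keep only the positions for iloc
idx = [pos for pos, _ in sorted(enumerate(df_demo2.columns), key=sort_key)]
df_demo2 = df_demo2.iloc[:, idx]

print(list(df_demo2.columns))
# ['Answer-1', 'Answer2', 'Answer10', 'QID', 'URL']
```

Selecting by position with `iloc` keeps this working even if two columns happen to share a name.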

Sample data

import numpy as np
import pandas as pd

cols = [f"Answer{x}" for x in range(20)]
np.random.shuffle(cols)
df = pd.DataFrame(np.random.random((10, 20)), columns=cols)