I have a DataFrame with alphanumeric column names. I want to sort the columns in ascending order:
Answer-1 Answer0 Answer1 Answer10 Answer100 Answer101 Answer102 Answer103 Answer104 Answer105 ... Answer98 Answer99 Answers QID QType Questions Section Theme Topics URL
2649 10+ NaN 1 10 NaN NaN NaN NaN NaN NaN ... NaN NaN ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'] 1048 Likert Scale How many times do you usually travel via airplane in a year What changes would you like to see on your flight in the future? If any. Airline XYZ ['time', 'usual', 'travel', 'airplan', 'year'] https://docs.google.com/forms/d/1qQ28JBZE-8Mk-4wfCNfejz-_2AGKLWPUIBuzhsFE-kg/edit?usp=sharing
4155 5 or more NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN ['012345'] 2906 Likert Scale How many flights were cancelled/affected by the global lockdown? Media consumption Airline XYZ ['flight', 'cancel', 'affect', 'global', 'lockdown'] https://docs.google.com/forms/d/1yPWGOPVpk2HEj7M-2XbJDdm3EvmRozos-upH7wI9VvY/edit?usp=sharing
...
I have tried several approaches, which is why you see the output above, but many of them seem to have problems:
df_merged[df_merged['QType'] == 'Likert Scale'].sort_values(by='Answer0', ascending=True)
I tried YOLO's answer:
# create mapping of name to index
idx = dict(enumerate(df_merged.columns))
# extract digits
idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}
idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())
df_merged = df_merged.iloc[:,idx]
But I got:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-874-9a2435876ebe> in <module>
7 idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}
8
----> 9 idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())
10
11 df_merged = df_merged.iloc[:,idx]
<ipython-input-874-9a2435876ebe> in <lambda>(x)
7 idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}
8
----> 9 idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())
10
11 df_merged = df_merged.iloc[:,idx]
ValueError: invalid literal for int() with base 10: ''
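I am also worried about the last few columns, which do not follow the AnswerXYZ pattern (where XYZ is a number).
I am also testing this idea: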
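import re

def convertir(nombre):
    # Split names such as "Answer-1" or "Answer10" into a prefix and an integer
    m = re.match(r"Answer(-?\d+)", nombre)
    if m:
        return ("Answer", int(m.group(1)))
    # Names without a numeric suffix sort alphabetically after the Answer columns
    return (nombre, 0)
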
nombres = ['Answer-1', 'Answer0', 'Answer1', 'Answer10', 'Answer100',
'Answer101', 'Answer102', 'Answer103', 'Answer104', 'Answer105',
'Answer98', 'Answer99', 'Answers', 'QID', 'QType', 'Questions',
'Section', 'Theme', 'Topics', 'URL']
print(sorted(nombres, key=convertir))
This seems to do the job, but I don't know how to use the output of sorted(nombres, key=convertir) to sort the columns of my DataFrame.
Answer 0 (score: 1)
We need a mapping between each column's position in the DataFrame and its name. You can build it like this:
import re

# map each column position to its name
idx = dict(enumerate(df.columns))
# extract digits
idx = {k: re.sub(r'\D', '', v) for k,v in idx.items()}
# sort digits and keep its corresponding index
idx = list(dict(sorted(idx.items(), key=lambda x: int(x[1]))).keys())
df = df.iloc[:,idx]
print(df.head())
Answer0 Answer1 Answer2 Answer3 Answer4 Answer5 Answer6
0 0.652074 0.334795 0.309215 0.489695 0.011754 0.908632 0.395250
1 0.281704 0.169817 0.683343 0.891602 0.208878 0.029028 0.519839
2 0.983723 0.067707 0.053501 0.712321 0.224386 0.609682 0.323190
3 0.557681 0.484641 0.053048 0.134786 0.609206 0.378064 0.540113
4 0.031538 0.675454 0.556284 0.384275 0.731091 0.298495 0.952463
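One caveat with the approach above, shown as a small sketch: stripping every non-digit character leaves an empty string for names that contain no digits (such as `QID`), which `int()` rejects, and it also drops the minus sign in `Answer-1`. The three column names are taken from the question; the variable `stripped` is only for illustration.

```python
import re

# Column names taken from the question; r"\D" removes every non-digit character
names = ["Answer10", "Answer-1", "QID"]
stripped = {name: re.sub(r"\D", "", name) for name in names}
print(stripped)

# int("") raises ValueError, which matches the error reported in the question
try:
    int(stripped["QID"])
except ValueError as exc:
    print("ValueError:", exc)
```

So the digit-stripping approach is only safe when every column name ends in a non-negative number, as in the sample data.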
Sample data
import numpy as np
import pandas as pd
cols = [f"Answer{x}" for x in range(20)]
np.random.shuffle(cols)
df = pd.DataFrame(np.random.random((10, 20)), columns=cols)
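For the column layout in the question, here is a minimal sketch. It assumes the numbered columns should come first in numeric order (including the negative suffix in `Answer-1`) and the remaining columns should follow alphabetically. The helper `column_key` and the small frame are hypothetical and only illustrate the idea.

```python
import re

import numpy as np
import pandas as pd


def column_key(name):
    """Sort key: AnswerN columns first by integer N, then other names alphabetically."""
    m = re.fullmatch(r"Answer(-?\d+)", name)
    if m:
        return (0, int(m.group(1)), "")
    return (1, 0, name)


# Hypothetical frame that mimics the mix of column names in the question
cols = ["QID", "Answer10", "Answer-1", "Answers", "Answer2", "URL", "Answer0"]
df = pd.DataFrame(np.zeros((2, len(cols))), columns=cols)

# Reorder the columns by selecting them in sorted order
df = df[sorted(df.columns, key=column_key)]
print(list(df.columns))
```

Applied to the frame in the question, this would read `df_merged = df_merged[sorted(df_merged.columns, key=column_key)]`. The same selection pattern should also work with the `convertir` key from the question.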