Sort a pandas string column by the first few characters of the string

Time: 2020-09-08 21:26:48

Tags: python pandas

I have a column in a dataframe whose values are a uuid with some other file information appended:

ff8738hjgdj792__somevar1.txt
9jldh93k4043ik__some3var.txt

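I want to sort the dataframe by the leading uuid field (everything up to the double underscore) and ignore the rest of the attached string when sorting.

Currently I am doing this: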

df.sort_values(by='df_column_name')

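But this does not produce the desired result, because pandas compares the whole string.

How can I achieve this with pandas?

2 Answers:

Answer 0: (score: 0)

Since you are already using pandas, I suggest adding pandasql. It makes what you need easy to do.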

import pandas as pd
import pandasql as ps

# Recreating the data you provided
df = pd.DataFrame(['ff8738hjgdj792__somevar1.txt', '9jldh93k4043ik__some3var.txt'], columns = ['something']) 

# Selecting and sorting by the uuid prefix. SQLite's substr is 1-indexed,
# so the start must be 1 (a start of 0 returns one character fewer).
df_res = ps.sqldf("""
    select something 
    from df 
    order by substr(something, 1, length('ff8738hjgdj792')) """, locals())


print(df_res)

Returns

                      something
0  9jldh93k4043ik__some3var.txt
1  ff8738hjgdj792__somevar1.txt
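If the uuid part is not always the same length, hardcoding length('ff8738hjgdj792') will not cut at the right place. Here is a rough sketch of the same ORDER BY idea that finds the double underscore with instr instead. It uses the standard-library sqlite3 module directly (pandasql runs on SQLite by default, as far as I know, so the same expression should carry over), and the table and column names are just copied from the answer above:

```python
import sqlite3

rows = ["ff8738hjgdj792__somevar1.txt", "9jldh93k4043ik__some3var.txt"]

# In-memory table standing in for the dataframe (names are illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (something TEXT)")
conn.executemany("INSERT INTO df VALUES (?)", [(r,) for r in rows])

# instr gives the 1-based position of '__'; substr is 1-indexed as well,
# so this keeps everything before the first double underscore.
query = """
    SELECT something,
           substr(something, 1, instr(something, '__') - 1) AS prefix
    FROM df
    ORDER BY prefix
"""
result = conn.execute(query).fetchall()
conn.close()

for something, prefix in result:
    print(prefix, something)
```

This assumes every value actually contains '__'; rows without it would need separate handling.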

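Answer 1: (score: 0)

Pandas 1.1.0+ adds a key parameter to sort_values. Use it much like the key in Python's built-in sort, except that the function receives the whole Series and must return a Series of the same length.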
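To make the difference from Python's sorted concrete, here is a small sketch (requires pandas 1.1.0+; the data and the name prefix_key are made up for illustration). The key function is handed a Series, not individual strings, so it has to use vectorized operations such as .str:

```python
import pandas as pd

s = pd.Series(["b__2", "a__3", "c__1"])

seen_types = []  # records what pandas passes to the key function

def prefix_key(values):
    # values is the whole Series, so use .str accessors rather than
    # plain string methods
    seen_types.append(type(values).__name__)
    return values.str.split("__").str[0]

result = s.sort_values(key=prefix_key)
print(seen_types[0])
print(result.tolist())
```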

Sample df

                           col1
0  ff8738hjgdj792__somevar1.txt
1  9jldh93k4043ik__some3var.txt

df['col1'].sort_values(key=lambda x: x.str.split('__').str[0])

Out[809]:
1    9jldh93k4043ik__some3var.txt
0    ff8738hjgdj792__somevar1.txt
Name: col1, dtype: object

df_final = df.sort_values(by='col1', key=lambda x: x.str.split('__').str[0])
df_final

Out[812]:
                           col1
1  9jldh93k4043ik__some3var.txt
0  ff8738hjgdj792__somevar1.txt
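For pandas older than 1.1.0, where key is not available, one possible workaround is a temporary helper column. The sketch below uses made-up data in which one prefix extends another, so that whole-string order and prefix order actually differ (the sample rows in the question happen to sort the same either way). The column name _prefix is arbitrary:

```python
import pandas as pd

# Hypothetical data: 'ab1' extends 'ab', and '1' sorts before '_',
# so sorting the full strings puts 'ab1__a.txt' ahead of 'ab__z.txt'.
df = pd.DataFrame({"col1": ["ab__z.txt", "ab1__a.txt", "aa__q.txt"]})

whole_string_order = df.sort_values(by="col1")["col1"].tolist()

# Prefix = everything before the first double underscore
prefix = df["col1"].str.split("__", n=1).str[0]

# Sort on a temporary column, then drop it again; mergesort is stable,
# so rows with equal prefixes keep their original relative order.
df_sorted = (
    df.assign(_prefix=prefix)
      .sort_values(by="_prefix", kind="mergesort")
      .drop(columns="_prefix")
)

print(whole_string_order)
print(df_sorted["col1"].tolist())
```

On 1.1.0+ the key version shown above should give the same ordering without the extra column.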