Creating a sorted list of unique numbers from a DataFrame

Time: 2016-03-12 23:40:58

Tags: python sorting pandas

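I am writing keywords and their corresponding page numbers to a text file via LaTeX, which I then process with Python. How can I create a sorted list of unique page numbers for each keyword?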

The following code gives me a list of unique page numbers, but it is not sorted.

import pandas as pd

def unique(liste):
    a = liste.split(',')
    a = [int(numeric_string) for numeric_string in a]
    a = sorted(a)
    a = map(str,a)
    b = set(a)
    return ','.join(b)

df = pd.DataFrame({'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], "page": [1,2,3,3,4,5,6,7,7,9,10]})
df['page'] = df['page'].astype(str)
print(df)

grouped = df.groupby('keyword',as_index=False).agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['unique'] = grouped['page'].apply(unique)
print(grouped)

which produces

   keyword page
0      foo    1
1      foo    2
2      foo    3
3      foo    3
4      foo    4
5      foo    5
6      foo    6
7      foo    7
8      bar    7
9      bar    9
10     bar   10
  keyword             page         unique
0     bar           7,9,10         9,7,10
1     foo  1,2,3,3,4,5,6,7  3,7,6,4,5,2,1

1 answer:

Answer 0 (score: 1)

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], 
     "page": [1,2,3,3,4,5,6,7,7,9,10]})

# df['page'] = df['page'].astype(int)
result = df.groupby(['keyword'])['page'].agg(lambda x: ','.join(np.unique(x).astype(str)))

print(result)

yields

keyword
bar           7,9,10
foo    1,2,3,4,5,6,7
Name: page, dtype: object
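  • np.unique returns an array of the unique values in sorted order. We want the page values sorted as integers (not as strings), so keep the page column as integers. After calling np.unique, you can convert the result to strings with astype(str) and then join them with ','.join. In the question's unique function, the sorted order is lost because set(a) is applied after sorted(a), and a set does not preserve order.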
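
To illustrate why the integer dtype matters, here is a minimal sketch on a smaller, made-up DataFrame. The helper name join_unique and the variable names as_int, as_str and as_set are only illustrative and not part of the original answer. It compares np.unique on integer pages, np.unique on string pages, and a plain-Python alternative using sorted(set(...)).

```python
import numpy as np
import pandas as pd

# Small illustrative sample (not the data from the question)
df = pd.DataFrame({
    "keyword": ["foo", "foo", "foo", "bar", "bar", "bar"],
    "page": [3, 1, 3, 7, 9, 10],
})

def join_unique(pages):
    # np.unique returns the unique values of its input in sorted order
    return ",".join(np.unique(pages).astype(str))

# Integer pages: values are compared numerically
as_int = df.groupby("keyword")["page"].agg(join_unique)

# String pages: values are compared lexicographically, so "10" sorts before "7"
as_str = (
    df.assign(page=df["page"].astype(str))
      .groupby("keyword")["page"]
      .agg(join_unique)
)

# Alternative without NumPy: deduplicate with set, then sort, then convert
as_set = df.groupby("keyword")["page"].agg(
    lambda x: ",".join(map(str, sorted(set(x))))
)

print(as_int["bar"])  # 7,9,10
print(as_str["bar"])  # 10,7,9
print(as_set["bar"])  # 7,9,10
```

The key point in all variants is the order of operations: deduplicate first, sort the integers second, and convert to strings only at the very end.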