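I am writing keywords and their corresponding page numbers to a text file from LaTeX, which I then process with Python. How can I create a sorted list of page numbers for each keyword?
The following code gives me a list of unique page numbers, but it is not sorted.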
import pandas as pd
def unique(liste):
    a = liste.split(',')
    a = [int(numeric_string) for numeric_string in a]
    a = sorted(a)
    a = map(str,a)
    b = set(a)
    return ','.join(b)
df = pd.DataFrame({'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"], "page": [1,2,3,3,4,5,6,7,7,9,10]})
df['page'] = df['page'].astype(str)
print(df)
grouped = df.groupby('keyword',as_index=False).agg(lambda col: ','.join(col))
grouped = pd.DataFrame(grouped)
grouped['unique'] = grouped['page'].apply(unique)
print(grouped)
This produces:
    keyword  page
0       foo     1
1       foo     2
2       foo     3
3       foo     3
4       foo     4
5       foo     5
6       foo     6
7       foo     7
8       bar     7
9       bar     9
10      bar    10

   keyword             page         unique
0      bar           7,9,10         9,7,10
1      foo  1,2,3,3,4,5,6,7  3,7,6,4,5,2,1
Answer 0 (score: 1)
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {'keyword': ["foo","foo","foo","foo","foo","foo","foo","foo","bar","bar","bar"],
     "page": [1,2,3,3,4,5,6,7,7,9,10]})
# df['page'] = df['page'].astype(int)
result = df.groupby(['keyword'])['page'].agg(lambda x: ','.join(np.unique(x).astype(str)))
print(result)
yields
keyword
bar 7,9,10
foo 1,2,3,4,5,6,7
Name: page, dtype: object
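np.unique returns a sorted array of the unique values. We want the page values sorted as integers (not as strings), so keep the page column as integers. After calling np.unique, you can convert the result to strings with astype(str) and then combine them with ','.join.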
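To illustrate why the pages should stay integers until after deduplication, here is a small sketch comparing np.unique on integers and on strings. The sample list `pages` and the helper `unique_sorted` are hypothetical names made up for this example; `unique_sorted` is only one possible way to repair the `unique` function from the question, by deduplicating first and sorting last, since a set does not preserve order.

```python
import numpy as np

# Hypothetical sample data: unsorted pages with one duplicate.
pages = [10, 9, 7, 7, 1, 2]

# Integers: np.unique sorts numerically and drops duplicates.
as_ints = ",".join(np.unique(pages).astype(str))

# Strings: np.unique sorts lexicographically, so "10" lands before "2".
as_strs = ",".join(np.unique([str(p) for p in pages]))


def unique_sorted(liste):
    """Hypothetical fix for the question's unique(): dedupe, then sort as ints."""
    numbers = {int(s) for s in liste.split(",")}   # set removes duplicates
    return ",".join(str(n) for n in sorted(numbers))  # sort last, then join


print(as_ints)
print(as_strs)
print(unique_sorted("1,2,3,3,4,5,6,7"))
```

The original `unique` sorted before calling set, which is why the order was lost; sorting after deduplication keeps the result ordered.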