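Is my approach here the right way to do this in Python? I'm new to Python, so I'd appreciate any feedback, especially if I'm way off base.
My task is to sort a list of filenames based on values in a dataset. Specifically, these are filenames that I need to sort according to site information. The resulting list is the order in which the reports are printed.
Site information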
import pandas as pd

key_info = pd.DataFrame({
'key_id': ['1010','3030','2020','5050','4040','4040']
, 'key_name': ['Name_A','Name_B','Name_C','Name_D','Name_E','Name_E']
, 'key_value': [1,2,3,4,5,6]
})
key_info = key_info[['key_id','key_name']].drop_duplicates()
key_info['key_id'] = key_info.key_id.astype('str').astype('int64')
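Filenames
These are the filenames I need to sort. In this example I sort only by key_id, but I assume I could easily add a column of site information and sort by that instead.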
filenames = ['1010_Filename','2020_Filename','3030_Filename','5050_Filename','4040_Filename']
Sorting
The resulting filenames variable is the final sorted list.
names_df = pd.DataFrame({'filename': filenames})
names_df['key_id'] = names_df.filename.str[:4].astype('str').astype('int64')
merged_df = pd.merge(key_info, names_df, on='key_id', how='right')
merged_df = merged_df.sort_values('key_id')
filenames = merged_df['filename'].tolist()
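The question assumes that a site-information column could serve as the sort key instead of key_id. A minimal sketch of that idea, assuming key_name is the column to sort by and that each filename starts with its four-character key (the variable sorted_by_name is introduced here for illustration only):

```python
import pandas as pd

# Same site information as in the question, deduplicated on (key_id, key_name).
key_info = pd.DataFrame({
    'key_id': ['1010', '3030', '2020', '5050', '4040', '4040'],
    'key_name': ['Name_A', 'Name_B', 'Name_C', 'Name_D', 'Name_E', 'Name_E'],
    'key_value': [1, 2, 3, 4, 5, 6],
})[['key_id', 'key_name']].drop_duplicates()

filenames = ['1010_Filename', '2020_Filename', '3030_Filename',
             '5050_Filename', '4040_Filename']

names_df = pd.DataFrame({'filename': filenames})
# Keep the key as a string so it matches key_info['key_id'] directly.
names_df['key_id'] = names_df['filename'].str[:4]

# A left merge keeps every filename; then order by the site column.
merged_df = names_df.merge(key_info, on='key_id', how='left')
sorted_by_name = merged_df.sort_values('key_name')['filename'].tolist()
print(sorted_by_name)
```

Leaving key_id as a string on both sides avoids the round trip through int64; converting would only matter if the keys had to sort numerically.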
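I'm looking for any solution that might be better or more Pythonic. Alternatively, let me know if there is a more appropriate place to post "code review" questions.
Answer 0 (score: 0)
I like that you're using Pandas, but it isn't really Pythonic, because it relies on data structures that are a superset of Python's own. Still, I think we can improve what you have. I'll show an improved Pandas version, and then a fully native Python way of doing it. Either one is fine, I'd guess.
The strictly-Python version is the better fit for readers who don't know Pandas, since Pandas has a steep learning curve.
Common
For both examples, let's assume a function like this: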
def trim_filenames(filename):
    return filename[0:4]
I use this in both examples.
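The fixed-width slice assumes every key is exactly four characters long. If key lengths could vary, one alternative is to split on the underscore. This is only a sketch: it assumes filenames follow a `<key>_<rest>` pattern, and key_from_filename is a hypothetical helper that is not part of the examples that follow.

```python
def key_from_filename(filename):
    """Return the text before the first underscore (hypothetical helper)."""
    # str.partition returns (head, separator, tail); when there is no
    # underscore, head is the whole string.
    return filename.partition('_')[0]


print(key_from_filename('1010_Filename'))
print(key_from_filename('123456_Report'))
```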
Improved
# Load the DataFrame and give it a proper index (I added some data)
key_info = pd.DataFrame(
    index=['2020','5050','4040','4040','6000','7000','1010','3030'],
    data={
        'key_name': ['Name_C','Name_D','Name_E','Name_E','Name_F','Name_G','Name_A','Name_B'],
        'key_value': [1,2,3,4,5,6,7,8],
    },
)
# Eliminate duplicates and sort in one step
key_info = key_info.groupby(key_info.index).first()
filenames = ['1010_Filename','2020_Filename','3030_Filename','5050_Filename','4040_Filename']
names_df = pd.DataFrame({'filename': filenames})
# Let's give this an index too so we can match on the index (not the function call)
names_df.index=names_df.filename.transform(trim_filenames)
combined = pd.concat([key_info,names_df], axis=1)
The concatenation matches rows on the index, but some keys have no filename. It now looks like this:
     key_name  key_value       filename
1010   Name_A          7  1010_Filename
2020   Name_C          1  2020_Filename
3030   Name_B          8  3030_Filename
4040   Name_E          3  4040_Filename
5050   Name_D          2  5050_Filename
6000   Name_F          5            NaN
7000   Name_G          6            NaN
Now we drop the NaN entries from the filename column and build the list of filenames:
combined.filename.dropna().values.tolist()
['1010_Filename', '2020_Filename', '3030_Filename', '4040_Filename', '5050_Filename']
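If the rows without a filename are never needed, an inner join can skip the dropna step. The following is a sketch rather than a drop-in replacement: it rebuilds a reduced key_info (key_name only) and sets the index with a list comprehension instead of trim_filenames so that it runs on its own. The join parameter of pd.concat is standard pandas; the explicit sort_index call is there so the order does not depend on how the join orders its result.

```python
import pandas as pd

key_info = pd.DataFrame(
    index=['2020', '5050', '4040', '4040', '6000', '7000', '1010', '3030'],
    data={'key_name': ['Name_C', 'Name_D', 'Name_E', 'Name_E',
                       'Name_F', 'Name_G', 'Name_A', 'Name_B']},
)
# Collapse duplicate index labels (groupby also sorts the index).
key_info = key_info.groupby(key_info.index).first()

filenames = ['1010_Filename', '2020_Filename', '3030_Filename',
             '5050_Filename', '4040_Filename']
names_df = pd.DataFrame({'filename': filenames},
                        index=[f[:4] for f in filenames])

# join='inner' keeps only index labels present in both frames, so keys
# without a filename never appear and no dropna() is needed.
combined = pd.concat([key_info, names_df], axis=1, join='inner').sort_index()
result = combined['filename'].tolist()
print(result)
```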
Python-only version (no frameworks)
key_info = {
    '2020': {'key_name': 'Name_C', 'key_value': 1},
    '5050': {'key_name': 'Name_D', 'key_value': 2},
    '4040': {'key_name': 'Name_E', 'key_value': 3},
    # Duplicate key: this later '4040' entry overwrites the one above.
    '4040': {'key_name': 'Name_E', 'key_value': 4},
    '6000': {'key_name': 'Name_F', 'key_value': 5},
    '7000': {'key_name': 'Name_G', 'key_value': 6},
    '1010': {'key_name': 'Name_A', 'key_value': 7},
    '3030': {'key_name': 'Name_B', 'key_value': 8},
}
filenames = ['1010_Filename','2020_Filename','3030_Filename','5050_Filename','4040_Filename']
# Let's get a dictionary of filenames that is keyed by the same key as in key_info:
hashed_filenames = {}
for filename in filenames:
    # Note here I'm using the function again
    hashed_filenames[trim_filenames(filename)] = filename
# We'll store the new filenames in new_filenames:
new_filenames = []
# sort the key info and loop it
for key in sorted(key_info.keys()):
    # for each key, if the key matches in the hashed_filenames, then add it to the list
    if key in hashed_filenames:
        new_filenames.append(hashed_filenames[key])
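The same two steps (index the filenames by key, then walk the sorted keys) can also be written with comprehensions. This is a compact sketch of the loops above, not a different algorithm; it assumes only the key ids matter for ordering, so a plain set named key_ids stands in for the key_info dictionary, and by_key is a name introduced here.

```python
def trim_filenames(filename):
    return filename[0:4]


# Only the keys are needed for ordering, so a set of key ids is enough here.
key_ids = {'2020', '5050', '4040', '6000', '7000', '1010', '3030'}
filenames = ['1010_Filename', '2020_Filename', '3030_Filename',
             '5050_Filename', '4040_Filename']

# Map each key to its filename, then emit filenames in sorted key order,
# skipping keys that have no matching file.
by_key = {trim_filenames(name): name for name in filenames}
new_filenames = [by_key[key] for key in sorted(key_ids) if key in by_key]
print(new_filenames)
```

Note that the keys are strings, so sorted orders them lexicographically; that matches numeric order here only because every key has the same length.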
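Summary
Both solutions are concise, and I like Pandas, but I prefer something that anyone who knows Python can read immediately. In my opinion the Python-only solution (they are both Python, of course) is the one you should use.
Answer 1 (score: 0)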
out_list = []
for x in key_info.key_id:
    for f in filenames:
        if str(x) in f:
            out_list.append(f)
out_list
['1010_Filename', '3030_Filename', '2020_Filename', '5050_Filename', '4040_Filename']
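Note that the nested loop keeps the files in the order the keys appear in key_info rather than in sorted order, and that `str(x) in f` matches the key anywhere in the filename. A sketch that produces the same ordering with a prefix lookup, assuming each filename begins with its four-character key (key_ids and by_prefix are names introduced here, with key_ids standing in for key_info.key_id after drop_duplicates):

```python
# Stand-in for key_info.key_id after drop_duplicates, in its original order.
key_ids = [1010, 3030, 2020, 5050, 4040]
filenames = ['1010_Filename', '2020_Filename', '3030_Filename',
             '5050_Filename', '4040_Filename']

# Index the filenames by prefix once instead of rescanning the list per key.
by_prefix = {f[:4]: f for f in filenames}
out_list = [by_prefix[str(k)] for k in key_ids if str(k) in by_prefix]
print(out_list)
```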