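This is the code I'm using to read a file of movie titles and their ratings. I need to read the file and sort the entries by rating. I'm using Python.
This is what the file looks like:

    Harry Potter and the Prisoner of Azkaban,7.8
    Lord of the Rings: The Two Towers,8.7
    Spider Man,7.3
    Alice in Wonderland,6.5
    The Good Dinosaur,6.7
    Kung Fu Panda,7.6
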
    filename =("movie_ratings.txt")

    def ratings_sort(array):
        with open (filename) as f:
            for pair in f:
                title.append(pair.strip())
            for index in f:
                value = array[index]
                i = index-1
                while i>=0:
                    if value < array[i]:
                        array[i+1]=array[i]
                        array[i]=value
                        i = i-1
                    else:
                        break

    title = list ()
    rating = list('.')
    filename =("movie_ratings.txt")
    with open (filename) as f:
        for pair in f:
            title.append(pair.strip())
    title.sort()
    ratings_sort = sorted(title, key=lambda rating:rating[2])
    print ("Old List :\n",title)
    print('\n')
    print("New List :\n" ,ratings_sort)

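These are my results:
Old List : ['Alice in Wonderland,6.5', 'Harry Potter and the Prisoner of Azkaban,7.8', 'Kung Fu Panda,7.6', 'Lord of the Rings: The Two Towers,8.7', 'Spider Man,7.3', 'The Good Dinosaur,6.7']
New List : ['The Good Dinosaur,6.7', 'Alice in Wonderland,6.5', 'Spider Man,7.3', 'Kung Fu Panda,7.6', 'Harry Potter and the Prisoner of Azkaban,7.8', 'Lord of the Rings: The Two Towers,8.7']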
Answer 0 (score: 0)
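The problem is that a "for x in file" loop reads lines from the file, so the `title` array contains the lines of the file as strings. The `key` argument of `sorted` therefore receives those strings and returns the third character of each one (`rating[2]`); note that the "New List" is indeed sorted by third character: e, i, i, n, r, r. To fix this, you can parse each line of the file into a tuple of the form (title, rating) and store those tuples in the array. Sorting by rating is then as simple as picking the rating out of the tuple in the `key` argument to `sorted`.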
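As a minimal sketch of that idea, the snippet below parses a few sample lines into (title, rating) tuples and sorts them with `key`. The in-memory `lines` list stands in for the file, the "title,rating" layout is assumed from the question, and `parse_line` is a hypothetical helper name. It splits at the last comma with `rpartition` so that a title containing a comma would still parse.

```python
# Sample lines standing in for movie_ratings.txt (assumed layout: "title,rating").
lines = [
    "Harry Potter and the Prisoner of Azkaban,7.8",
    "Spider Man,7.3",
    "Alice in Wonderland,6.5",
]

def parse_line(line):
    """Hypothetical helper: turn one line into a (title, rating) tuple."""
    title, _, rating = line.rpartition(",")  # split at the last comma
    return title.strip(), float(rating)      # float() tolerates surrounding spaces

movies = [parse_line(line) for line in lines]

# movie[1] is the rating, so the list is ordered from lowest to highest rating.
by_rating = sorted(movies, key=lambda movie: movie[1])
print(by_rating)
```

Because the rating is converted with `float`, the comparison is numeric rather than character by character.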
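However, it seems to me that you want to implement the sort yourself rather than use the built-in `sorted`. It looks like you are going for an implementation of insertion sort, and the indentation got messed up when you posted it here. The function has the same problem as the parsing of the file lines: in the second loop you need to iterate over the numeric indices of `array` rather than over the lines of `f`. The logic can also be improved slightly by moving the `if` condition into the `while` condition and by assigning the value being compared only once, at its final position, instead of swapping on every step.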
    from collections import namedtuple

    def ratings_sort(movies):
        # Insertion sort by rating; sorts the list in place.
        for index in range(1, len(movies)):
            movie = movies[index]
            i = index-1
            while i>=0 and movie.rating < movies[i].rating:
                movies[i+1] = movies[i]
                i -= 1
            movies[i+1] = movie

    filename = "movie_ratings.txt"
    Movie = namedtuple("Movie", "title rating")
    movies = list()
    with open(filename) as f:
        for line in f:
            part = line.partition(",") # gives a tuple: ("movie title", ",", "rating")
            movies.append(Movie(title=part[0].strip(), rating=float(part[2])))

    print("Old List:\n", movies, "\n")

    # Sort using sorted
    sorted_movies = sorted(movies, key=lambda movie:movie.rating)

    # Sort using ratings_sort (modifies movies array unlike sorted)
    ratings_sort(movies)

    print("New List (using sorted):\n", sorted_movies, "\n")
    print("New List (using ratings_sort):\n", movies, "\n")

Note that I renamed some of the variables for clarity and used `namedtuple`. I also moved the file reading out of `ratings_sort` so that I could compare it with `sorted`.
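The comment about `ratings_sort` modifying the list "unlike sorted" matters when you compare the two results. Here is a small sketch of the difference between sorting a copy and sorting in place. It uses the built-in `list.sort` as a stand-in for any in-place sort, and the sample ratings are made up for illustration.

```python
ratings = [7.8, 8.7, 7.3, 6.5]    # illustrative sample values

new_list = sorted(ratings)         # returns a new list
unchanged = list(ratings)          # snapshot: the original is still unsorted
print(unchanged, new_list)

result = ratings.sort()            # sorts in place and returns None
print(ratings, result)
```

This is why the code above calls `sorted` before `ratings_sort`: once the in-place sort has run, the original order is gone.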
Answer 1 (score: 0)
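Let's solve your problem step by step.
Your problem has two parts:
First, get the data from the file into the right format.
Second, sort it by rating.
For the first part, I tried two approaches.
The first approach uses a hand-written generator.
First, let's open the file:

    with open('dsda') as f:
        # Uses only the first non-blank line, split on whitespace into words
        data=[line.strip().split() for line in f if line!='\n'][0]

I needed an isdigit-style check for floats, but `isdigit` only handles integers, so I came up with something like this: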

    def isfloat(point):
        try:
            float(point)
            return True
        except ValueError:
            return False

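To make the difference concrete, the sketch below runs `str.isdigit` and the same `isfloat` check over a few sample tokens. The token list is made up for illustration.

```python
def isfloat(text):
    """Return True if float() can parse the text."""
    try:
        float(text)
        return True
    except ValueError:
        return False

# Compare the two checks on a rating, an integer, a word and a comma.
for token in ["7.8", "8", "Azkaban", ","]:
    print(repr(token), token.isdigit(), isfloat(token))
```

One caveat: `float` also accepts strings such as "inf", "Infinity" and "nan", so a title word like "Infinity" would be mistaken for a rating by this check.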
Now let's use the generator approach to get the data into the right shape:

    def generator_approach(data_):
        storage=[]
        for word in data_:
            storage.append(word)
            if isfloat(word):  # a rating closes the current movie
                yield storage
                storage=[]

    closure_ = generator_approach(data)
    print(list(closure_))

Output:
[['Harry', 'Potter', 'and', 'the', 'Prisoner', 'of', 'Azkaban', ',', '7.8'], ['Lord', 'of', 'the', 'Rings:', 'The', 'Two', 'Towers', ',', '8.7'], ['Spider', 'Man', ',', '7.3'], ['Alice', 'in', 'Wonderland', ',', '6.5'], ['The', 'Good', 'Dinosaur', ',', '6.7'], ['Kung', 'Fu', 'Panda', ',', '7.6']]
Now let's try the second approach, using a regular expression:

    import re
    pattern=r'\w.+?[0-9.]+'
    with open('dsda') as f:
        for line in f:
            data_r=[line1.split() for line1 in re.findall(pattern,line)]
    print(data_r)

Output:
[['Harry', 'Potter', 'and', 'the', 'Prisoner', 'of', 'Azkaban', ',', '7.8'], ['Lord', 'of', 'the', 'Rings:', 'The', 'Two', 'Towers', ',', '8.7'], ['Spider', 'Man', ',', '7.3'], ['Alice', 'in', 'Wonderland', ',', '6.5'], ['The', 'Good', 'Dinosaur', ',', '6.7'], ['Kung', 'Fu', 'Panda', ',', '7.6']]
As you can see, both approaches produce the same output, so sorting by rating is now straightforward:

    print(sorted(data_r,key=lambda x:float(x[-1])))

Output:
[['Alice', 'in', 'Wonderland', ',', '6.5'], ['The', 'Good', 'Dinosaur', ',', '6.7'], ['Spider', 'Man', ',', '7.3'], ['Kung', 'Fu', 'Panda', ',', '7.6'], ['Harry', 'Potter', 'and', 'the', 'Prisoner', 'of', 'Azkaban', ',', '7.8'], ['Lord', 'of', 'the', 'Rings:', 'The', 'Two', 'Towers', ',', '8.7']]
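The sorted result is still a list of word tokens. If (title, rating) pairs are more convenient, one option is a regular expression with two capture groups, sketched below. This is a different pattern from the one used above, not a drop-in replacement. The names `pair_pattern` and `text` are illustrative, and the sample text assumes the single-line, space-separated layout that the output above implies.

```python
import re

# Sample text in the assumed single-line layout: "title , rating title , rating ..."
text = ("Harry Potter and the Prisoner of Azkaban , 7.8 "
        "Lord of the Rings: The Two Towers , 8.7 Spider Man , 7.3")

# Group 1 is the title (lazy match); group 2 is a rating such as 7.8 or 8.
pair_pattern = re.compile(r"\s*(.+?)\s*,\s*(\d+(?:\.\d+)?)")

# With two groups, findall returns (title, rating) string tuples.
movies = [(title, float(rating)) for title, rating in pair_pattern.findall(text)]

for title, rating in sorted(movies, key=lambda movie: movie[1]):
    print(f"{title}: {rating}")
```

This sketch would misparse a title that itself contains a comma followed by a number, so it only suits data shaped like the sample.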