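How should I sort this data?

Asked: 2014-01-10 15:56:34

Tags: python sorting file-management

So, I'm working on a project where I have to sort a large 34 MB text file full of song data. Each line of the text file has a year, a unique number, an artist, and a song. What I can't figure out is how to efficiently sort the data into other text files. I want to sort by artist name and song name. Sadly, this is all I have: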

# Opening the file to read here
with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    # Creating 'lists' to put information from array into
    years = []
    uics = []
    artists = []
    songs = []

    # Filling up the 'lists'
    for line in in_file:
        year, uic, artist, song = line.split("<SEP>")
        years.append(year)
        uics.append(uic)
        artists.append(artist)
        songs.append(song)
        print(year)
        print(uic)
        print(artist)
        print(song)

# Sorting:
with open('artistsort.txt', 'w', encoding='utf8') as artist:
    for x in range(1, 515576):
        if artists[x] == artists[x-1]:
            artist.write(years[x])
            artist.write(" ")
            artist.write(uics[x])
            artist.write(" ")
            artist.write(artists[x])
            artist.write(" ")
            artist.write(songs[x])
            artist.write("\n")

with open('Onehitwonders.txt', 'w', encoding='utf8') as ohw:
    for x in range(1, 515576):
        if artists[x] != artists[x-1]:
            ohw.write(years[x])
            ohw.write(" ")
            ohw.write(uics[x])
            ohw.write(" ")
            ohw.write(artists[x])
            ohw.write(" ")
            ohw.write(songs[x])
            ohw.write("\n")

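Please keep in mind that I'm a beginner, so try to explain things in simple terms. If you have any other ideas, I'd love to hear them too. Thanks!

3 Answers:

Answer 0 (score: 0)

You could import the data into a dictionary-based structure, keyed by artist and then by song: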

data = {artist_name: {song_name: {'year': year, 'uid': uid}, 
                      ... }, 
        ...}

Then, when writing the output, use sorted to get them in alphabetical order:

for artist in sorted(data):
    for song in sorted(data[artist]):
        details = data[artist][song]  # dict holding 'year' and 'uid'
        print(artist, song, details['year'], details['uid'])
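
As a rough sketch of how that dictionary might be filled from the file, assuming every line has the form `year<SEP>uid<SEP>artist<SEP>song`. The helper name `build_index` and the sample lines are made up for illustration and are not from the question:

```python
from collections import defaultdict

def build_index(lines):
    """Group songs by artist: {artist: {song: {'year': ..., 'uid': ...}}}."""
    data = defaultdict(dict)
    for line in lines:
        # Strip the trailing newline before splitting on the separator.
        year, uid, artist, song = line.rstrip("\n").split("<SEP>")
        data[artist][song] = {'year': year, 'uid': uid}
    return data

# Hypothetical sample lines standing in for the real file.
sample = [
    "1981<SEP>uid1<SEP>Queen<SEP>Under Pressure\n",
    "1975<SEP>uid2<SEP>Queen<SEP>Bohemian Rhapsody\n",
    "1967<SEP>uid3<SEP>Aretha Franklin<SEP>Respect\n",
]

data = build_index(sample)

rows = []
for artist in sorted(data):
    for song in sorted(data[artist]):
        details = data[artist][song]
        rows.append((artist, song, details['year']))
        print(details['year'], details['uid'], artist, song)
```

With the real data you would pass the open file object instead of `sample`. One caveat of this layout: if an artist has two entries with the same song title, the later one overwrites the earlier one.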

Answer 1 (score: 0)

Try something like this:

from operator import attrgetter

class Song:
    def __init__(self, year, uic, artist, song):
        self.year = year
        self.uic = uic
        self.artist = artist
        self.song = song

songs = []

with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    for line in in_file:
        # Strip the trailing newline so it is not written out twice later.
        year, uic, artist, song = line.rstrip("\n").split("<SEP>")
        songs.append(Song(year, uic, artist, song))
        print(year)
        print(uic)
        print(artist)
        print(song)

with open('artistsort.txt', 'w', encoding='utf8') as artist:
    for song in sorted(songs, key=attrgetter('artist', 'song')):
        artist.write (song.year)
        artist.write(" ")
        artist.write(song.uic)
        artist.write(" ")
        artist.write(song.artist)
        artist.write(" ")
        artist.write(song.song)
        artist.write("\n")
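
The question also writes an Onehitwonders.txt file by comparing each artist with the previous entry, which only works if the data is already sorted. Here is a possible sketch that counts songs per artist instead, assuming "one-hit wonder" means an artist that appears on exactly one line. The function name `split_by_song_count` and the sample records are hypothetical:

```python
from collections import Counter

def split_by_song_count(records):
    """Return (multi_song, single_song), each sorted by artist then song.

    `records` is an iterable of (year, uic, artist, song) tuples.
    """
    records = list(records)
    # Count how many lines each artist has.
    counts = Counter(artist for _, _, artist, _ in records)
    # Sort once by (artist, song); tuples compare element by element.
    ordered = sorted(records, key=lambda r: (r[2], r[3]))
    multi = [r for r in ordered if counts[r[2]] > 1]
    single = [r for r in ordered if counts[r[2]] == 1]
    return multi, single

# Hypothetical sample records standing in for the parsed file.
sample = [
    ("1981", "uic1", "Queen", "Under Pressure"),
    ("1967", "uic3", "Aretha Franklin", "Respect"),
    ("1975", "uic2", "Queen", "Bohemian Rhapsody"),
]

multi, single = split_by_song_count(sample)
```

Each of the two lists can then be written to its own file with the same write loop as above.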

Answer 2 (score: 0)

You can't beat pandas for simplicity. To read your file:

import pandas as pd

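# A multi-character separator is treated as a regular expression and needs
# the Python parsing engine. The file appears to have no header row, so the
# column names are supplied explicitly.
data = pd.read_csv('tracks_per_year.txt', sep='<SEP>', engine='python',
                   header=None, names=['year', 'uic', 'artist', 'song'],
                   encoding='utf8')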
data
#    year    uic     artist      song
#0   1981    uic1    artist1     song1
#1   1934    uic2    artist2     song2
#2   2004    uic3    artist3     song3

Then, to sort by a particular column and write the result to a new file, just do:

# DataFrame.sort() was removed in later pandas releases; sort_values() replaces it.
data.sort_values(by='year').to_csv('year_sort.txt')
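
Since the question asks for sorting by artist and then by song, here is a small sketch of a multi-column sort. It reads from an in-memory sample rather than the real file, and the sample rows and the tab-separated output format are assumptions made for illustration:

```python
import io

import pandas as pd

# Hypothetical sample standing in for tracks_per_year.txt.
sample = io.StringIO(
    "1981<SEP>uic1<SEP>Queen<SEP>Under Pressure\n"
    "1967<SEP>uic3<SEP>Aretha Franklin<SEP>Respect\n"
    "1975<SEP>uic2<SEP>Queen<SEP>Bohemian Rhapsody\n"
)

data = pd.read_csv(sample, sep='<SEP>', engine='python', header=None,
                   names=['year', 'uic', 'artist', 'song'])

# Sort by artist first, then by song within each artist.
by_artist = data.sort_values(by=['artist', 'song'])

# to_csv only accepts a single-character separator, so a tab is used here.
out = io.StringIO()
by_artist.to_csv(out, sep='\t', index=False, header=False)
print(out.getvalue())
```

For the real data, replace `sample` with the file name and `out` with an output path such as 'artistsort.txt'.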