How to "sort a CSV file" in Python

Time: 2019-05-29 16:32:32

Tags: python csv export-to-csv

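I am trying to create a new file that contains only the data for movies with a rank higher than 9.

The dataset I am analyzing contains ratings for many movies, taken from IMDB. The data fields are:

  • Votes: the number of people who rated the movie
  • Rank: the average rating of the movie
  • Title: the name of the movie
  • Year: the year the movie was released

The code I tried: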

import csv

filename = "IMDB.txt"
with open(filename, 'rt', encoding='utf-8-sig') as imdb_file:
    imdb_reader = csv.DictReader(imdb_file, delimiter = '\t')
    with open('new file.csv', 'w', newline='') as high_rank:
        fieldnames = ['Votes', 'Rank', 'Title', 'Year']
        writer = csv.DictWriter(high_rank, fieldnames=fieldnames)
        writer.writeheader()
        for line_number, current_row in enumerate (imdb_reader):
            if(float(current_row['Rank']) > 9.0):
                csv_writer.writerow(dict(current_row))

Unfortunately it does not work. What should I do?

2 Answers:

Answer 0: (score: 0)

Let's assume you have a sheet named temp.csv and you want to keep the films with a rank of 9 or higher:

A simple way is to use the pandas module. It lets you:

  • read .csv files, using the pd.read_csv method (doc)
  • filter the data
  • export the data to a new file: for .csv output, this is done with df.to_csv (doc)

Suppose the file holds the data printed by the first print(df) call below.

The following code does the job:

# import modules
import pandas as pd

# Path - name of your file
filename = "temp.csv"

# Read the csv file (this sample file is semicolon-separated)
df = pd.read_csv(filename, sep=";")
print(df)
#    Votes  Rank  Film                              Year
# 0     15    16  The Shawshank Redemption          1994
# 1   2004     5  The Godfather                     1972
# 2    486    13  The Godfather: Part II            1974
# 3    529     9  Il buono, il brutto, il cattivo.  1966
# 4    289    12  Pulp Fiction                      1994
# 5     98    11  Inception                         2010
# 6     69    18  Schindler's List                  1993
# 7      3     7  Angry Men                         1957
# 8    584    14  One Flew Over the Cuckoo's Nest   1975

# Filter the csv file: keep rows with a rank of 9 or higher
df_filtered = df[df["Rank"] >= 9]
print(df_filtered)
#    Votes  Rank  Film                              Year
# 0     15    16  The Shawshank Redemption          1994
# 2    486    13  The Godfather: Part II            1974
# 3    529     9  Il buono, il brutto, il cattivo.  1966
# 4    289    12  Pulp Fiction                      1994
# 5     98    11  Inception                         2010
# 6     69    18  Schindler's List                  1993
# 8    584    14  One Flew Over the Cuckoo's Nest   1975

# Name the new csv file by inserting "_new" before the 4-character ".csv" extension
new_filename = filename[:-4] + "_new" + filename[-4:]

# Export dataframe to csv file
df_filtered.to_csv(new_filename)

The new temp_new.csv contains only the filtered rows printed above.
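Building the output name by slicing a fixed number of characters depends on the length of the extension and on whether the dot is counted, which is easy to get wrong. As a small sketch, pathlib can split the name for you. The helper name tagged_name and its tag parameter are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def tagged_name(filename, tag="_new"):
    """Insert tag between a file's stem and its extension (hypothetical helper)."""
    path = Path(filename)
    # path.stem is the name without the extension; path.suffix includes the dot.
    return str(path.with_name(path.stem + tag + path.suffix))


print(tagged_name("temp.csv"))  # temp_new.csv
```

The result can be passed straight to df_filtered.to_csv in place of a hand-built string.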

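Answer 1: (score: 0)

Based on your comment, your locale's default encoding does not seem to support the whole Unicode range. You need to specify an encoding for the output file that can handle arbitrary Unicode characters. Typically, on non-Windows systems you would use 'utf-8'; on Windows you might use 'utf-16' or 'utf-8-sig' (Windows programs usually assume that UTF-8 without an explicit signature is in the locale encoding, and will misinterpret it). The fix is as simple as changing: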

with open('new file.csv', 'w', newline='') as high_rank:

to:

with open('new file.csv', 'w', encoding='utf-8', newline='') as high_rank:

Change the specified encoding to whatever makes sense for your operating system and use case.
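For reference, here is a minimal sketch of the question's filter with the explicit output encoding applied. It also uses one writer name throughout, because the posted code creates writer but then calls csv_writer.writerow, which is never defined and would raise a NameError. The function name filter_high_rank, its parameters, and the extrasaction="ignore" setting are assumptions made for illustration and are not part of the original post.

```python
import csv


def filter_high_rank(src, dst, threshold=9.0, out_encoding="utf-8"):
    """Copy rows whose Rank exceeds threshold from a tab-delimited file to a CSV.

    Hypothetical helper; returns the number of rows written.
    """
    fieldnames = ["Votes", "Rank", "Title", "Year"]
    written = 0
    # Reading with utf-8-sig drops a leading BOM if present, as in the question.
    with open(src, "rt", encoding="utf-8-sig", newline="") as imdb_file, \
         open(dst, "w", encoding=out_encoding, newline="") as high_rank:
        reader = csv.DictReader(imdb_file, delimiter="\t")
        # extrasaction="ignore" skips any input columns not listed in fieldnames.
        writer = csv.DictWriter(high_rank, fieldnames=fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            if float(row["Rank"]) > threshold:
                writer.writerow(row)
                written += 1
    return written
```

A call such as filter_high_rank("IMDB.txt", "new file.csv") mirrors the question's file names; passing out_encoding="utf-8-sig" may be the better choice if the result will be opened by Windows programs that expect a signature.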