Question

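I'm trying to create a new file that contains only the movies with a rank above 9.

The dataset I'm analyzing contains ratings for many movies, taken from IMDB. The data fields are:

Votes: the number of people who rated the movie
Rank: the movie's average rating
Title: the movie's title
Year: the year the movie was released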
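To make the input concrete, here is a minimal sketch of how rows with these four fields come out of csv.DictReader. It assumes a tab-separated file with a header row (the two sample rows are made up). The point it shows is that the csv module never converts types, so Rank arrives as a string and needs float() before a numeric comparison:

```python
import csv
import io

# Hypothetical sample in the assumed layout: tab-separated, header row first.
sample = (
    "Votes\tRank\tTitle\tYear\n"
    "1000\t9.2\tExample Movie A\t1994\n"
    "500\t8.1\tExample Movie B\t2001\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
rows = list(reader)

# csv never converts types: every value, including Rank, is a str.
print(type(rows[0]["Rank"]).__name__)  # str

# So Rank must be converted before a numeric comparison.
high = [row["Title"] for row in rows if float(row["Rank"]) > 9.0]
print(high)  # ['Example Movie A']
```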

The code I tried:

import csv

filename = "IMDB.txt"
with open(filename, 'rt', encoding='utf-8-sig') as imdb_file:
    imdb_reader = csv.DictReader(imdb_file, delimiter = '\t')
    with open('new file.csv', 'w', newline='') as high_rank:
        fieldnames = ['Votes', 'Rank', 'Title', 'Year']
        writer = csv.DictWriter(high_rank, fieldnames=fieldnames)
        writer.writeheader()
        for line_number, current_row in enumerate (imdb_reader):
            if(float(current_row['Rank']) > 9.0):
                writer.writerow(dict(current_row))

Unfortunately it doesn't work. What should I do?

Answer 1

Let's say you have an Excel sheet saved as temp.csv and you want to keep only the movies with a rank of 9 or above (inclusive).

One simple approach is to use the pandas module. It lets you:

read .csv files: pd.read_csv
filter the data
export the data to a new .csv file: df.to_csv
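The three steps can be tried without touching the disk. This minimal sketch assumes the semicolon-separated layout used in the code below and feeds pd.read_csv an in-memory io.StringIO buffer instead of temp.csv; the rows are invented placeholders:

```python
import io

import pandas as pd

# Invented placeholder rows in the assumed ';'-separated layout.
raw = (
    "Votes;Rank;Film;Year\n"
    "10;9.5;Movie A;1990\n"
    "20;8.0;Movie B;2000\n"
    "30;9.0;Movie C;2010\n"
)

# 1. Read: pandas parses Rank as a numeric (float) column.
df = pd.read_csv(io.StringIO(raw), sep=";")

# 2. Filter: keep ranks of 9 or above (inclusive).
df_filtered = df[df["Rank"] >= 9]

# 3. Export: with no path argument, to_csv returns the CSV text as a string.
out = df_filtered.to_csv(sep=";", index=False)
print(out)
```

Passing a real path such as "temp_new.csv" as the first argument of to_csv writes the same text to a file instead of returning it.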

Assuming temp.csv holds the sample data printed in the comments below, the following code does the job:

# import modules
import pandas as pd

# Path - name of your file
filename = "temp.csv"

# Read the csv file
df = pd.read_csv(filename, sep=";")
print(df)
# Votes Rank Film Year
# 0 15 16 The Shawshank Redemption 1994
# 1 2004 5 The Godfather 1972
# 2 486 13 The Godfather: Part II 1974
# 3 529 9 Il buono, il brutto, il cattivo. 1966
# 4 289 12 Pulp Fiction 1994
# 5 98 11 Inception 2010
# 6 69 18 Schindler's List 1993
# 7 3 7 Angry Men 1957
# 8 584 14 One Flew Over the Cuckoo's Nest 1975

# Filter the csv file: keep ranks of 9 or above
df_filtered = df[df["Rank"] >= 9]
print(df_filtered)
# Votes Rank Film Year
# 0 15 16 The Shawshank Redemption 1994
# 2 486 13 The Godfather: Part II 1974
# 3 529 9 Il buono, il brutto, il cattivo. 1966
# 4 289 12 Pulp Fiction 1994
# 5 98 11 Inception 2010
# 6 69 18 Schindler's List 1993
# 8 584 14 One Flew Over the Cuckoo's Nest 1975

# Name the new csv file: temp.csv -> temp_new.csv
new_filename = filename[:-4] + "_new" + filename[-4:]

# Export dataframe to csv file
df_filtered.to_csv(new_filename)

The new file, temp_new.csv, contains the filtered rows from the second printout.
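Building the new name by slicing a fixed number of characters only works when the extension has exactly three letters. A slightly more robust sketch (an alternative, not what the answer itself uses) derives it with os.path.splitext; the helper name with_suffix is hypothetical:

```python
import os


def with_suffix(filename, suffix="_new"):
    # Split e.g. "temp.csv" into ("temp", ".csv") and insert the suffix before the extension.
    root, ext = os.path.splitext(filename)
    return root + suffix + ext


print(with_suffix("temp.csv"))      # temp_new.csv
print(with_suffix("ratings.data"))  # ratings_new.data
```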

Answer 2

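Based on your comment, it looks like your locale's default encoding doesn't support the full Unicode range. You need to specify an encoding for the output file that can handle arbitrary Unicode characters. On non-Windows systems you would normally use 'utf-8'; on Windows you might use 'utf-16' or 'utf-8-sig' instead (Windows programs tend to assume that a file without an explicit signature is in the locale encoding, and so misinterpret plain UTF-8). The fix is as simple as changing: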

with open('new file.csv', 'w', newline='') as high_rank:

to:

with open('new file.csv', 'w', encoding='utf-8', newline='') as high_rank:

Change the specified encoding to whatever makes sense for your OS and use case.
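Putting it together, here is a minimal sketch of the csv-based approach with the output encoding set explicitly. It assumes the tab-separated layout from the question, writes to a temporary directory instead of the real working directory, and uses made-up rows with non-ASCII titles standing in for IMDB.txt; the helper name filter_high_rank is hypothetical:

```python
import csv
import io
import os
import tempfile

FIELDNAMES = ["Votes", "Rank", "Title", "Year"]


def filter_high_rank(src, dst, threshold=9.0):
    """Copy rows with Rank strictly above `threshold` from tab-separated `src` to CSV `dst`."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in reader:
        if float(row["Rank"]) > threshold:
            writer.writerow(row)


# Made-up rows with non-ASCII titles: the kind of data a narrow locale encoding cannot write.
sample = (
    "Votes\tRank\tTitle\tYear\n"
    "100\t9.3\tPelícula Ñ\t2001\n"
    "50\t7.5\tPlaceholder\t1999\n"
    "80\t9.1\t映画の例\t2001\n"
)

tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, "new file.csv")

# encoding= is given explicitly, so the locale's default encoding is never used for the output.
with open(out_path, "w", encoding="utf-8", newline="") as high_rank:
    filter_high_rank(io.StringIO(sample), high_rank)

with open(out_path, encoding="utf-8", newline="") as f:
    written = f.read().splitlines()
print(len(written) - 1, "rows written")  # 2 rows written

# Same write with 'utf-8-sig': Python prepends the UTF-8 byte-order mark (BOM).
bom_path = os.path.join(tmp_dir, "new file (bom).csv")
with open(bom_path, "w", encoding="utf-8-sig", newline="") as high_rank:
    filter_high_rank(io.StringIO(sample), high_rank)

with open(bom_path, "rb") as f:
    bom_bytes = f.read()
print(bom_bytes[:3] == b"\xef\xbb\xbf")  # True
```

The BOM written by 'utf-8-sig' is the explicit signature mentioned above, which is why that encoding is a common choice when the file will be opened by Windows programs such as Excel.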
