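I am trying to create a new file that contains only the data for movies with a rank above 9.
The dataset I am analyzing contains ratings for many movies, obtained from IMDB. The data fields are:
Votes: the number of people who rated the movie
Rank: the movie's average rating
Title: the name of the movie
Year: the year the movie was released
The code I tried: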
import csv

filename = "IMDB.txt"
with open(filename, 'rt', encoding='utf-8-sig') as imdb_file:
    imdb_reader = csv.DictReader(imdb_file, delimiter='\t')
    with open('new file.csv', 'w', newline='') as high_rank:
        fieldnames = ['Votes', 'Rank', 'Title', 'Year']
        writer = csv.DictWriter(high_rank, fieldnames=fieldnames)
        writer.writeheader()
        for line_number, current_row in enumerate(imdb_reader):
            if float(current_row['Rank']) > 9.0:
                # Write through the DictWriter created above
                writer.writerow(dict(current_row))
Unfortunately it doesn't work. What should I do?
Answer 0 (score: 0)
Suppose you have a spreadsheet file named temp.csv and you want to keep the movies with a rank of 9 or higher (9 included).
A simple way is to use the pandas module. It lets you read the CSV file, filter the resulting dataframe, and export it again with df.to_csv.
The code below does the job; the dataframe before and after filtering is shown in the comments:
# Import modules
import pandas as pd

# Path/name of your file
filename = "temp.csv"

# Read the CSV file (this sample file is semicolon-separated)
df = pd.read_csv(filename, sep=";")
print(df)
#    Votes  Rank                              Film  Year
# 0     15    16          The Shawshank Redemption  1994
# 1   2004     5                     The Godfather  1972
# 2    486    13            The Godfather: Part II  1974
# 3    529     9  Il buono, il brutto, il cattivo.  1966
# 4    289    12                      Pulp Fiction  1994
# 5     98    11                         Inception  2010
# 6     69    18                  Schindler's List  1993
# 7      3     7                         Angry Men  1957
# 8    584    14   One Flew Over the Cuckoo's Nest  1975

# Filter the dataframe: keep rows with Rank >= 9
df_filtered = df[df["Rank"] >= 9]
print(df_filtered)
#    Votes  Rank                              Film  Year
# 0     15    16          The Shawshank Redemption  1994
# 2    486    13            The Godfather: Part II  1974
# 3    529     9  Il buono, il brutto, il cattivo.  1966
# 4    289    12                      Pulp Fiction  1994
# 5     98    11                         Inception  2010
# 6     69    18                  Schindler's List  1993
# 8    584    14   One Flew Over the Cuckoo's Nest  1975

# Name the new CSV file: "temp.csv" -> "temp_new.csv"
# (slice off the 4-character ".csv" extension, not 3, so the dot stays with it)
new_filename = filename[:-4] + "_new" + filename[-4:]

# Export the dataframe to a CSV file
df_filtered.to_csv(new_filename)
The new file then holds only the filtered rows.
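As a rough sketch of how the same pandas approach might be applied to the file from the question, the example below assumes a tab-delimited input with the columns Votes, Rank, Title and Year. The function name filter_by_rank, its parameters, and the sample rows are made up for illustration and do not come from the original posts.

```python
import os
import tempfile

import pandas as pd


def filter_by_rank(src_path, dst_path, min_rank=9.0, sep="\t"):
    """Hypothetical helper: keep rows whose Rank is strictly above min_rank."""
    # utf-8-sig drops a leading byte-order mark if the file has one
    df = pd.read_csv(src_path, sep=sep, encoding="utf-8-sig")
    high = df[df["Rank"] > min_rank]
    # index=False keeps the pandas row index out of the output file
    high.to_csv(dst_path, sep=sep, index=False, encoding="utf-8")
    return high


# Made-up sample rows, used only to demonstrate the helper
sample = (
    "Votes\tRank\tTitle\tYear\n"
    "1000\t9.2\tMovie A\t1994\n"
    "500\t8.7\tMovie B\t1972\n"
    "42\t9.0\tMovie C\t2010\n"
)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "IMDB.txt")
    dst = os.path.join(tmp, "high_rank.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write(sample)
    result = filter_by_rank(src, dst)
    with open(dst, encoding="utf-8") as f:
        written = f.read()

print(result["Title"].tolist())
```

The strict comparison follows the question ("above 9"), so a movie rated exactly 9.0 is dropped; use >= instead if 9 should be included, as in the answer's code.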
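Answer 1 (score: 0)
Based on your comment, your locale's default encoding apparently does not support the full Unicode range. You need to give the output file an encoding that can handle arbitrary Unicode characters. On non-Windows systems you would normally use 'utf-8'; on Windows you might use 'utf-16' or 'utf-8-sig' (Windows programs often assume that UTF-8 without an explicit signature is in the locale encoding and misinterpret it). The fix is as simple as changing: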
with open('new file.csv', 'w', newline='') as high_rank:
to:
with open('new file.csv', 'w', encoding='utf-8', newline='') as high_rank:
Change the specified encoding to whatever makes sense for your OS and use case.
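For context, here is a minimal sketch of the question's csv-based script with an explicit output encoding. The function name write_high_rank, its parameters, and the sample rows (including the rating values) are assumptions made for illustration, not part of the original posts.

```python
import csv
import os
import tempfile

FIELDNAMES = ["Votes", "Rank", "Title", "Year"]


def write_high_rank(src_path, dst_path, min_rank=9.0, out_encoding="utf-8"):
    """Hypothetical helper: copy rows with Rank > min_rank to a new CSV file."""
    with open(src_path, "rt", encoding="utf-8-sig", newline="") as src, \
         open(dst_path, "w", encoding=out_encoding, newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=FIELDNAMES)
        writer.writeheader()
        kept = 0
        for row in reader:
            if float(row["Rank"]) > min_rank:
                writer.writerow(row)
                kept += 1
    return kept


# Made-up sample rows; the non-ASCII title exercises the output encoding
sample = (
    "Votes\tRank\tTitle\tYear\n"
    "10\t9.5\tAmélie\t2001\n"
    "20\t8.0\tMovie B\t1999\n"
)

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "IMDB.txt")
    dst_path = os.path.join(tmp, "new file.csv")
    with open(src_path, "w", encoding="utf-8-sig") as f:
        f.write(sample)
    kept = write_high_rank(src_path, dst_path)
    with open(dst_path, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f))

print(kept, rows)
```

Passing out_encoding="utf-8-sig" or "utf-16" instead would match the Windows suggestion above; the file then has to be read back with the same encoding.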