I am trying to write the output to a CSV file, but I either get an error or an unexpected result. I am using both Python 3.5.2 and 2.7.
In Python 3.5 I get this error:
wr.writerow(var)
TypeError: a bytes-like object is required, not 'str'
and
in Python 2.7, all of the column values end up in a single column.
Expected result:
The output file has the same format as the input file.
Code:
import csv
f1 = open("input_1.csv", "r")
resultFile = open("out.csv", "wb")
wr = csv.writer(resultFile, quotechar=',')

def sort_duplicates(f1):
    for i in range(0, len(f1)):
        f1.insert(f1.index(f1[i])+1, f1[i])
        f1.pop(i+1)

for var in f1:
    #print (var)
    wr.writerow([var])
If I use resultFile = open("out.csv", "w") instead, an extra row is added to the output file.
With the code above, an extra row and another line are added.
Answer 0 (score: 3)
On Python 3, csv requires the file to be opened in text mode, not binary mode. Remove the b from the file mode. You should really use newline='' as well:
resultFile = open("out.csv", "w", newline='')
Better still, use the file objects as context managers to make sure they are closed automatically:
import csv

with open("input_1.csv", "r") as f1, \
     open("out.csv", "w", newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    for var in f1:
        wr.writerow([var.rstrip('\n')])
I also strip the lines read from f1 (only to remove the newline) and put each line into a list; csv.writer.writerow wants a sequence of columns, not a single string.
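To see what writerow() does with a bare string versus a list, here is a small in-memory sketch. The helper render_row is hypothetical (not part of the csv module), and io.StringIO stands in for the real output file so the written text can be inspected directly.

```python
import csv
import io

def render_row(row):
    """Return the text that one csv.writer.writerow(row) call produces (hypothetical helper)."""
    buf = io.StringIO(newline="")  # in-memory stand-in for the output file
    csv.writer(buf).writerow(row)
    return buf.getvalue()

line = "1,2,3,4"

# A bare string is iterated character by character: one field per character,
# and the "," fields are quoted because they contain the delimiter.
print(repr(render_row(line)))             # '1,",",2,",",3,",",4\r\n'

# Wrapping the whole line in a list gives exactly one field, quoted because it contains commas.
print(repr(render_row([line])))           # '"1,2,3,4"\r\n'

# A list of column values gives one field per column (naive split, see note below).
print(repr(render_row(line.split(","))))  # '1,2,3,4\r\n'
```

The second call mirrors what wrapping a whole line in a list does: the entire line becomes one quoted field, which is consistent with the "everything in one column" symptom described in the question. The naive split(",") in the third call breaks on quoted fields that themselves contain commas; parsing with csv.reader avoids that.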
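From the Python csv module documentation:
If csvfile is a file object, it should be opened with newline='' [1]. [...] All other non-string data are stringified with str() before being written.
[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.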
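The extra \r mentioned in the footnote can be reproduced without a Windows machine. This sketch assumes that wrapping an io.BytesIO in io.TextIOWrapper with newline='\r\n' is a fair stand-in for a text-mode file on Windows (where the default newline=None translates every written '\n' to os.linesep, i.e. '\r\n'); encoded_csv_row is a hypothetical helper name.

```python
import csv
import io

def encoded_csv_row(row, newline):
    """Write one row through a text layer and return the raw bytes that would reach the disk.

    newline='\r\n' imitates a Windows text-mode file (each '\n' becomes '\r\n');
    newline='' disables translation, as the csv docs recommend.
    """
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(text).writerow(row)  # the writer itself always ends rows with '\r\n'
    text.flush()
    text.detach()  # release raw without closing it
    return raw.getvalue()

print(encoded_csv_row(["a", "b"], "\r\n"))  # b'a,b\r\r\n'  (doubled \r)
print(encoded_csv_row(["a", "b"], ""))      # b'a,b\r\n'
```

Because csv.writer already emits '\r\n' by default, any further newline translation by the file object produces the doubled \r, which shows up as blank rows in many spreadsheet programs.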
Answer 1 (score: 2)
Others have answered that you should open the output file in text mode when using Python 3, i.e.:
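with open('out.csv', 'w', newline='') as resultFile:
    ...

but you also need to parse the incoming CSV data. As it stands, your code reads each line of the input CSV file as a single string and then, without splitting that line into its constituent fields, passes the string to the CSV writer. The csv.writer therefore treats the string as a sequence and outputs each character, including any terminating newline character, as a separate field. For example, if your input CSV file contains:
1,2,3,4
your output file will be written like this:
1,",",2,",",3,",",4,"
"
You should change the for loop to:
for row in csv.reader(f1):
    # process the row
    wr.writerow(row)
Now the input CSV file is parsed into fields, and row contains a list of strings, one per field. For the previous example,
for row in csv.reader(f1):
    print(row)
prints:
['1', '2', '3', '4']
When that list is passed to csv.writer, the output to the file will be:
1,2,3,4
Putting all of that together: open the output file in text mode with newline='' as shown above, create wr = csv.writer(resultFile), and loop over csv.reader(f1), passing each row to wr.writerow(row).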
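The combined pattern can be sketched as a small Python 3 function. copy_csv is a hypothetical name, and opening the input with newline='' as well follows the csv documentation's advice for file objects; treat this as a sketch under those assumptions rather than the answerer's exact code.

```python
import csv

def copy_csv(src_path, dst_path):
    """Copy a CSV file row by row, parsing fields so the output keeps the input's column layout."""
    # newline='' on both files lets the csv module handle line endings itself.
    with open(src_path, "r", newline="") as f1, \
         open(dst_path, "w", newline="") as result_file:
        wr = csv.writer(result_file)
        for row in csv.reader(f1):
            # row is a list of strings, one per field; process it here if needed
            wr.writerow(row)
```

Called as copy_csv("input_1.csv", "out.csv"), it writes every row with csv.writer's default '\r\n' line terminator, so an input that uses bare '\n' line endings comes out with '\r\n'; passing lineterminator='\n' to csv.writer changes that if it matters.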
Answer 2 (score: 0)
Open the file without the b mode.
The b mode opens the file as a binary file.
You can open the file like this:
open_file = open("filename.csv", "w")
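Since the question targets both Python 3.5.2 and 2.7, a small helper can choose the mode per version: the Python 2 csv documentation asks for the 'b' flag on platforms where it makes a difference, while Python 3 wants text mode with newline=''. open_csv_for_writing and write_rows are hypothetical names; this only addresses the file mode, not how each row is built.

```python
import csv
import sys

def open_csv_for_writing(path):
    """Open path for csv.writer on Python 2 or Python 3 (hypothetical helper)."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode, with newline='' so csv controls line endings itself.
        return open(path, "w", newline="")
    # Python 2: the csv module expects a binary-mode file.
    return open(path, "wb")

def write_rows(path, rows):
    """Write an iterable of field lists to path, one CSV row each."""
    with open_csv_for_writing(path) as result_file:
        wr = csv.writer(result_file)
        for row in rows:
            wr.writerow(row)
```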
Answer 3 (score: 0)
You are opening the input file in normal read mode, but the output file in binary mode. The correct call is:
resultFile = open("out.csv", "w")
As shown above, if you replace "wb" with "w", it will work.