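I need to read the file https://drive.google.com/open?id=0B29hT1HI-pwxMjBPQWFYaWoyalE, but I have tried 3-4 different approaches in code and keep getting the error "line contains NULL byte". I read in other threads that this points to a problem with the csv itself. However, this is the file my professor will load to grade me, and I cannot modify it, so I am looking for a way around this error.
As mentioned, I have tried several different ways of opening the file. These are my two best attempts: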
import csv

def largestState():
    INPUT = "statepopulations.csv"
    COLUMN = 5  # 6th column
    with open(INPUT, "rU") as csvFile:
        theFile = csv.reader(csvFile)
        header = next(theFile, None)  # skip header row
        pop = [float(row[COLUMN]) for row in theFile]
        max_pop = max(pop)
        print max_pop

largestState()
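This results in the NULL byte error. Please ignore the extra max_pop lines. The next step after reading in the file is to find the maximum value in column F.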
def test():
    with open('state-populations.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            print row

test()
This also results in the NULL byte error.
If anyone can offer a simple solution to this problem, I would greatly appreciate it.
The file as .txt: https://drive.google.com/open?id=0B29hT1HI-pwxZzhlMGZGVVAzX28
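Answer 0 (score: 2)
First of all, the "csv" file you provided via the Google Drive link is NOT a csv file. It is a gzip'ed xml file, which is why the csv module reports NULL bytes: it is being handed compressed binary data.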
[~/Downloads] file state-populations.csv
state-populations.csv: gzip compressed data, from Unix
[~/Downloads] gzip -d state-populations.csv
gzip: state-populations.csv: unknown suffix -- ignored
[~/Downloads] mv state-populations.csv state-populations.csv.gz
[~/Downloads] gzip -d state-populations.csv.gz
[~/Downloads] ls state-populations.csv
state-populations.csv
[~/Downloads] file state-populations.csv
state-populations.csv: XML 1.0 document text, ASCII text, with very long lines
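The same check can be done from Python without renaming the file. The following is a minimal sketch, assuming Python 3; the helper names `is_gzip` and `read_text` are made up for illustration, and the default UTF-8 encoding is an assumption (the `file` output above reports ASCII, which is compatible with it).

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_text(path, encoding="utf-8"):
    """Return the file's text, decompressing it first if it is gzip data."""
    # Hypothetical helper: picks gzip.open or the built-in open based on
    # the file's leading bytes, not on its extension.
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rt", encoding=encoding) as f:
        return f.read()
```

Because the decision is based on the file contents, this would work on the file exactly as the professor supplies it, whatever its extension says.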
Once it is decompressed, you can parse it with one of the xml modules:
[~/Downloads] python
Python 2.7.10 (default, Jul 30 2016, 18:31:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('state-populations.csv')
>>> root = tree.getroot()
>>> root
<Element '{http://www.gnumeric.org/v10.dtd}Workbook' at 0x10ded51d0>
>>> root.tag
'{http://www.gnumeric.org/v10.dtd}Workbook'
>>> for child in root:
... print child.tag, child.attrib
...
{http://www.gnumeric.org/v10.dtd}Version {'Epoch': '1', 'Full': '1.12.9', 'Major': '12', 'Minor': '9'}
{http://www.gnumeric.org/v10.dtd}Attributes {}
{urn:oasis:names:tc:opendocument:xmlns:office:1.0}document-meta {'{urn:oasis:names:tc:opendocument:xmlns:office:1.0}version': '1.2'}
{http://www.gnumeric.org/v10.dtd}Calculation {'ManualRecalc': '0', 'MaxIterations': '100', 'EnableIteration': '1', 'IterationTolerance': '0.001', 'FloatRadix': '2', 'FloatDigits': '53'}
{http://www.gnumeric.org/v10.dtd}SheetNameIndex {}
{http://www.gnumeric.org/v10.dtd}Geometry {'Width': '864', 'Height': '322'}
{http://www.gnumeric.org/v10.dtd}Sheets {}
{http://www.gnumeric.org/v10.dtd}UIData {'SelectedTab': '0'}
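To go from the parsed tree to the numbers in one column, something like the sketch below might work. It is written for Python 3 and rests on an assumption not shown in the session above: that each cell is stored as a `Cell` element in the Gnumeric namespace, with `Row` and `Col` attributes and the value as its text. The function name `column_values` is hypothetical; inspect the decompressed XML to confirm the layout before relying on it.

```python
import xml.etree.ElementTree as ET

GNM = "{http://www.gnumeric.org/v10.dtd}"  # namespace seen in the tags above


def column_values(root, col):
    """Collect the numeric values found in column `col` (0-based).

    Assumption: every cell is a <Cell Row=".." Col=".."> element in the
    Gnumeric namespace whose text is the cell value.
    """
    values = []
    for cell in root.iter(GNM + "Cell"):
        if cell.get("Col") != str(col):
            continue
        try:
            values.append(float(cell.text))
        except (TypeError, ValueError):
            pass  # header text or an empty cell, not a number
    return values
```

With the `root` from the session above, `max(column_values(root, 5))` would then give the largest value in column F, provided the assumed cell layout holds.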
Answer 1 (score: 0)
The new .txt file looks fine, and your function largestState() gives the correct output. Just use return instead of print at the end.
import csv

def largestState():
    INPUT = "state-populations.txt"
    COLUMN = 5  # 6th column
    with open(INPUT, "rU") as csvFile:
        theFile = csv.reader(csvFile)
        header = next(theFile, None)  # skip header row
        pop = [float(row[COLUMN]) for row in theFile]
    max_pop = max(pop)
    return max_pop

print(largestState())  # show the returned value
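The code in this thread targets Python 2. For Python 3, where the "U" mode flag is no longer accepted in recent releases, a rough equivalent could look like the sketch below. The name `largest_state` and its parameters are illustrative, and it assumes the .txt file is comma-delimited, as the answer above implies.

```python
import csv


def largest_state(path, column=5):
    """Return the largest value in `column` (0-based) of a CSV file."""
    # newline="" is what the csv module documentation recommends for reading.
    with open(path, newline="") as csv_file:
        reader = csv.reader(csv_file)
        next(reader, None)  # skip the header row
        # `if row` skips completely blank lines, which csv yields as [].
        return max(float(row[column]) for row in reader if row)
```

Calling `largest_state("state-populations.txt")` would return the maximum of the 6th column as a float; passing the path as an argument makes it easier to point the function at whichever copy of the file is being graded.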