Question

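I'm trying to retrieve elements from a CSV of 31 integer elements per row using tensorflow/numpy's CSV handling, and I'm getting a very strange error. The CSV itself contains only integer values and looks fine in Excel and Notepad. When I print the csv without touching it, like this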

import csv

# csv_file is the already-opened CSV file object
data_file = csv.reader(csv_file)
#data and target to be used after testing
data, target = [], []
for row in data_file:
  print(row)
  print(len(row))

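I get the result I expect (i.e. ['1', '-1', '1', '-1', '1', '1', '0', '1', '1', '1', '1', '-1', '-1', '1', '1', '-1', '1', '1', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1'], len=31). However, if I try to retrieve any element from inside any row of the csv (e.g. print(row[0])), even at index 0, I get IndexError: list index out of range. Moreover, doing this turns every other row into a row with no elements. What is going on here, and how do I fix it? (The overall goal is to get the dataset working with the tf.contrib.learn framework, so help with that would also be useful.)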

Answer 1

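The syntax you posted is perfectly fine. Since it's not yet clear what exactly the problem is, may I suggest an alternative that you should only use if you don't have too much data:

if len(row) > 0:

This creates a reader from a list containing the lines of the file.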
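A minimal sketch of the "reader from a list of the file's lines" idea, assuming the whole file fits in memory. This is not necessarily the answerer's original code; `io.StringIO` stands in for the real file, whose name isn't given. It relies only on `csv.reader` accepting any iterable of strings.

```python
import csv
import io

# Stand-in for an opened file; the real filename is not given in the question.
csv_file = io.StringIO("1,-1,1\n-1,0,1\n")

# Read everything into memory as a list of lines, then hand that list to
# csv.reader, which accepts any iterable of strings. Only sensible for
# files small enough to fit comfortably in memory.
lines = csv_file.read().splitlines()
data_file = csv.reader(lines)
rows = list(data_file)
print(rows)  # [['1', '-1', '1'], ['-1', '0', '1']]
```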

Edit

After testing my code and yours on the pasted data, I'm fairly sure you have a stray newline somewhere, and that's what is messing things up.

Add an `if len(row) > 0:` check inside your loop so that empty rows are skipped, and try to forget this ever happened.
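To illustrate the stray-newline explanation: `csv.reader` yields an empty list for a blank line, and `row[0]` on that empty list raises `IndexError`. The sketch below skips such rows. The sample text and the `read_rows` helper are hypothetical stand-ins for the asker's file and loop.

```python
import csv
import io

# Hypothetical sample: two valid rows separated by a blank line, standing in
# for the asker's 31-column file (which we don't have).
sample = "1,-1,1\n\n-1,0,1\n"

def read_rows(csv_file):
    """Return the non-empty rows of a CSV file-like object as lists of ints."""
    rows = []
    for row in csv.reader(csv_file):
        # A blank line makes csv.reader yield [], and indexing it (row[0])
        # raises IndexError, so only keep rows that actually have fields.
        if len(row) > 0:
            rows.append([int(x) for x in row])
    return rows

rows = read_rows(io.StringIO(sample))
print(rows)  # [[1, -1, 1], [-1, 0, 1]]
```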

Answer 2

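I remember having a similar problem. I never solved it, but I switched to using Pandas and storing the csv data in a DataFrame. It really makes loading csv data trivial: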

import pandas as pd
# header=None: the file has no header row, so the first line is data
data = pd.read_csv('CSVfilename', header=None)

DataFrames are also better suited to large datasets than iterating over every element of many lists/arrays.
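A short sketch of the pandas route applied to this question's layout, assuming a headerless file whose last column is the target. The sample string is hypothetical. `read_csv` skips blank lines by default (`skip_blank_lines=True`), which also sidesteps the stray-newline issue.

```python
import io
import pandas as pd

# Hypothetical headerless sample (with a blank line) standing in for the real file.
sample = "1,0,-1,1\n-1,0,1,-1\n\n1,1,1,0\n"

# header=None: treat the first line as data, not column names.
# Blank lines are skipped because skip_blank_lines defaults to True.
df = pd.read_csv(io.StringIO(sample), header=None)

features = df.iloc[:, :-1].to_numpy()  # every column except the last
target = df.iloc[:, -1].to_numpy()     # the last column
```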

Answer 3

In numpy, I can easily load your pastebin file:

In [279]: data = np.genfromtxt('stack44527012.csv',delimiter=',',dtype=int)
In [280]: data.shape
Out[280]: (32, 31)
In [281]: data[:5,:]
Out[281]: 
array([[ 1,  0, -1,  1,  1, -1, -1, -1,  1,  1,  1,  1,  1,  1,  0, -1,  1,
         1,  0, -1,  1, -1,  1, -1, -1, -1,  1,  1,  1,  1, -1],
       [-1,  0, -1,  1, -1, -1, -1,  1,  1,  1,  1, -1, -1,  1,  0, -1, -1,
        -1,  0,  1,  1,  1,  1,  1, -1, -1,  1,  1, -1, -1, -1],
       [ 1,  0, -1,  1,  1, -1, -1,  1,  1,  1,  1, -1, -1,  1,  1, -1,  1,
         1,  0,  1,  1,  1,  1,  1, -1, -1,  1,  1,  0,  1, -1],
       [ 1,  1,  1,  1,  1, -1, -1, -1,  1,  1,  1,  1,  1,  1,  1, -1, -1,
        -1,  0,  1,  1,  1,  1, -1, -1, -1,  1,  1, -1, -1, -1],
       [ 1,  1, -1,  1,  1, -1,  0, -1,  1,  1,  1, -1,  1,  1,  0,  1,  1,
         1,  0,  1,  1,  1,  1, -1, -1,  1,  1,  1, -1,  1, -1]])
In [282]:

The tensorflow code from your pastebin:

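# (inside the loop over the csv rows)
target.append(row.pop(30))  # last of the 31 columns -> target
data.append(np.asarray(row, dtype=features_dtype))  # remaining 30 -> features

# (after the loop)
target = np.array(target, dtype=target_dtype)
data = np.array(data)
return Dataset(data=data, target=target)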

This removes the last item from row and puts it in target, while the rest stays in data.

With my numpy array, you get the same effect with:

 result = data[:, -1]   # last column -> targets
 data = data[:, :-1]    # all other columns -> features

This splits the last column off the 31-column array.
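A self-contained sketch of the numpy approach, using a small hypothetical sample in place of the pastebin file. `np.genfromtxt` skips empty lines, so a stray blank line does not break loading. The slicing is the same last-column split as above.

```python
import io
import numpy as np

# Hypothetical 3-column sample with a blank line, standing in for the real file.
sample = "1,0,-1\n\n-1,1,1\n1,1,0\n"

# genfromtxt skips empty lines; dtype=int gives an integer array.
data = np.genfromtxt(io.StringIO(sample), delimiter=",", dtype=int)

target = data[:, -1]     # last column
features = data[:, :-1]  # all other columns
```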

Answer 4

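If you use pandas (a Python library widely used for data manipulation), you can get the job done easily and efficiently. Here is a link to the official pandas documentation: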

http://pandas.pydata.org/pandas-docs/stable/

Note: Pandas has a built-in function for reading csv files. You can also use the 'tolist' method to easily get the data in the csv as a list and then work with it.
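A small sketch of the `tolist` suggestion, on a hypothetical headerless sample. A DataFrame itself has no `tolist` method. `df.values` (a NumPy array) and each column (a Series) do, so both are used below.

```python
import io
import pandas as pd

# Hypothetical headerless sample standing in for the real file.
sample = "1,-1,0\n-1,1,1\n"
df = pd.read_csv(io.StringIO(sample), header=None)

rows = df.values.tolist()                      # list of row lists (Python ints)
last_column = df[df.columns[-1]].tolist()      # one column as a plain list
print(rows)         # [[1, -1, 0], [-1, 1, 1]]
print(last_column)  # [0, 1]
```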
