Question

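I have a text file with simulated data (60 columns, 100k rows):

...with the variable names in the first row and the corresponding data (floats) below them, in columns.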

I need to use all of these variables, together with their data, in Python for further calculations. For example, when I type:

print(b)

I need to get the values from the second column.

I know how to import the data:

import numpy as np
data = np.genfromtxt("1.txt", unpack=True, skip_header=1)

And how to assign the variables "manually":

a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)

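But I'm having trouble getting the variable names:

import csv

rows = []
with open("1.txt", "rt") as f:
    reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
    for row in reader:
        rows.append(row)
variables = rows[0]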

How can I change this code to get all the variable names from the first row and assign them to the imported arrays?
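For reference, here is a small self-contained sketch of what the genfromtxt call above returns. It uses an in-memory stand-in for 1.txt, and the three-column sample values are hypothetical. With unpack=True the result is transposed, so each row of the array is one column of the file, and data[1] is the column the question wants as b.

```python
import io

import numpy as np

# Hypothetical stand-in for 1.txt: a names row, then whitespace-separated numbers
sample = io.StringIO(
    "a b c\n"
    "1 11 111\n"
    "2 22 222\n"
    "3 33 333\n"
    "4 44 444\n"
)

# skip_header=1 skips the names row; unpack=True transposes the result,
# so each row of `data` holds one column of the file
data = np.genfromtxt(sample, unpack=True, skip_header=1)

print(data.shape)  # (3, 4): 3 columns in the file, 4 values each
print(data[1])     # second column: [11. 22. 33. 44.]
```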

Answer 1

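The answer is: you don't want to do that.

Dictionaries are designed for exactly this purpose. The data structure you actually want looks something like this: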

data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}

...and then you can use, for example, data["a"].
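A minimal sketch of building that dict straight from a file laid out like the one in the question, assuming simple column names and whitespace-separated floats (the sample data below is hypothetical). np.genfromtxt with names=True reads the first row as field names and returns a structured array; the dict comprehension then turns it into one entry per column.

```python
import io

import numpy as np

# Hypothetical stand-in for 1.txt (names row, then numeric columns)
sample = io.StringIO(
    "a b c\n"
    "1 11 111\n"
    "2 22 222\n"
    "3 33 333\n"
    "4 44 444\n"
)

# names=True takes the field names from the first row
table = np.genfromtxt(sample, names=True)

# One dict entry per column, keyed by its name from the header
data = {name: table[name] for name in table.dtype.names}

print(data["b"])  # [11. 22. 33. 44.]
```

The structured array table can also be indexed directly as table["b"], so the dict step is optional.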

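It is possible to do what you want, but the usual way is a hack that relies on the fact that Python internally uses (drumroll) a dict to store variables. And since your code doesn't know the names of those variables, you'll have to use dictionary access to get at them anyway... so you might as well just use a dict in the first place.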
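To make the point concrete, here is a sketch of that hack, for illustration only. The names and data are hypothetical. Module-level variables live in the dict returned by globals(), so writing into it creates variables, but code that doesn't know the names still ends up looking them up by key.

```python
# For illustration only: module-level names are stored in the dict returned by globals()
names = ["a", "b", "c"]                    # hypothetical column names
columns = [[1, 2], [11, 22], [111, 222]]   # hypothetical column data

for name, column in zip(names, columns):
    globals()[name] = column               # writes directly into the module namespace

print(b)  # [11, 22]

# ...but code that doesn't know the names still has to use dict access
print(globals()["b"])  # [11, 22]
```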

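It's worth pointing out that this is deliberately made difficult in Python: if your code doesn't know the names of the variables, then they are, by definition, data rather than logic, and should be treated as such.

If you're still not convinced, here is a good article on the subject: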

Stupid Python Ideas: Why you don't want to dynamically create variables

Answer 2

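Instead of trying to assign names, you could consider using an associative array (called a dict in Python) to store the variables and their values. The code would then look something like this (borrowing heavily from the csv docs):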

import csv
with open('1.txt', 'rt') as f:
  reader = csv.reader(f, delimiter=' ', skipinitialspace=True)

  lineData = list()

  cols = next(reader)
  print(cols)

  for col in cols:
    # Create a list in lineData for each column of data.
    lineData.append(list())


  for line in reader:
    for i in range(len(lineData)):
      # Copy the data from the line into the correct columns.
      lineData[i].append(line[i])

  data = dict()

  for i in range(len(cols)):
    # Create each key in the dict with the data in its column.
    data[cols[i]] = lineData[i]

print(data)

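data then contains each of your variables, which can be accessed via data['varname'].

So, for example, given the input in your question, data['a'] would give you the list ['1', '2', '3', '4'].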

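Compared with the dict-based approach shown above, I think trying to create names from the data in the file would be a rather awkward approach. However, if you really want to do it, you could look into reflection in Python (a topic I know nothing about).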

Answer 3

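Thanks to @andyg0808 and @Zero Piraeus, I found another solution. The most suitable one for me is to use the pandas data analysis library.

import pandas as pd

# Whitespace-separated file; the first row supplies the column names
data = pd.read_csv("1.txt", sep=r"\s+")

result = data["a"] * data["b"] * 3
print(result)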

  0     33
  1    132
  2    297
  3    528

...where 0, 1, 2, 3 are the row indices.
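If plain NumPy arrays are needed for further calculations, a minimal sketch of turning the DataFrame into a dict of arrays keyed by column name. The in-memory sample data is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for 1.txt
sample = io.StringIO(
    "a b c\n"
    "1 11 111\n"
    "2 22 222\n"
    "3 33 333\n"
    "4 44 444\n"
)

# sep=r"\s+" splits on any run of whitespace
data = pd.read_csv(sample, sep=r"\s+")

# One NumPy array per column, keyed by the header name
arrays = {name: data[name].to_numpy() for name in data.columns}

print(arrays["b"])  # [11 22 33 44]
```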

Answer 4

Here is a simple way to convert a .txt file of variable names and data into NumPy arrays.

import numpy as np

D = np.genfromtxt('1.txt',dtype='str')    # load the data in as strings
D_data = np.asarray(D[1::,:],dtype=float) # convert the data to floats
D_names = D[0,:]                          # save a list of the variable names

for i in range(len(D_names)):
    key = D_names[i]                      # define the key for this variable
    val = D_data[:,i]                     # set the value for this variable
    exec(key + '=val')                    # create a variable named key bound to val

I like this approach because it is easy to follow and easy to maintain. We can condense this code as follows:

D = np.genfromtxt('1.txt',dtype='str')     # load the data in as strings
for i in range(D.shape[1]):
    val = np.asarray(D[1::,i],dtype=float) # set the value for this variable 
    exec(D[0,i] + '=val')                  # build the variable

Both versions do the same thing: when run at module level, they create NumPy arrays named a, b and c holding their associated data.
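If attribute-style access is the goal, a sketch of an alternative that avoids exec: the same string-loading approach, but with the columns stored in a types.SimpleNamespace so they can be read as v.b. The sample data and the name v are hypothetical.

```python
import io
from types import SimpleNamespace

import numpy as np

# Hypothetical stand-in for 1.txt
sample = io.StringIO(
    "a b c\n"
    "1 11 111\n"
    "2 22 222\n"
)

D = np.genfromtxt(sample, dtype=str, encoding="utf-8")  # load everything as strings
names = D[0, :]                  # first row: the variable names
values = D[1:, :].astype(float)  # remaining rows converted to floats

# Attribute access (v.b) without exec: the columns live on a namespace object
v = SimpleNamespace(**{str(name): values[:, i] for i, name in enumerate(names)})

print(v.b)  # [11. 22.]
```

Unlike exec, this also works inside a function, and the names remain available as a dict through vars(v).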

Importing data and variable names from a text file in Python
