Question

So I'm trying to extract some data from a text file. At the moment I can get the correct lines containing the data, which gives me output like this:

[   0.2      0.148  100.   ]
[   0.3      0.222  100.   ]
[   0.4      0.296  100.   ]
[   0.5     0.37  100.  ]
[   0.6      0.444  100.   ]

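So basically I have 5 lists, each containing one string. As you can imagine, though, I'd like to turn all of this into a numpy array, with each string split into 3 values. Like this: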

[[0.2, 0.148, 100],
[0.3, 0.222, 100],
[0.4, 0.296, 100],
[0.5, 0.37, 100],
[0.6, 0.444, 100]]

But the separators in the output vary: I don't know whether a given gap is 3 spaces, 5 spaces or a tab. So I'm a bit lost on how to do this.
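Python's built-in str.split(), called with no argument, treats any run of spaces and tabs as a single separator and ignores leading and trailing whitespace, so the exact width of each gap does not matter. A minimal sketch on two hypothetical rows modeled on the output above (one uses a tab):

```python
import numpy as np

# Hypothetical sample rows: the separators mix runs of spaces with a tab
rows = [
    "   0.2      0.148  100.   ",
    "   0.5     0.37\t100.",
]

# str.split() with no argument treats any run of whitespace as one separator
# and ignores leading/trailing whitespace
values = [[float(x) for x in row.split()] for row in rows]
arr = np.array(values)
print(arr)
```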

Update:

So the data looks roughly like this:

data_file = 

Equiv. Sphere Diam. [cm]: 6.9
Conformity Index: N/A
Gradient Measure [cm]: N/A

Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]
                0                   0                       100
              0.1               0.074                       100
              0.2               0.148                       100
              0.3               0.222                       100
              0.4               0.296                       100
              0.5                0.37                       100
              0.6               0.444                       100
              0.7               0.518                       100
              0.8               0.592                       100

Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)
Dose Cover.[%]: 100.0
Sampling Cover.[%]: 100.0

Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]
                0                   0                       100
              0.1               0.074                       100
              0.2               0.148                       100
              0.3               0.222                       100
              0.4               0.296                       100
              0.5                0.37                       100
              0.6               0.444                       100

The code that gets these lines is:

import numpy as np

with open(data_file) as input_data:
        # Skips text before the beginning of the interesting block:
        for line in input_data:
            if line.strip() == 'Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
                break
        # Reads text until the end of the block:
        for line in input_data:  # This keeps reading the file
            if line.strip() == 'Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)':
                break
            text_line = np.fromstring(line, sep='\t')
            print(text_line)

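The text before the data itself varies, so I can't just say "skip the first 5 lines", but the header is always the same, and the block always ends at the same line (before the next data block starts). So I just need a way to take the raw data, put it into a numpy array, and work with it from there.

Hope this makes more sense now.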
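Since the sample file contains several tables under the same header, each followed by a blank line, one option is to treat the blank line as the end marker instead of the exact "Uncertainty plan" text. The sketch below assumes that every table ends at a blank line or at end of input; read_blocks is a hypothetical helper name, and io.StringIO stands in for the real file:

```python
import io
import numpy as np

HEADER = "Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]"

def read_blocks(lines):
    """Return one 2-D array per table that follows HEADER.

    Assumes each table ends at the first blank line or at end of input.
    """
    blocks = []
    current = None  # rows of the table being read, or None outside a table
    for line in lines:
        stripped = line.strip()
        if stripped == HEADER:
            current = []
        elif current is not None:
            if stripped:
                # split() copes with any mix of spaces and tabs
                current.append([float(x) for x in stripped.split()])
            else:
                blocks.append(np.array(current))
                current = None
    if current:  # a table that runs to the end of the input
        blocks.append(np.array(current))
    return blocks

# Hypothetical stand-in for the data file
sample = io.StringIO(
    "Conformity Index: N/A\n"
    "\n"
    + HEADER + "\n"
    "                0                   0                       100\n"
    "              0.1               0.074                       100\n"
    "\n"
    "Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)\n"
    "\n"
    + HEADER + "\n"
    "                0                   0                       100\n"
)

blocks = read_blocks(sample)
print(len(blocks), blocks[0].shape)
```

With a real file, read_blocks(open(data_file)) would be used the same way.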

Answer 1

Given a text file named tmp.txt like the following:

   0.2      0.148  100.   
   0.3      0.222  100.   
   0.4      0.296  100.   
   0.5     0.37  100.  
   0.6      0.444  100.

This snippet:

with open('tmp.txt', 'r') as in_file:
    # split() with no argument splits on any run of whitespace
    print([list(map(float, line.split())) for line in in_file])

will output:

[[0.2, 0.148, 100.0], [0.3, 0.222, 100.0], [0.4, 0.296, 100.0], [0.5, 0.37, 100.0], [0.6, 0.444, 100.0]]

Hope this is what you want.
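Since tmp.txt contains only numbers, NumPy's np.loadtxt (a swapped-in alternative, not what the answer above uses) can read it straight into a 2-D array; with its default delimiter=None it splits on any whitespace. A sketch that first writes the sample rows to a temporary tmp.txt:

```python
import os
import tempfile
import numpy as np

sample = """   0.2      0.148  100.
   0.3      0.222  100.
   0.4      0.296  100.
   0.5     0.37  100.
   0.6      0.444  100.
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tmp.txt")
    with open(path, "w") as out_file:
        out_file.write(sample)
    # delimiter=None (the default) splits on any run of whitespace
    data = np.loadtxt(path)

print(data.shape)
```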

Answer 2

1) Before the with open, add:

import re
d_input = []

2) Replace

        text_line = np.fromstring(line, sep='\t')
        print(text_line)

with

        if line.strip():  # skip blank lines, e.g. the one before "Uncertainty plan"
            d_input.append([float(x) for x in re.sub(r'\s+', ',', line.strip()).split(',')])

3) At the end, add:

d_array = np.array(d_input)
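Put together, the three steps look roughly like this. This is a self-contained sketch: io.StringIO with a shortened copy of the sample data stands in for the real file, and HEADER/END are hypothetical names for the two marker lines from the question:

```python
import io
import re
import numpy as np

HEADER = "Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]"
END = "Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)"

# Hypothetical stand-in for the real data file
input_data = io.StringIO(
    "Equiv. Sphere Diam. [cm]: 6.9\n"
    + HEADER + "\n"
    "              0.2               0.148                       100\n"
    "              0.3               0.222                       100\n"
    "\n"
    + END + "\n"
)

d_input = []  # step 1: one list of floats per data line
for line in input_data:  # skip everything up to and including the header
    if line.strip() == HEADER:
        break
for line in input_data:  # keep reading until the end marker
    if line.strip() == END:
        break
    if line.strip():  # step 2: skip blank lines, collapse whitespace runs to commas
        d_input.append([float(x) for x in re.sub(r"\s+", ",", line.strip()).split(",")])

d_array = np.array(d_input)  # step 3
print(d_array)
```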

Answer 3

By printing text_line, you see each array formatted as a string. Each one is formatted separately, so the columns don't line up.

[   0.2      0.148  100.   ]
[   0.3      0.222  100.   ]
[   0.4      0.296  100.   ]
[   0.5     0.37  100.  ]
[   0.6      0.444  100.   ]

Instead of printing, you can collect the values in a list and stack them at the end.

Without actually testing it, I think this would work:

import numpy as np

data = []
with open(data_file) as input_data:
        # Skips text before the beginning of the interesting block:
        for line in input_data:
            if line.strip() == 'Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
                break
        # Reads text until the end of the block:
        for line in input_data:  # This keeps reading the file
            if line.strip() == 'Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)':
                break
            if not line.strip():
                continue  # skip blank lines; an empty row would make np.vstack fail
            arr_line = np.fromstring(line, sep='\t')
            data.append(arr_line)
data = np.vstack(data)
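The collect-then-stack step can be checked on its own. A minimal sketch, assuming hypothetical in-memory lines instead of the real file and using str.split() rather than np.fromstring for the parsing:

```python
import numpy as np

# Hypothetical lines from the block, including the blank line that precedes
# "Uncertainty plan" in the sample file
lines = [
    "              0.2               0.148                       100\n",
    "              0.3               0.222                       100\n",
    "\n",
]

rows = []
for line in lines:
    if not line.strip():
        continue  # an empty row would have length 0 and make np.vstack fail
    rows.append(np.array([float(x) for x in line.split()]))

data = np.vstack(rows)  # stack the 1-D rows into one 2-D array
print(data.shape)
```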

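Another option is to collect the lines without parsing them and pass them to np.genfromtxt. In other words, use your code as a filter that feeds the right lines to the numpy function. genfromtxt accepts input from anything that supplies lines: a file, a list, a generator.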

import numpy as np

def block_lines(input_data):  # renamed from filter to avoid shadowing the builtin
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)':
            break
        yield line

with open(data_file) as f:
    data = np.genfromtxt(block_lines(f))  # default delimiter=None splits on any whitespace
print(data)
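To check that np.genfromtxt accepts both a generator and a plain list of strings, and that it skips the blank line at the end of the block, here is a self-contained sketch; io.StringIO stands in for the real data file, and block_lines is a hypothetical name for the filter generator:

```python
import io
import numpy as np

HEADER = "Relative dose [%]           Dose [Gy] Ratio of Total Structure Volume [%]"
END = "Uncertainty plan: U1 X:+3.00cm   (variation of plan: CT1)"

def block_lines(input_data):
    """Yield the raw lines between HEADER and END (both excluded)."""
    for line in input_data:
        if line.strip() == HEADER:
            break
    for line in input_data:
        if line.strip() == END:
            break
        yield line

# Hypothetical stand-in for the data file
f = io.StringIO(
    "Conformity Index: N/A\n"
    + HEADER + "\n"
    "                0                   0                       100\n"
    "              0.1               0.074                       100\n"
    "\n"
    + END + "\n"
)

data = np.genfromtxt(block_lines(f))  # generator input; the blank line is skipped
same = np.genfromtxt(["0 0 100", "0.1 0.074 100"])  # a plain list of strings also works
print(data)
```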

Convert a list of strings to a Numpy array (Python)
