I am trying to extract some data from a text file. So far I can pick out the correct lines containing the data, which gives me output like this:
[ 0.2 0.148 100. ]
[ 0.3 0.222 100. ]
[ 0.4 0.296 100. ]
[ 0.5 0.37 100. ]
[ 0.6 0.444 100. ]
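So basically I have 5 lists, each containing a single string. As you can imagine, I would like to turn all of this into one numpy array, with each string split into its 3 values, like this: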
[[0.2, 0.148, 100],
[0.3, 0.222, 100],
[0.4, 0.296, 100],
[0.5, 0.37, 100],
[0.6, 0.444, 100]]
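But because the separator in the output is unpredictable, i.e. I don't know whether it is 3 spaces, 5 spaces, or a tab, I am a bit lost on how to do this.
Update:
The data looks something like this: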
data_file =
Equiv. Sphere Diam. [cm]: 6.9
Conformity Index: N/A
Gradient Measure [cm]: N/A
Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]
0 0 100
0.1 0.074 100
0.2 0.148 100
0.3 0.222 100
0.4 0.296 100
0.5 0.37 100
0.6 0.444 100
0.7 0.518 100
0.8 0.592 100
Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)
Dose Cover.[%]: 100.0
Sampling Cover.[%]: 100.0
Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]
0 0 100
0.1 0.074 100
0.2 0.148 100
0.3 0.222 100
0.4 0.296 100
0.5 0.37 100
0.6 0.444 100
The code that gets these lines is:
import numpy as np

with open(data_file) as input_data:
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        text_line = np.fromstring(line, sep='\t')
        print(text_line)
The text before the data is unpredictable, so I cannot just say "skip the first 5 lines". The header line, however, is always the same, and the block always ends with the same line (right before the next block of data starts). So I just need a way to get the raw data into a numpy array, and I can work with it from there.
Hope that makes more sense now.
Answer 0 (score: 1)
Given a text file named tmp.txt that looks like this:
0.2 0.148 100.
0.3 0.222 100.
0.4 0.296 100.
0.5 0.37 100.
0.6 0.444 100.
This snippet:
with open('tmp.txt', 'r') as in_file:
    # split() with no argument splits on any run of spaces or tabs
    print([list(map(float, line.split())) for line in in_file])
prints:
[[0.2, 0.148, 100.0], [0.3, 0.222, 100.0], [0.4, 0.296, 100.0], [0.5, 0.37, 100.0], [0.6, 0.444, 100.0]]
Hope this is what you want.
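Since the question asks for a numpy array rather than nested lists, the same split-on-whitespace idea could be sketched roughly as below. The sample text and the name `sample` are made up for illustration, and `io.StringIO` only stands in for the real file; this is one possible approach, not necessarily what the answer's author had in mind.

```python
import io
import numpy as np

# Hypothetical sample standing in for tmp.txt; note the mixed tabs and spaces.
sample = io.StringIO(
    "0.2\t0.148   100.\n"
    "0.3     0.222\t100.\n"
    "0.4 0.296 100.\n"
)

# split() with no argument treats any run of whitespace as one separator,
# so it does not matter whether the file uses 3 spaces, 5 spaces, or tabs.
rows = [[float(tok) for tok in line.split()] for line in sample if line.strip()]
data = np.array(rows)

print(data.shape)
```

The `if line.strip()` guard skips blank lines, which would otherwise produce empty rows and a ragged array.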
Answer 1 (score: 1)
1) Before with open, add:
import re
d_input = []
2) Replace
text_line = np.fromstring(line, sep='\t')
print(text_line)
with
d_input.append([float(x) for x in re.sub(r'\s+', ',', line.strip()).split(',')])
3) At the end, add:
d_array = np.array(d_input)
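Put together, the three steps above might look something like the sketch below. The function name `read_block`, the constants `HEADER` and `FOOTER`, and the in-memory sample are all hypothetical, introduced only so the example is self-contained. It uses plain `str.split()` in place of the `re.sub` call, which should give the same tokens for whitespace-separated numbers, and it assumes the header and footer lines match the strings from the question exactly.

```python
import io
import numpy as np

# Marker lines taken from the question; assumed to match exactly after strip().
HEADER = 'Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]'
FOOTER = 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)'

def read_block(lines, header=HEADER, footer=FOOTER):
    """Return the rows between header and footer as a 2-D float array."""
    rows = []
    in_block = False
    for line in lines:
        stripped = line.strip()
        if not in_block:
            # Still skipping the preamble; start collecting after the header.
            in_block = (stripped == header)
            continue
        if stripped == footer:
            break
        if stripped:  # ignore blank lines inside the block
            rows.append([float(x) for x in stripped.split()])
    return np.array(rows)

# Made-up sample with mixed separators, standing in for data_file.
sample = io.StringIO(
    "Conformity Index: N/A\n"
    + HEADER + "\n"
    + "0 0 100\n"
    + "0.1\t0.074   100\n"
    + "\n"
    + "0.2 0.148 100\n"
    + FOOTER + "\n"
    + "Dose Cover.[%]: 100.0\n"
)

d_array = read_block(sample)
print(d_array)
```

With a real file, `read_block(open(data_file))` inside a `with` block would take the place of the `io.StringIO` sample.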
Answer 2 (score: 1)
When you print text_line, you see each array formatted as a string. Each one is formatted separately, so the columns do not line up.
[ 0.2 0.148 100. ]
[ 0.3 0.222 100. ]
[ 0.4 0.296 100. ]
[ 0.5 0.37 100. ]
[ 0.6 0.444 100. ]
Instead of printing, you can collect the values in a list and join them at the end.
I have not actually tested this, but I think it should work:
import numpy as np

data = []
with open(data_file) as input_data:
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Read text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        if not line.strip():  # skip blank lines so every row has 3 values
            continue
        arr_line = np.fromstring(line, sep='\t')
        data.append(arr_line)
# Stack the 1-D rows into a single 2-D array
data = np.vstack(data)
Another option is to collect the lines without parsing them and pass them to np.genfromtxt. In other words, use your code as a filter that feeds the right lines to the numpy function. It accepts input from anything that supplies lines: a file, a list, or a generator.
import numpy as np

def filter_block(input_data):
    # Skip the text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Yield lines until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        yield line

with open(data_file) as f:
    data = np.genfromtxt(filter_block(f))  # default delimiter=None splits on any whitespace
print(data)
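As a quick check that np.genfromtxt copes with an unknown mix of spaces and tabs, one could feed it a plain list of strings, as in the sketch below. The `lines` list is invented for illustration; with the default `delimiter=None`, genfromtxt splits each line on any run of whitespace.

```python
import numpy as np

# Hypothetical rows with inconsistent separators (tabs and varying spaces).
lines = [
    "0.2\t0.148   100.",
    "0.3     0.222\t100.",
    "0.4 0.296 100.",
]

# delimiter is left at its default (None), so any whitespace separates columns.
data = np.genfromtxt(lines)

print(data.shape)
```

The result is a 2-D float array with one row per input line, so no per-line parsing or stacking is needed.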