I am trying to create a 3D matrix that holds information in the following format:
"""
The data below is in a text file
Part#1
parameterA 10 10 20 10 10 30 10 30 10 20 30 parameterB 10 10 20 10 10 30 10 30 10 30 10 parameterC 10 20 10 10 30 10 20 10 30 10 20 parameterD 10 10 20 10 10 30 10 30 10 20 30
Part#2
parameterA 10 20 10 10 30 10 20 10 30 10 20 parameterB 10 20 10 30 10 20 10 20 10 20 10 parameterC 10 20 10 10 30 10 20 30 30 10 20 parameterD 10 10 20 20 20 30 10 10 20 20 30
Part#3
parameterA 10 20 10 30 10 20 10 20 10 20 10 parameterB 10 10 20 10 10 30 10 30 10 20 30 parameterC 10 20 10 30 10 10 10 20 10 20 10 parameterD 10 20 10 10 30 10 20 10 30 10 20 parameterE 10 20 10 10 10 10 30 30 30 10 20
"""
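The data should be separated like this (showing only the index layout of the data structure):
matrix[part index][parameter index][value-list index]
Each bracketed part should act as an index into some kind of frame, so that I can address them separately: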
matrix[0][0] = 10 10 20 10 10 30 10 30 10 20 30
Keep the following format in mind:
matrix[part#1][parameter][10 10 20 10 10 30 10 30 10 20 30]
The function below parses the file, but does not do it correctly:
import re
from pprint import pprint

import numpy as np
import pandas as pd


def parse_file_into_matrix(inputFilename):
    with open(inputFilename) as inputFile:
        content = inputFile.readlines()
    content = [x.strip() for x in content]
    lines = []
    parts = []
    parameterNames = []
    parameterValues = []
    # separates the part headers from the parameter/value lines and adds the latter to a new list "lines"
    for i in range(len(content)):
        # keeps the part number if the line is a header, otherwise stores the parameter/value line in lines
        if re.match(r"Part #[0-3]+", content[i]):
            part = int(re.findall(r"Part #([0-3]+)", content[i])[0])
            parts.append("Part #" + str(part))
        else:
            lines.append(content[i])
    # will contain everything categorized (parts/Parameter/Values)
    matrix_3D = np.arange(len(parts))
    # separates the parameter name from its values and assigns each one to
    # a list of parameters and a list of values based on its order
    for j in range(0, len(lines)):
        parameter = lines[j].strip().split(' ')[0]
        parameterNames.append(parameter)
        values = list(map(int, filter(str.isdigit, lines[j].strip().split()[1:])))
        parameterValues.append(values)
    df_parts = pd.DataFrame(parts)
    df_pv = pd.DataFrame(np.column_stack([parameterNames, parameterValues]))
    df_pv = np.asarray(df_pv)
    return df_pv  # it only shows what I have been able to merge; I haven't been able to add the parts to it

pprint(parse_file_into_matrix("sample.txt"))
The output I currently get from this function is:
array([['Part#1', list([])],
['parameterA',
list([10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30, 10, 10, 20, 10, 10, 30, 10, 30, 10, 30, 10, 10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20, 10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30])],
['Part#2', list([])],
['parameterA',
list([10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20, 10, 20, 10, 30, 10, 20, 10, 20, 10, 20, 10, 10, 20, 10, 10, 30, 10, 20, 30, 30, 10, 20, 10, 10, 20, 20, 20, 30, 10, 10, 20, 20, 30])],
['Part#3', list([])],
['parameterA',
list([10, 20, 10, 30, 10, 20, 10, 20, 10, 20, 10, 10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30, 10, 20, 10, 30, 10, 10, 10, 20, 10, 20, 10, 10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20, 10, 20, 10, 10, 10, 10, 30, 30, 30, 10, 20])],
['', list([])]], dtype=object)
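Here is the problem:
The parameter names go together with the values (the sequence of numbers for each string parameter); the two lists have the same length and correspond to each other. So if I take the list of parameters and the list of values and keep these two arrays apart from the array holding the parts, I get 13 rows (13 rows of parameters and values), which, divided into 3 partitions (parts), gives 4.33 rows per part. I would therefore have to round the result of the division, which means either truncating rows of the original data so that all partitions have the same length or, as mentioned above, putting the extra rows into the last partition, where the lengths would then differ.
Note: the length of each row (parameter plus values) is always the same horizontally; vertically (the number of rows per part), however, it varies.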
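To make the arithmetic concrete, here is a small sketch. The per-part counts (4, 4 and 5) are read off the sample data above; the variable names are illustrative only and not part of the parser.

```python
# Parameter rows per part, counted from the sample data above.
rows_per_part = {'Part#1': 4, 'Part#2': 4, 'Part#3': 5}

total_rows = sum(rows_per_part.values())
n_parts = len(rows_per_part)

# An even split leaves a remainder, so a rectangular 3D array cannot hold
# every row without truncating or padding some part.
even, leftover = divmod(total_rows, n_parts)
print(total_rows / n_parts)
print(even, leftover)

# Sizes that keep every row: one group per part, lengths allowed to differ.
print(list(rows_per_part.values()))
```

Grouping the rows under the part header they follow, rather than dividing the row count evenly, avoids the rounding problem altogether.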
So if I print the entries separately, I get:
matrix_3D = parse_file_into_matrix("sample.txt")
print(matrix_3D[0])
print(matrix_3D[1])
Output:
['Part#1' list([])] # <- if this was the case the sublist in part should be the parameter with the corresponding list of values for that parameter
['parameterA'
list([10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30, 10, 10, 20, 10, 10, 30, 10, 30, 10, 30, 10, 10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20, 10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30])]
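If I find the right data structure to hold the data, I will be able to look up a specific sequence for a specific part and parameter, like this: [part1][a], which would give me the whole sequence: 10 10 20 30 30. If I want to plot it, I will also be able to see it graphically.
The output above is the closest I have gotten, but it is still not correct, so any help would be greatly appreciated!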
Answer 0 (score: 2)
A nested dictionary lets you access the data the way you described, i.e. matrix['PartName']['paramName'] = [1, 2, 3, ...]
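As a minimal sketch of that shape, the snippet below builds a tiny nested dictionary by hand and shows the lookup. The values are copied from the sample data in the question; the name `example` and the reduced set of parameters are only for illustration.

```python
# Hand-built nested dictionary: part name -> parameter name -> list of values.
# Values copied from the sample data; `example` is an illustrative name.
example = {
    'Part#1': {
        'parameterA': [10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30],
        'parameterB': [10, 10, 20, 10, 10, 30, 10, 30, 10, 30, 10],
    },
    'Part#3': {
        'parameterE': [10, 20, 10, 10, 10, 10, 30, 30, 30, 10, 20],
    },
}

# Two keys select a whole sequence; a third, integer index selects one value.
sequence = example['Part#1']['parameterA']
third_value = example['Part#1']['parameterA'][2]

# Parts do not need to hold the same number of parameters.
params_per_part = {part: len(params) for part, params in example.items()}

print(sequence)
print(third_value)
print(params_per_part)
```

Because each part owns its own inner dictionary, a part with an extra parameter needs no padding or truncation.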
Try this:
def myparse(inputFilename):
    with open(inputFilename) as inputFile:
        lines = [x.strip() for x in inputFile]

    matrix = {}  # nested dictionary: part name -> parameter name -> list of values
    currPart = None
    curParamName = None
    for line in lines:
        if line.startswith('Part'):
            # a header line starts a new part; the key is the line exactly as written in the file
            currPart = line
            matrix[currPart] = {}
            curParamName = None
        elif currPart is not None:
            # write straight into the current part so blank or extra lines never overwrite earlier data
            for p in line.split():
                try:
                    # if it is an int, it belongs to the last read parameter name
                    value = int(p)
                except ValueError:
                    # if the cast to int fails it means it is a new parameter name
                    curParamName = p
                    matrix[currPart][curParamName] = []
                else:
                    if curParamName is not None:
                        matrix[currPart][curParamName].append(value)
    return matrix


matrix = myparse('sample.txt')
# the keys must match the header lines in the file (use 'Part#1' if the file has no space);
# the order in which the dict keys are printed depends on the Python version
print('Matrix:\n___________')
print(matrix, '\n')
print('Matrix[Part #1]:\n___________')
print(matrix['Part #1'], '\n')
print('Matrix[Part #1][ParameterA]:\n___________')
print(matrix['Part #1']['parameterA'])
Output:
Matrix:
___________
{'Part #3': {'parameterE': [10, 20, 10, 10, 10, 10, 30, 30, 30, 10, 20], 'parameterD': [10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20], 'parameterA': [10, 20, 10, 30, 10, 20, 10, 20, 10, 20, 10], 'parameterC': [10, 20, 10, 30, 10, 10, 10, 20, 10, 20, 10], 'parameterB': [10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30]}, 'Part #2': {'parameterD': [10, 10, 20, 20, 20, 30, 10, 10, 20, 20, 30], 'parameterA': [10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20], 'parameterC': [10, 20, 10, 10, 30, 10, 20, 30, 30, 10, 20], 'parameterB': [10, 20, 10, 30, 10, 20, 10, 20, 10, 20, 10]}, 'Part #1': {'parameterD': [10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30], 'parameterA': [10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30], 'parameterC': [10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20], 'parameterB': [10, 10, 20, 10, 10, 30, 10, 30, 10, 30, 10]}}
Matrix[Part #1]:
___________
{'parameterD': [10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30], 'parameterA': [10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30], 'parameterC': [10, 20, 10, 10, 30, 10, 20, 10, 30, 10, 20], 'parameterB': [10, 10, 20, 10, 10, 30, 10, 30, 10, 30, 10]}
Matrix[Part #1][ParameterA]:
___________
[10, 10, 20, 10, 10, 30, 10, 30, 10, 20, 30]
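If positional access such as matrix[0][0] is preferred over names, as in the question, the nested dictionary can be flattened into nested lists. This is only a sketch: it assumes the dictionary preserves file order (true for plain dicts on Python 3.7+), and `to_nested_lists` and the shortened `nested` sample are hypothetical names, not part of the code above.

```python
def to_nested_lists(nested):
    """Turn {part: {param: values}} into name lists plus a nested list
    indexed as data[part_index][param_index][value_index]."""
    part_names = list(nested)
    param_names = [list(nested[part]) for part in part_names]
    data = [[list(values) for values in nested[part].values()]
            for part in part_names]
    return part_names, param_names, data


# Small stand-in for the parsed result (value lists shortened for readability).
nested = {
    'Part #1': {'parameterA': [10, 10, 20], 'parameterB': [10, 30, 10]},
    'Part #2': {'parameterA': [10, 20, 10]},
}

part_names, param_names, data = to_nested_lists(nested)
print(data[0][0])                  # values of the first parameter of the first part
print(param_names[0][0])           # the name that goes with data[0][0]
print(len(data[0]), len(data[1]))  # parts may hold different numbers of parameters
```

Nested lists, unlike a rectangular NumPy array, accept parts with different numbers of parameters, at the cost of having to keep the name lists alongside the data.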