So my data looks like this:
data = {"technology1": [
[
20, 0.02,
u'10.00,106.10,107.00,107.00,0.45',
u'24.00,-47.15,-49.50,-51.00,0.12',
u'11.00,0.35,0.00,0.00,0.92',
u'0.00',0.04,0.16, u'0.223196881092', u'f',0.02,
],
[
100, 0.02,
u'10.00,106.10,107.00,107.00,0.45',
u'24.00,-47.15,-49.50,-51.00,0.12',
u'11.00,0.35,0.00,0.00,0.92', u'0.00', 0.04,
0.16, u'0.223196881092', u'f', 0.01
] ... ],
"technology2": ...}
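As you can see, it is a dictionary in which each key maps to a list of lists, and all the lists have the same format. Each "inner" list contains integers and floats, as well as unicode strings: some of them hold a single value, and others hold a set of 5 comma-separated numbers.
What I want:
One array per technology. In each array, each row would be one of the lists shown above, and the columns would be the individual elements of that list. Ideally the unicode values should be converted to str (since I know better how to work with those), and each set of 5 numbers inside a unicode string should be expanded into separate elements.
i.e. the array for technology1: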
20, 0.02, 10.00, 106.10, ... "f", 0.02
100, 0.02, ... "f", 0.01
What I have tried so far:
for tech in data:
    features = data[tech]  # i.e. grab technologyn
    for row in features:
        for i in row[2:5]:  # 2 til 5 defines the instance which are sets of 5
            #print i,"\n"
            i = str(i)
            i = i.split(',')
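This doesn't work: when I look at features after the code has run, it is exactly the same as before!
This is not an attempt at a complete solution, since it obviously would not convert all the unicode values to strings, but it is a stepping stone. I also tried a list comprehension like this: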
for row in features:
    [i.split(',') for i in row if (type(i)==unicode and "," in i)]
Answer 0 (score: 1)
You need to create a new list object for each row and then replace the original list values:
def row_to_values(row):
    values = []
    for col in row:
        if isinstance(col, unicode) and col != u'f':
            # split and convert all entries to float
            values += (float(v) for v in col.split(','))
        else:
            values.append(col)
    return values

for value in data.values():
    value[:] = [row_to_values(row) for row in value]
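The value[:] = ... assignment tells Python to replace all indices contained in the list object with a new set of objects. Because each value is one of the outer lists in the data dictionary, this replaces all of its sublists with the expanded rows.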
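The difference between rebinding a name (which is what the loop in the question does) and assigning to a slice can be seen in a minimal sketch. The names outer, alias and unchanged are illustrative only and do not come from the question:

```python
# Illustrative names only; `outer` plays the role of one dictionary value.
outer = [[1, 2], [3, 4]]
alias = outer  # a second reference to the same list object, like data[tech]

# Rebinding the loop variable creates a new list each time and
# leaves the rows stored in `outer` untouched.
for row in outer:
    row = [x * 10 for x in row]
unchanged = [list(r) for r in outer]  # snapshot: still the original numbers

# Slice assignment replaces the contents of the existing list object in place,
# so every reference to that object sees the new rows.
outer[:] = [[x * 10 for x in row] for row in outer]
```

After this runs, unchanged still holds the original numbers, while alias shows the multiplied rows because it refers to the same object as outer.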
Demo with part of the sample data:
>>> data = {"technology1": [
... [
... 20, 0.02,
... u'10.00,106.10,107.00,107.00,0.45',
... u'24.00,-47.15,-49.50,-51.00,0.12',
... u'11.00,0.35,0.00,0.00,0.92',
... u'0.00',0.04,0.16, u'0.223196881092', u'f',0.02,
... ],
... [
... 100, 0.02,
... u'10.00,106.10,107.00,107.00,0.45',
... u'24.00,-47.15,-49.50,-51.00,0.12',
... u'11.00,0.35,0.00,0.00,0.92', u'0.00', 0.04,
... 0.16, u'0.223196881092', u'f', 0.01
... ]],
... }
>>> from pprint import pprint
>>> pprint(data["technology1"][0])
[20,
0.02,
u'10.00,106.10,107.00,107.00,0.45',
u'24.00,-47.15,-49.50,-51.00,0.12',
u'11.00,0.35,0.00,0.00,0.92',
u'0.00',
0.04,
0.16,
u'0.223196881092',
u'f',
0.02]
>>> pprint(row_to_values(data["technology1"][0]))
[20,
0.02,
10.0,
106.1,
107.0,
107.0,
0.45,
24.0,
-47.15,
-49.5,
-51.0,
0.12,
11.0,
0.35,
0.0,
0.0,
0.92,
0.0,
0.04,
0.16,
0.223196881092,
u'f',
0.02]
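So the function call returns a new list object in which the row is expanded to contain all the float values from its strings.
Use the function to replace all rows in all dictionary values: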
>>> for value in data.values():
...     value[:] = [row_to_values(row) for row in value]
...
We can see that the first row we looked at earlier has been updated:
>>> pprint(data["technology1"][0])
[20,
0.02,
10.0,
106.1,
107.0,
107.0,
0.45,
24.0,
-47.15,
-49.5,
-51.0,
0.12,
11.0,
0.35,
0.0,
0.0,
0.92,
0.0,
0.04,
0.16,
0.223196881092,
u'f',
0.02]
And the same is true for the rest of the dictionary:
>>> pprint(data)
{'technology1': [[20,
0.02,
10.0,
106.1,
107.0,
107.0,
0.45,
24.0,
-47.15,
-49.5,
-51.0,
0.12,
11.0,
0.35,
0.0,
0.0,
0.92,
0.0,
0.04,
0.16,
0.223196881092,
u'f',
0.02],
[100,
0.02,
10.0,
106.1,
107.0,
107.0,
0.45,
24.0,
-47.15,
-49.5,
-51.0,
0.12,
11.0,
0.35,
0.0,
0.0,
0.92,
0.0,
0.04,
0.16,
0.223196881092,
u'f',
0.01]]}
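The function above special-cases the single value u'f'. If other non-numeric strings can appear, one possible variant is to attempt the float conversion and fall back to a plain string when it fails. This is only a sketch: the name expand_row is hypothetical, and it assumes every element is either a number or an ASCII text value.

```python
def expand_row(row):
    """Hypothetical variant of row_to_values that keeps any non-numeric text."""
    values = []
    for col in row:
        if isinstance(col, (int, float)):
            values.append(col)  # numbers pass through unchanged
            continue
        # Assumption: anything else is text, possibly comma-separated.
        for part in col.split(','):
            try:
                values.append(float(part))
            except ValueError:
                values.append(str(part))  # non-numeric text such as 'f'
    return values

row = [20, 0.02, u'10.00,106.10,107.00,107.00,0.45', u'0.00', u'f', 0.02]
expanded = expand_row(row)
```

Note that float() also accepts text such as 'nan' or 'inf', so a column that can contain those words would need an extra check.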
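Answer 1 (score: 0)
I propose a solution that leans heavily on list comprehensions. If the transformation does not exactly match the goal of the task, please leave a comment below. Explanations are given inline as comments in the code: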
def split_or_wrap(item):
    """Split if str, wrap if number."""
    if isinstance(item, str):
        return item.split(',')
    elif isinstance(item, (int, float)):
        return [item]
    else:
        raise Exception("Unexpected item.")

def try_to_convert(item):
    """Try to convert item into int, then into float, or leave as is."""
    # Note: int() also accepts floats and truncates them (0.02 becomes 0).
    try:
        return int(item)
    except (TypeError, ValueError):
        try:
            return float(item)
        except (TypeError, ValueError):
            return item

# initial list contains values' side of data dictionary
initial_list = [item for item in data.values()]

# flattened list contains list of lists where each list
# corresponds to single technology
flattened_list = [[item
                   for tech_list in outer_list
                   for item in tech_list]
                  for outer_list in initial_list]

# deconstructed list takes unicode strings and splits them.
# To make resulting elements consistently nested into lists
# we take single elements and put also in a list.
# This enables us to treat all lists similarly on final flattening step.
deconstructed_list = [[split_or_wrap(tech_item)
                       for tech_item in tech_list]
                      for tech_list in flattened_list]

# final list contains array of arrays where each array
# contains single numbers (if they are convertible).
# This is done through flattening the so called item-wrapper
# lists into the list corresponding to a particular technology.
final_list = [[try_to_convert(item)
               for item_wrapper in tech_list
               for item in item_wrapper]
              for tech_list in deconstructed_list]

print(final_list)
Output:
[[20, 0, 10.0, 106.1, 107.0, 107.0, 0.45, 24.0, -47.15, -49.5, -51.0, 0.12, 11.0, 0.35, 0.0, 0.0, 0.92, 0.0, 0, 0, 0.223196881092, 'f', 0, 100, 0, 10.0, 106.1, 107.0, 107.0, 0.45, 24.0, -47.15, -49.5, -51.0, 0.12, 11.0, 0.35, 0.0, 0.0, 0.92, 0.0, 0, 0, '0.223196881092f', 0],
[100, 0, 10.0, 106.1, 107.0, 107.0, 0.45]]
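The zeros that appear in this output where the input had 0.02, 0.04 and 0.16 come from try_to_convert calling int() on values that are already floats: int(0.02) truncates to 0. The flattening step also merges all rows of a technology into one list instead of keeping one list per row. A possible adjustment, sketched here under the assumption of Python 3 (where text is str), converts only strings and keeps the row structure. The names convert_text, expand, sample and result are hypothetical, and the sample data is shortened for illustration:

```python
def convert_text(text):
    """Convert a string to int, then float; return it unchanged otherwise."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def expand(row):
    """Expand one row: numbers pass through untouched, strings are split."""
    out = []
    for item in row:
        if isinstance(item, str):
            out.extend(convert_text(part) for part in item.split(','))
        else:
            out.append(item)  # never call int() on an existing float
    return out

# Shortened, made-up rows in the same shape as the question's data.
sample = {"technology1": [[20, 0.02, '11.00,0.35', 'f', 0.02],
                          [100, 0.02, '24.00,-47.15', 'f', 0.01]]}
result = {tech: [expand(row) for row in rows] for tech, rows in sample.items()}
```

Building a new dictionary keeps the technology names attached to their rows, which the list-of-lists output above loses.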