I'm trying to write a function that converts all non-numeric columns in a dataset to numeric form.
The dataset is a list of lists.
Here is my code:
def handle_non_numerical_data(data):
    def convert_to_numbers(data, index):
        items = []
        column = [line[0] for line in data]
        for item in column:
            if item not in items:
                items.append(item)
        [line[0] = items.index(line[0]) for line in data]
        return new_data
    for value in data[0]:
        if isinstance(value, str):
            convert_to_numbers(data, data[0].index(value))
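Obviously, [line[0] = items.index(line[0]) for line in data] is invalid syntax, and I can't figure out how to modify the first column of the data while iterating over it.
I can't use numpy, because the data won't be in numeric form until after this function has run.
How do I do this? Why is it so complicated? I feel like this should be much simpler than it is...
In other words, I want to turn this: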
[[M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
into this:
[[0,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[0,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[1,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
Note that the first column has changed from strings to numbers.
Answer 0 (score: 1)
data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
values = {'M': 0, 'F': 1}
new_data = [[values.get(val, val) for val in line] for line in data]
new_data
Output:
[[0, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15],
[0, 0.35, 0.265, 0.09, 0.2255, 0.0995, 0.0485, 0.07, 7],
[1, 0.53, 0.42, 0.135, 0.677, 0.2565, 0.1415, 0.21, 9]]
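You can take advantage of Python dictionaries and their get method.
These are the values for the strings:
values = {'M': 0, 'F': 1}
You can also add more strings with their corresponding values, for example 'I'.
If the string is a key in values, you get its value from the dict:
>>> values.get('M', 'M')
0
Otherwise, you get the original value back, because the second argument to get is the default returned when the key is missing.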
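To make the fallback case concrete, here is a minimal sketch of both branches of get. The lookups for 0.455 and 'I' are illustrative choices, not taken from the answer above; they just stand in for any value that is not a key in values.

```python
values = {'M': 0, 'F': 1}

# Key present: get returns the mapped value.
hit = values.get('M', 'M')
print(hit)  # 0

# Key absent: get returns its second argument, so the value passes through unchanged.
miss_number = values.get(0.455, 0.455)
miss_string = values.get('I', 'I')
print(miss_number)  # 0.455
print(miss_string)  # I
```

This is why values.get(val, val) inside the list comprehension leaves the numeric cells alone and only rewrites the strings that have an entry in the dict.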
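Answer 1 (score: 0)
Instead of using an index (I'm not sure how that is supposed to work in your example), you can create a dictionary that maps letters to numbers. Something like this should work.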
raw_data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
            ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
            ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
def handle_non_numerical_data(data):
    mapping = {'M': 0, 'F': 1, 'I': 2}
    for item in data:  # iterate over the argument, not the global raw_data
        if isinstance(item[0], str):
            item[0] = mapping.get(item[0], -1)  # -1 if the letter is not in mapping
    return data  # the rows are modified in place; the same list is returned
run = handle_non_numerical_data(raw_data)
print(run)
Answer 2 (score: 0)
This answer uses a dict to store the encoding from str to int. The dict can be preloaded, or inspected after the data has been replaced.
# MODIFIES DATA IN PLACE
data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
        ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
        ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
coding_dict = {}  # can also preload this {'M': 0, 'F':1}
for row in data:
    if row[0] not in coding_dict:
        coding_dict[row[0]] = len(coding_dict)
    row[0] = coding_dict[row[0]]
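The loop above only encodes the first column, while the question asks about all non-numeric columns. A possible extension of the same idea is sketched below, assuming one coding dict per column is acceptable. The function name encode_string_columns, the coding_dicts structure, and the shortened sample rows with an extra string column are all hypothetical and only here for illustration.

```python
def encode_string_columns(data):
    """Replace every str cell with an integer code, per column, in place.

    Returns a dict mapping column index -> {string: code} so the
    encoding can be inspected afterwards.
    """
    coding_dicts = {}
    for row in data:
        for index, value in enumerate(row):
            if isinstance(value, str):
                # One independent coding dict per column.
                codes = coding_dicts.setdefault(index, {})
                if value not in codes:
                    codes[value] = len(codes)
                row[index] = codes[value]
    return coding_dicts

# Made-up sample: column 0 and column 2 both hold strings.
data = [['M', 0.455, 'a', 15],
        ['M', 0.35, 'b', 7],
        ['F', 0.53, 'a', 9]]
codings = encode_string_columns(data)
print(data)     # [[0, 0.455, 0, 15], [0, 0.35, 1, 7], [1, 0.53, 0, 9]]
print(codings)  # {0: {'M': 0, 'F': 1}, 2: {'a': 0, 'b': 1}}
```

Codes are assigned in order of first appearance within each column, so the same string can get different codes in different columns.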