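In Python

Time: 2017-01-09 16:38:04

Tags: python list type-conversion

I am trying to write a function that converts all non-numeric columns in a dataset to numeric form.

The dataset is a list of lists.

Here is my code: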

def handle_non_numerical_data(data):
    def convert_to_numbers(data, index):
        items = []
        column = [line[0] for line in data]
        for item in column:
            if item not in items:
                items.append(item)
        [line[0] = items.index(line[0]) for line in data]
        return new_data

    for value in data[0]:
        if isinstance(value, str):
            convert_to_numbers(data, data[0].index(value))

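Obviously [line[0] = items.index(line[0]) for line in data] is invalid syntax, and I can't figure out how to modify the first column of the data while iterating over it.

I can't use numpy, because the data won't be in numeric form until this function has run.

How can I do this? Why is it so complicated? I feel like this should be simpler than it is...

In other words, I want to turn this: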

[[M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

into this:

[[0,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[0,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[1,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

Note that the first column has changed from strings to numbers.

3 Answers:

Answer 0: (score: 1)

Solution

data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
        ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
        ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

values = {'M': 0, 'F': 1}

new_data = [[values.get(val, val) for val in line] for line in data]
new_data

Output:

[[0, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15],
 [0, 0.35, 0.265, 0.09, 0.2255, 0.0995, 0.0485, 0.07, 7],
 [1, 0.53, 0.42, 0.135, 0.677, 0.2565, 0.1415, 0.21, 9]]

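Explanation

You can take advantage of Python dictionaries and their get method.

These are the values for the strings:

values = {'M': 0, 'F': 1}

You can also add more strings with their corresponding values, such as I.

If the string is in values, you get the value from the dict:

>>> values.get('M', 'M')
0

Otherwise, you get the original value back:

>>> values.get(0.455, 0.455)
0.455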
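The comprehension in this answer looks up every cell of every row. If only one column should be encoded, the same get fallback can be limited to that column. The following is a minimal sketch under that assumption; the helper name encode_column and its parameters are hypothetical and not part of the original answer, and the sample rows are shortened for illustration.

```python
def encode_column(data, index, values):
    """Return a copy of data with one column passed through values.get()."""
    new_data = []
    for line in data:
        new_line = list(line)  # copy, so the input rows are left untouched
        # Fall back to the original cell when it has no entry in the mapping.
        new_line[index] = values.get(new_line[index], new_line[index])
        new_data.append(new_line)
    return new_data


# Shortened sample rows, for illustration only.
data = [['M', 0.455, 15],
        ['F', 0.53, 9]]
values = {'M': 0, 'F': 1}

new_data = encode_column(data, 0, values)
print(new_data)
```

Because each row is copied first, data keeps its original strings and only new_data holds the encoded column.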

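Answer 1: (score: 0)

Instead of using indices (I'm not sure how that is supposed to work in your example), you can create a dictionary that maps letters to numbers. Something like this should work.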

raw_data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
            ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
            ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

def handle_non_numerical_data(data):
    mapping = {'M': 0, 'F': 1, 'I': 2}

    for item in data:  # iterate over the argument, not the global raw_data
        if isinstance(item[0], str):
            item[0] = mapping.get(item[0], -1) # Returns -1 if letter not found
    return data

run = handle_non_numerical_data(raw_data)
print(run)

Answer 2: (score: 0)

This answer uses a dict to store the encoding from str to int. The dict can be preloaded, or inspected after the data has been replaced.

# MODIFIES DATA IN PLACE
data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
        ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
        ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

coding_dict = {} # can also preload this {'M': 0, 'F':1}
for row in data:
    if row[0] not in coding_dict:
        coding_dict[row[0]] = len(coding_dict)
    row[0] = coding_dict[row[0]]
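The loop above handles only the first column, while the question asks about every non-numeric column. One way to extend the same idea is to keep a separate coding dict per column index. This is a sketch rather than a definitive implementation: the function name encode_string_columns is hypothetical, it checks every cell instead of only the first row, and the extra string column in the sample data is made up for illustration.

```python
def encode_string_columns(data):
    """Replace every str cell in place; return one coding dict per column."""
    codings = {}  # column index -> {string: code}
    for row in data:
        for index, cell in enumerate(row):
            if isinstance(cell, str):
                coding = codings.setdefault(index, {})
                # Codes are assigned in order of first appearance.
                if cell not in coding:
                    coding[cell] = len(coding)
                row[index] = coding[cell]
    return codings


# Illustrative rows; the last column is invented to show a second string column.
data = [['M', 0.455, 'a'],
        ['M', 0.35, 'b'],
        ['F', 0.53, 'a']]

codings = encode_string_columns(data)
print(data)
print(codings)
```

The returned dicts can be inspected afterwards to see which number each string received, and numeric cells are left as they are.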