Question

Suppose I have the following data:

Code, data_1, data_2, data_3, [....], data_204700

a,1,1,0, ... , 1
b,1,0,0, ... , 1
a,1,1,0, ... , 1
c,0,1,0, ... , 1
b,1,0,0, ... , 1
etc. The same code can appear with different values (0, 1, or ? for not known).

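I need to build a large matrix that I want to analyze.

How can I import the data into a dictionary?

I want to use the dictionary for the columns (204,700 + 1).

Is there a built-in function (or package) that returns the pattern?

(I am after a percentage pattern.) I mean something like: 90% of 1s in column 1, 80% of 1s in column 2.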

Answer 1

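Okay, so I assume you want to store this in a dictionary. I'll tell you now: you don't want a dictionary for this kind of data. Use a pandas DataFrame.

This is how you get your data into a DataFrame: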

import pandas as pd
my_file = 'file_name'
df = pd.read_csv(my_file)
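
One thing to watch for: the question mentions `?` for unknown values, and a column containing a literal `?` would be read as text rather than numbers. Below is a minimal sketch of one way to handle that, assuming the unknowns really are written as `?` and the header has a space after each comma as in the question. The sample data is made up for illustration; `na_values` and `skipinitialspace` are regular `pd.read_csv` parameters.

```python
import io
import pandas as pd

# Hypothetical sample mirroring the layout in the question; '?' marks an unknown value.
sample = io.StringIO(
    "Code, data_1, data_2, data_3\n"
    "a,1,1,0\n"
    "b,1,?,0\n"
    "a,1,1,0\n"
    "c,0,1,?\n"
)

# na_values='?' turns unknown entries into NaN so the data columns stay numeric;
# skipinitialspace=True strips the blank that follows each comma in the header row.
df = pd.read_csv(sample, na_values="?", skipinitialspace=True)

print(df.dtypes)        # data columns are numeric, not object
print(df.isna().sum())  # number of unknown values per column
```

Note that a column containing NaN is read as float64 while a column without any is read as int64, so the data columns may not all end up with the same dtype.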

Now, you don't need a package to return the pattern you're looking for. Just write a simple algorithm that returns it!

def one_percentage(data):
    # total number of rows, used as the denominator for every column
    size = len(data)
    # dtype of the first data column (column 0 holds the code labels);
    # only columns with exactly this dtype are counted
    value_dtype = data[data.columns[1]].dtype
    # list of (column name, number of 1s) tuples
    ones = [(col, data[col].sum()) for col in data if data[col].dtype == value_dtype]
    # map each column name to its share of 1s, as a fraction between 0 and 1
    my_dict = {}
    for name, count in ones:
        my_dict[name] = count / float(size)
    return my_dict

Now, if you want the share of 1s in any one column (returned as a fraction), do the following:

percentages = one_percentage(df)
column_name = 'any_column_name'
print(percentages[column_name])

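Now, if you want this for every column, you can grab all the column names and loop over them:

columns = [name for name in percentages]
for name in columns:
    # the stored value is a fraction, so scale it to a percentage before printing
    print(str(percentages[name] * 100) + "% of 1 in column " + name)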
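
As an alternative to the explicit loop, the same per-column shares can probably be computed with pandas' own column-wise operations. This is only a sketch under a few assumptions: the label column is named `Code` as in the question, the data columns hold only 0, 1, or NaN for unknown, and the small frame below is invented for illustration.

```python
import pandas as pd

# Hypothetical frame; None stands for an unknown value and becomes NaN.
df = pd.DataFrame({
    "Code": ["a", "b", "a", "c"],
    "data_1": [1, 1, 1, 0],
    "data_2": [1, None, 1, 1],
})

# Keep only the 0/1 data columns.
values = df.drop(columns="Code")

# Share of 1s over all rows: NaN never equals 1, so unknowns count as "not 1".
share_all_rows = values.eq(1).mean()

# Share of 1s among known values only: mean() skips NaN by default,
# and the mean of a 0/1 column is its fraction of 1s.
share_known = values.mean()

print((share_all_rows * 100).round(1).to_dict())
print((share_known * 100).round(1).to_dict())
```

Which of the two denominators is the right one depends on how you want unknown values to be treated, so check that against your own data.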

If you need anything else, let us know!

Python: importing data into a dictionary and finding patterns
