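I have a DataFrame as described below.
I would like a dictionary of all the non-zero entries in the DataFrame above, keyed by (row, column), like the one below.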
{
(0, 'aan'): 1,
(0, 'abcc'): 1,
(1, 'acd'): 1,
(3, 'access'): 5,
(3, 'acd'): 3,
(4, 'aao'): 2,
(4, 'access'): 4
}
Answer 0: (score: 0)
Perhaps the problem can be solved in two steps:
#include <stdio.h>
#include <string.h>
#include <math.h>

unsigned long long int bin2dec(const char *string, const size_t size)
{
    unsigned long long int bit, value = 0;
    for (size_t index = 0; index < size; index++)
    {
        // moving from the end to the beginning, get a character from the string
        // and convert it from a character containing a digit to a number
        bit = string[size - index - 1] - '0';
        // in the original question this was: value += bit*pow(2,index);
        // but we can just do this and get the same effect
        // without multiplication or library function
        value += bit << index;
    }
    return value;
}

int main()
{
    const char *binary = "111111111111111111111111111111";
    unsigned long long int decimal = bin2dec(binary, strlen(binary));
    printf("%llu\n", decimal);
    return 0;
}
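The two steps are not spelled out in this answer. One possible reading, offered as an assumption rather than as the answerer's method, is to stack the columns into a single Series and then filter out the zeros. The sample data below is hypothetical, since the question's DataFrame is not shown.

```python
import pandas as pd

# Hypothetical sample data; the question's original DataFrame is not shown.
df = pd.DataFrame(
    {"aan": [1, 0, 0], "abcc": [1, 0, 0], "acd": [0, 1, 3]},
    index=[0, 1, 3],
)

# Step 1: stack the columns into a Series indexed by (row label, column name).
stacked = df.stack()

# Step 2: drop the zero entries and convert what is left to a dict.
result = stacked[stacked != 0].to_dict()
print(result)
```

Because the stacked Series carries a two-level index, `to_dict()` yields tuple keys of the form (row label, column name), which is the shape the question asks for.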
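Answer 1: (score: 0)
Pipe the data through a sparse matrix, then return the result as a dict. Unfortunately, pandas has only limited sparse-matrix functionality, so we need to use scipy. The following code should work for your application.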
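import scipy as sp
import scipy.sparse  # load the submodule explicitly; older SciPy versions do not expose sp.sparse otherwise
import pandas as pd
import numpy as np  # for the random example DataFrame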
# Example dataframe
df = pd.DataFrame(np.random.randint(0,10,size=(1000, 10)))
# Use scipy to create sparse matrix
coo = sp.sparse.csc_matrix(df).tocoo(copy=False)
# Parse sparse matrix back into dataframe without zeroes.
df = pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data})[
['index', 'col', 'data']].sort_values(['index', 'col']).reset_index(drop=True)
# Create index to split (row, column) from value.
ix = pd.MultiIndex.from_frame(df[['index','col']])
df = df['data'].copy(True)
df.index = ix
# Output as dict
df.to_dict()
df
   0  1  2  3  4  5  6  7  8  9
0  4  7  0  3  4  8  6  0  5  3
1  3  3  9  2  1  2  8  2  7  2
2  0  1  5  5  4  3  2  0  4  1
3  6  7  7  7  2  1  3  7  1  1
4  2  5  9  8  9  7  5  4  0  3
{(0, 0): 4,
(0, 1): 7,
(0, 3): 3, # Notice (0,2) is gone.
(0, 4): 4,
(0, 5): 8,
(0, 6): 6,
(0, 8): 5,
(0, 9): 3,
(1, 0): 3,
(1, 1): 3,
(1, 2): 9,
(1, 3): 2,
(1, 4): 1,
(1, 5): 2,
(1, 6): 8,
(1, 7): 2,
(1, 8): 7,
(1, 9): 2,
(2, 1): 1, # Notice (2,0) is gone.
(2, 2): 5,
(2, 3): 5,
(2, 4): 4,
(2, 5): 3,
(2, 6): 2,
(2, 8): 4,
(2, 9): 1,
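Note that the keys above are integer positions, because the sparse matrix knows nothing about the DataFrame's labels. If the columns are named (as in the question), the positions can be mapped back to labels. The following is a minimal sketch under that assumption; the sample data and the name `labelled` are hypothetical.

```python
import pandas as pd
import scipy.sparse

# Hypothetical sample data with string column names and a non-default index.
df = pd.DataFrame(
    {"aan": [1, 0, 0], "access": [0, 5, 4], "acd": [0, 3, 0]},
    index=[0, 3, 4],
)

# COO format stores only the non-zero cells as (row position, column position, value).
coo = scipy.sparse.coo_matrix(df.to_numpy())

# Translate the positions back into the DataFrame's own row and column labels.
rows = df.index[coo.row]
cols = df.columns[coo.col]
labelled = {(r, c): int(v) for r, c, v in zip(rows, cols, coo.data)}
print(labelled)
```

Indexing `df.index` and `df.columns` with the position arrays keeps the lookup vectorized; only the final dict construction runs in a Python loop.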
Answer 2: (score: 0)
This is a very basic brute-force approach. It does not scale.
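import pandas as pd

data = {'aan': [1, 2, 0], 'aao': [0, 3, 4], 'access': [0, 0, 1]}
df = pd.DataFrame(data=data)
master = {}
for t in df.itertuples():
    # keep only the non-zero cells of this row, keyed by (row label, column name)
    nonzero = {(t.Index, col): getattr(t, col) for col in df.columns if getattr(t, col)}
    if not nonzero:
        continue
    master.update(nonzero)
print(master)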
Output:
{(0, 'aan'): 1, (1, 'aan'): 2, (1, 'aao'): 3, (2, 'aao'): 4, (2, 'access'): 1}
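If the row-by-row loop becomes a bottleneck, one alternative (a sketch, not part of the original answer) is to let NumPy locate the non-zero cells in a single pass with `np.nonzero` and build the dict only from those positions. It assumes the DataFrame holds numeric data of a single dtype.

```python
import numpy as np
import pandas as pd

data = {'aan': [1, 2, 0], 'aao': [0, 3, 4], 'access': [0, 0, 1]}
df = pd.DataFrame(data=data)

values = df.to_numpy()
# np.nonzero returns the row and column positions of the non-zero cells, in row-major order.
row_pos, col_pos = np.nonzero(values)

# Look up the labels for each position and convert the values to plain Python scalars.
master = {
    (df.index[r], df.columns[c]): values[r, c].item()
    for r, c in zip(row_pos, col_pos)
}
print(master)
```

For the sample data this should yield the same five entries as the loop above, while touching only the non-zero cells in Python code.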