DataFrame from dict of lists of varying length

Time: 2015-10-06 08:52:42

Tags: python pandas

I have a dict as follows:

d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}

(each value is a list of an arbitrary number of items chosen from a given list, here ['A', 'B', 'C'])

I cannot find a simple way to obtain the following DataFrame:

  A B C
1 1 0 0
2 1 1 0
3 0 1 1 

Is there a built-in way to do so?

Edit: the list of all possible values (here: ['A', 'B', 'C']) is available to me.

4 answers:

Answer 0 (score: 0)

I believe you need to transform the dictionary a little before pandas can turn it into the DataFrame you describe.

Example using a dictionary comprehension (Python 2.7+):

import pandas as pd

# Map each key to a {value: count} dict, then transpose so the keys become rows
d = {k: {kv: v.count(kv) for kv in ['A', 'B', 'C']} for k, v in d.items()}
df = pd.DataFrame(d).T

Or in a single line:

df = pd.DataFrame({k: {kv: v.count(kv) for kv in ['A', 'B', 'C']} for k, v in d.items()}).T

Demo:

In [18]: d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}

In [19]: d = {k:{kv:v.count(kv) for kv in ['A','B','C']} for k,v in d.items()}

In [20]: df = pd.DataFrame(d).T

In [21]: df
Out[21]:
   A  B  C
1  1  0  0
2  1  1  0
3  0  1  1
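
To make the intermediate step in this answer visible, here is a minimal standard-library sketch of what the comprehension builds before pandas is involved. The helper name count_rows and its values parameter are illustrative assumptions, not part of the original answer.

```python
def count_rows(d, values):
    """Map each key to a {value: count} dict covering every possible value."""
    return {k: {val: items.count(val) for val in values} for k, items in d.items()}


d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
rows = count_rows(d, ['A', 'B', 'C'])
for key, row in rows.items():
    print(key, row)
```

pd.DataFrame treats the outer keys of a nested dict like this as columns, which is why the answer transposes with .T. Passing the same dict to pd.DataFrame.from_dict(rows, orient='index') should give the row-per-key layout without the transpose.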

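Answer 1 (score: 0)

There is no built-in way to do what you want. The following gets the counts efficiently and collects all possible values without typing them in by hand: build a dict of Counter dicts holding the counts for each list, then iterate over the unique possible values and look each one up in the Counter: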

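from collections import Counter
from itertools import chain

import pandas as pd

d = {1: ['A'], 2: ['A', 'B', 'B'], 3: ['B', 'C', 'C']}

# every distinct value across all lists, in a stable column order
unique = sorted(set(chain.from_iterable(d.values())))
counts = {k: Counter(v) for k, v in d.items()}
out = {}
for k, cnt in counts.items():
    # a Counter returns 0 for values it has not seen
    out[k] = {val: cnt[val] for val in unique}
df = pd.DataFrame(out)
print(df.T)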

Output:

   A  B  C
1  1  0  0
2  1  2  0
3  0  1  2

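The Counter approach is more efficient than calling list.count for each value, because each list is scanned once rather than once per possible value.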
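
As a small illustration of that difference, the sketch below computes the same row both ways. The sample list and variable names are made up for the example.

```python
from collections import Counter

values = ['A', 'B', 'C']
items = ['A', 'B', 'B']

# One pass over the list builds every count at once.
cnt = Counter(items)
by_counter = {val: cnt[val] for val in values}  # missing keys read as 0

# list.count rescans the whole list for each possible value.
by_count = {val: items.count(val) for val in values}

print(by_counter)  # {'A': 1, 'B': 2, 'C': 0}
print(by_counter == by_count)  # True
```

For lists as short as the ones in the question the difference is unlikely to be noticeable; it matters mainly when the lists or the set of possible values grow large.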

If you only need to know whether each value is present (at most one of each per list), a set-based approach is enough:

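from itertools import chain
import pandas as pd

unique = set(chain.from_iterable(d.values()))
out = {}
for k, v in d.items():
    un = unique.difference(v)  # values missing from this key's list
    out[k] = {val: 0 if val in un else 1 for val in unique}
df = pd.DataFrame(out).T  # transpose so the dict keys become rows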

Answer 2 (score: 0)

A generic algorithm that works for any number of values:

d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
# sorted list of unique values, so the column order is deterministic
unique = sorted(set(v for val in d.values() for v in val))

print(' ', ' '.join(str(i) for i in unique))
for k, v in d.items():
    print(k, ' '.join(str(1 if u in v else 0) for u in unique))

Answer 3 (score: -1)

You can simply do this:

d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}

print('  A B C')
for key, value in d.items():
    print(key, value.count('A'), value.count('B'), value.count('C'))

Output:

  A B C
1 1 0 0
2 1 1 0
3 0 1 1

You can easily generalize the code to iterate over all available values for every key.
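
One possible generalization, as a minimal sketch: the list of possible values is passed in rather than hard-coded. The function name indicator_lines is an illustrative assumption, and the code relies on dicts preserving insertion order (Python 3.7+).

```python
def indicator_lines(d, values):
    """Return the table as text lines: a header, then one line per key."""
    lines = ['  ' + ' '.join(values)]
    for key, items in d.items():
        flags = ' '.join(str(items.count(val)) for val in values)
        lines.append('{} {}'.format(key, flags))
    return lines


d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
print('\n'.join(indicator_lines(d, ['A', 'B', 'C'])))
```

Like the code above it, this only prints text; it does not build a DataFrame.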