I have a dict as follows:
d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
(each value is a list of an arbitrary number of items chosen from a given list, here ['A', 'B', 'C'])
I cannot find a simple way to obtain the following DataFrame:
   A  B  C
1  1  0  0
2  1  1  0
3  0  1  1
Is there a built-in way to do so?
Edit: the list of all possible values (here: ['A', 'B', 'C']) is available to me.
Answer 0 (score: 0)
I believe you would need to transform the dictionary a bit to be able to convert it to the DataFrame as you have given.
Example using dictionary comprehension for Python 2.7+ -
import pandas as pd
d = {k: {kv: v.count(kv) for kv in ['A', 'B', 'C']} for k, v in d.items()}
df = pd.DataFrame(d).T
Or in a single line -
df = pd.DataFrame({k:{kv:v.count(kv) for kv in ['A','B','C']} for k,v in d.items()}).T
Demo -
In [18]: d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
In [19]: d = {k:{kv:v.count(kv) for kv in ['A','B','C']} for k,v in d.items()}
In [20]: df = pd.DataFrame(d).T
In [21]: df
Out[21]:
   A  B  C
1  1  0  0
2  1  1  0
3  0  1  1
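As a variation on the comprehension above, here is a minimal sketch that wraps the same idea in a helper and keeps the column order of the known value list. The helper name `count_rows` and its `values` parameter are made up for illustration; they are not part of pandas.

```python
def count_rows(d, values):
    """Map each key of d to a {value: count} dict covering every value."""
    return {
        k: {val: items.count(val) for val in values}
        for k, items in d.items()
    }


d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
rows = count_rows(d, ['A', 'B', 'C'])
print(rows[2])  # {'A': 1, 'B': 1, 'C': 0}
```

Passing `rows` to `pd.DataFrame.from_dict(rows, orient='index')` should build the frame with one row per key directly, so the `.T` is not needed.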
Answer 1 (score: 0)
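There is no built-in way to do what you want. The following gets the counts efficiently and collects all possible values without typing them in by hand: create a dict of Counter dicts holding the counts for each key, then iterate over the list of unique possible values and look each one up in the Counter: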
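from collections import Counter
from itertools import chain
import pandas as pd

d = {1: ['A'], 2: ['A', 'B', 'B'], 3: ['B', 'C', 'C']}
# every value that occurs anywhere, sorted so the column order is stable
unique = sorted(set(chain.from_iterable(d.values())))
counts = {k: Counter(v) for k, v in d.items()}
out = {}
for k, cnt in counts.items():
    # a Counter returns 0 for values it has never seen
    out[k] = {val: cnt[val] for val in unique}
df = pd.DataFrame(out)
print(df.T)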
Output:
   A  B  C
1  1  0  0
2  1  2  0
3  0  1  2
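The Counter approach is more efficient than using list.count: it tallies each list in one pass, whereas list.count rescans the list for every possible value.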
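To make the comparison concrete, here is a small sketch that assumes the `d` with repeated items used in this answer and a hand-written `values` list. It only shows that the two approaches produce the same numbers; it does not measure speed.

```python
from collections import Counter

d = {1: ['A'], 2: ['A', 'B', 'B'], 3: ['B', 'C', 'C']}
values = ['A', 'B', 'C']

# Counter: each list is scanned once, then values are looked up.
with_counter = {}
for k, v in d.items():
    cnt = Counter(v)
    with_counter[k] = [cnt[val] for val in values]  # missing values give 0

# list.count: each list is scanned once per possible value.
with_count = {k: [v.count(val) for val in values] for k, v in d.items()}

print(with_counter == with_count)  # True
print(with_counter[2])  # [1, 2, 0]
```

For lists this short the difference is negligible; it should only start to matter as the lists or the number of possible values grow.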
If it is enough to record each value at most once per key (1 if present, 0 if not), a set-based approach will do:
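from itertools import chain

unique = set(chain.from_iterable(d.values()))
out = {}
for k, v in d.items():
    # values that do NOT occur in this key's list
    un = unique.difference(v)
    out[k] = {val: 0 if val in un else 1 for val in unique}
df = pd.DataFrame(out)  # use df.T for one row per key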
Answer 2 (score: 0)
A generic algorithm that works for any number of values:
d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
# sorted list of unique values, so the column order is stable
unique = sorted(set(v for val in d.values() for v in val))
print(' ', ' '.join(str(u) for u in unique))
for k, v in d.items():
    print(k, ' '.join(str(1 if u in v else 0) for u in unique))
Answer 3 (score: -1)
You can simply:
d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
print('  A B C')
for key, value in d.items():
    print(key, value.count('A'), value.count('B'), value.count('C'))
Output:
  A B C
1 1 0 0
2 1 1 0
3 0 1 1
You can easily generalize the code to iterate over all available values for every key.
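One way to generalize it, as a sketch: collect the available values first, then loop over them for every key. The function name `format_counts` is hypothetical, and it returns the lines instead of printing them directly so they are easy to check.

```python
def format_counts(d, values=None):
    """Return the count table as a list of text lines, one per key."""
    if values is None:
        # Fall back to every value that occurs anywhere in d.
        values = sorted({item for items in d.values() for item in items})
    lines = ['  ' + ' '.join(str(val) for val in values)]
    for key, items in d.items():
        counts = ' '.join(str(items.count(val)) for val in values)
        lines.append('{} {}'.format(key, counts))
    return lines


d = {1: ['A'], 2: ['A', 'B'], 3: ['B', 'C']}
print('\n'.join(format_counts(d)))
```

If the list of possible values is already known, as in the question, it can be passed as `values` to fix the column order and to include values that never occur.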