I'm new to Python. I want to convert this dictionary,
dic = {"word1": ["a","b","c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
into a DataFrame object.
I tried code like this:
df = pd.DataFrame(index=["a","b","c","d","e","f"], columns=[])
for i in result:
    print("i", i)
    print("v", v)
    df2 = pd.DataFrame(i)
    df.append(df2)
Please help me with how to code this.
Answer 0 (score: 2)
First convert the dict to a Series, then use MultiLabelBinarizer with the DataFrame constructor, and finally cast to boolean:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

d = {"word1": ["a","b","c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
s = pd.Series(d)

# Each distinct label becomes a column; fit_transform returns a 0/1 matrix.
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=s.index).astype(bool)
Another solution uses str.join to join each list with |, which is the default separator in str.get_dummies:
df = s.str.join('|').str.get_dummies().astype(bool)
print(df)
           a      b      c      d      e      f
word1   True   True   True  False  False  False
word2  False   True  False   True   True  False
word3   True  False   True  False  False   True
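To make the str.join route easier to follow, here is a minimal sketch that keeps the intermediate Series. The name `joined` is only illustrative and does not appear in the answer. It also assumes no label contains the `|` character, since that would be split into two labels.

```python
import pandas as pd

d = {"word1": ["a", "b", "c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
s = pd.Series(d)

# Step 1: collapse each list into one '|'-separated string (hypothetical name).
joined = s.str.join('|')

# Step 2: split on '|' (the default separator) into 0/1 indicator columns,
# then cast the indicators to bool.
df = joined.str.get_dummies().astype(bool)

print(joined)
print(df)
```

If the labels might contain `|`, another separator can be passed to both calls, for example `s.str.join(';').str.get_dummies(sep=';')`.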
Answer 1 (score: 1)
Here is one way using pd.get_dummies:
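import pandas as pd

d = {"word1": ["a","b","c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}

# One row per key; the list items become positional columns 0, 1, 2.
df = pd.DataFrame.from_dict(d, orient='index')
# Collect each row back into a single list-valued column.
df['values'] = df.values.tolist()
# One-hot encode every list item, then add up the indicators per key.
# groupby(level=0).sum() replaces sum(level=0), which pandas 2.0 removed.
dummies = pd.get_dummies(df['values'].apply(pd.Series).stack()).groupby(level=0).sum()
# Drop all original columns, attach the indicators, and cast the counts to bool.
df = df.drop(columns=df.columns).join(dummies).astype(bool)

Result
           a      b      c      d      e      f
word1   True   True   True  False  False  False
word2  False   True  False   True   True  False
word3   True  False   True  False  False   True

Explanation
- Create a pd.Series of lists.
- Apply pd.get_dummies to this series.
- Convert int to bool for display purposes.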
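The same three steps can be sketched with intermediate names so each stage can be inspected. This is not the answer's exact code: it swaps `apply(pd.Series).stack()` for `Series.explode` (available since pandas 0.25), and the names `long` and `dummies` are illustrative.

```python
import pandas as pd

d = {"word1": ["a", "b", "c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}

# Step 1: a Series of lists, keyed by word.
s = pd.Series(d)

# Step 2: one row per (word, label) pair, then one indicator column per label.
long = s.explode()
dummies = pd.get_dummies(long)

# Step 3: add up the indicators per word and cast the counts to bool.
df = dummies.groupby(level=0).sum().astype(bool)

print(df)
```

Casting to bool also hides how many times a label occurs in a list; skipping `.astype(bool)` keeps the per-word counts.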