Say you have the following baskets:
basket1 = ['apple', 'orange', 'banana']
basket2 = ['orange', 'grape']
basket3 = ['banana', 'grape', 'kiwi', 'orange']
baskets = [basket1, basket2, basket3]
Your goal is to create the following data structure:
pd.DataFrame({'apple': {'basket1': 1,'basket2': 0,'basket3': 0 }, 'orange': {'basket1': 1,'basket2': 1,'basket3': 1 }, 'banana': {'basket1': 1,'basket2': 0,'basket3': 1 }, 'grape': {'basket1': 0,'basket2': 1,'basket3': 1 }, 'kiwi': {'basket1': 0,'basket2': 0,'basket3': 1 } })
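I know about Counter from collections and bincount from numpy, which you could use if all you wanted was a binary indicator table like the one above. But suppose you want to put some other value at each position.
For example, suppose that at each position, instead of 1, you want the weight of the fruit, which you happen to have in another table: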
pd.DataFrame({'weight': {'apple': 3, 'orange':3, 'banana':2, 'grape':1, 'kiwi':2}})
The result you want is:
pd.DataFrame({'apple': { 'basket1': 3, 'basket2': 0, 'basket3': 0 }, 'orange': { 'basket1': 3, 'basket2': 3, 'basket3': 3 }, 'banana': { 'basket1': 2, 'basket2': 0, 'basket3': 2 }, 'grape': { 'basket1': 0, 'basket2': 1, 'basket3': 1 }, 'kiwi': { 'basket1': 0, 'basket2': 0, 'basket3': 2 } })
How would you write such an operation cleanly? I'm not sure how to do this efficiently or well.
Answer 0 (score: 2)
Suppose you start with a pd.DataFrame and a dict:
In [37]: df1
Out[37]:
         apple  banana  grape  kiwi  orange
basket1      1       1      0     0       1
basket2      0       0      1     0       1
basket3      0       1      1     1       1
In [38]: mapper = {'apple': 3, 'orange':3, 'banana':2, 'grape':1, 'kiwi':2}
Then simply:
In [39]: for colname in df1:
    ...:     df1[colname] = df1[colname]*mapper[colname]
    ...:
In [40]: df1
Out[40]:
         apple  banana  grape  kiwi  orange
basket1      3       2      0     0       3
basket2      0       0      1     0       3
basket3      0       2      1     2       3
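The answer above assumes df1 already exists. As a minimal sketch, one way to build that 0/1 indicator frame from the original lists might look like the following. The names named_baskets and indicator are illustrative and do not appear in the original post.

```python
import pandas as pd

basket1 = ['apple', 'orange', 'banana']
basket2 = ['orange', 'grape']
basket3 = ['banana', 'grape', 'kiwi', 'orange']

# Hypothetical helper mapping: basket label -> list of fruits.
named_baskets = {'basket1': basket1, 'basket2': basket2, 'basket3': basket3}

# One inner dict per basket, with a 1 for every fruit it contains.
rows = {name: {fruit: 1 for fruit in items} for name, items in named_baskets.items()}

indicator = (
    pd.DataFrame.from_dict(rows, orient='index')  # rows = baskets, columns = fruits
    .fillna(0)            # fruits missing from a basket come out as NaN
    .astype(int)          # back to integer 0/1
    .sort_index(axis=1)   # alphabetical columns, as in df1 above
)
print(indicator)
```

The fillna(0) step is needed because a fruit that is missing from a basket has no entry in that basket's dict.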
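Or, more simply, you can multiply a pd.DataFrame by a pd.Series (i.e. a "column" of a data frame) and it is broadcast intelligently, with the Series index aligned to the DataFrame's column labels:
In [5]: df2 = pd.DataFrame({'weight': {'apple': 3, 'orange':3, 'banana':2, 'grape':1, 'kiwi':2}})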
In [6]: mapper = df2.squeeze() # convert to series
In [7]: df1*mapper
Out[7]:
         apple  banana  grape  kiwi  orange
basket1      3       2      0     0       3
basket2      0       0      1     0       3
basket3      0       2      1     2       3
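To illustrate that the multiplication matches on labels rather than on position, here is a small self-contained sketch. It uses DataFrame.mul with axis='columns', which should be equivalent to the df1*mapper expression above. The name weights and the deliberately shuffled order of its entries are assumptions made for the example.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'apple': [1, 0, 0], 'banana': [1, 0, 1], 'grape': [0, 1, 1],
     'kiwi': [0, 0, 1], 'orange': [1, 1, 1]},
    index=['basket1', 'basket2', 'basket3'],
)

# Weights listed in a different order than df1's columns, on purpose.
weights = pd.Series({'kiwi': 2, 'orange': 3, 'apple': 3, 'grape': 1, 'banana': 2})

# The Series index is aligned with the column labels before multiplying.
weighted = df1.mul(weights, axis='columns')
print(weighted)
```

Because alignment is by label, the order of entries in the weight table does not matter, as long as every fruit column has a matching weight.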
Or, starting from scratch:
In [8]: basket1 = ['apple', 'orange', 'banana']
...: basket2 = ['orange', 'grape']
...: basket3 = ['banana', 'grape', 'kiwi', 'orange']
...:
...: baskets = [basket1, basket2, basket3]
...:
In [9]: fruitvolume = {'apple': 3, 'orange':3, 'banana':2, 'grape':1, 'kiwi':2}
Then simply:
In [12]: data = [{item:fruitvolume[item] for item in basket} for basket in baskets]
In [13]: data
Out[13]:
[{'apple': 3, 'banana': 2, 'orange': 3},
{'grape': 1, 'orange': 3},
{'banana': 2, 'grape': 1, 'kiwi': 2, 'orange': 3}]
In [14]: pd.DataFrame(data)
Out[14]:
   apple  banana  grape  kiwi  orange
0    3.0     2.0    NaN   NaN       3
1    NaN     NaN    1.0   NaN       3
2    NaN     2.0    1.0   2.0       3
But now you have to do some cleanup: fill the NaN values with 0 and cast back to int.
In [16]: df = pd.DataFrame(data).fillna(0).astype(int)
In [17]: df
Out[17]:
   apple  banana  grape  kiwi  orange
0      3       2      0     0       3
1      0       0      1     0       3
2      0       2      1     2       3
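The frame above is indexed 0, 1, 2 rather than by basket name. A possible variation that labels the rows and restores the column order used in the question is sketched below. The list names is an assumption introduced for the example, and reindex is used here only to fix the column order.

```python
import pandas as pd

basket1 = ['apple', 'orange', 'banana']
basket2 = ['orange', 'grape']
basket3 = ['banana', 'grape', 'kiwi', 'orange']
baskets = [basket1, basket2, basket3]

# Hypothetical row labels, one per basket, in the same order as `baskets`.
names = ['basket1', 'basket2', 'basket3']

fruitvolume = {'apple': 3, 'orange': 3, 'banana': 2, 'grape': 1, 'kiwi': 2}

# One dict per basket: fruit -> weight, only for the fruits present.
data = [{item: fruitvolume[item] for item in basket} for basket in baskets]

df = (
    pd.DataFrame(data, index=names)        # label the rows with basket names
    .reindex(columns=list(fruitvolume))    # column order as in the weight table
    .fillna(0)                             # absent fruits -> 0
    .astype(int)
)
print(df)
```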