I have a dataset in Python that is structured like this:
Tree Species        Number of Trunks
------------------------------------
Acer rubrum         1
Quercus bicolor     1
Quercus bicolor     1
aabbccdd            0
My question is: can I do something like the following in Python?
Select sum(number of trunks)
from trees.data['Number of Trunks']
where x = trees.data["Tree Species"]
group by trees.data["Tree Species"]
Here x is an array of five elements:
x = array(['Acer rubrum', 'Acer saccharum', 'Acer saccharinum',
'Quercus rubra', 'Quercus bicolor'], dtype='<U16')
What I want to do is map each element of x to trees.data["Tree Species"] and compute the total number of trunks for that species. It should return an array like:
array = (sum_num(Acer rubrum), sum_num(Acer saccharum), sum_num(Acer saccharinum),
sum_num(Quercus rubra), sum_num(Quercus bicolor))
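To make the expected result concrete, here is a minimal NumPy-only sketch of the mapping described above. It assumes the two columns are available as NumPy arrays; the names species and trunks are hypothetical stand-ins for trees.data["Tree Species"] and trees.data["Number of Trunks"], filled with the rows from the sample table.

```python
import numpy as np

# Hypothetical stand-ins for trees.data["Tree Species"] and
# trees.data["Number of Trunks"], using the sample table's rows.
species = np.array(["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"])
trunks = np.array([1, 1, 1, 0])

# The species to look up, as in the question.
x = np.array(["Acer rubrum", "Acer saccharum", "Acer saccharinum",
              "Quercus rubra", "Quercus bicolor"])

# mask[i, j] is True when row j belongs to species x[i];
# broadcasting gives shape (len(x), len(species)).
mask = x[:, None] == species[None, :]

# Zero out trunk counts of non-matching rows, then sum each row of the mask:
# one total per element of x, and 0 for species with no rows at all.
totals = (mask * trunks).sum(axis=1)
print(totals)  # [1 0 0 0 2]
```

The mask has len(x) × len(species) entries, so this suits a short lookup list like x; for large data a group-by is the more scalable choice.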
Answer 0 (score: 2)
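Have you looked at the Python library pandas? It would let you do something like
df.groupby('Tree Species')['Number of Trunks'].sum()
Note that df is the name of the variable you read your data frame into. I suggest reading up on pandas and its groupby function.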
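The groupby one-liner returns totals indexed by species name, not in the order of x. A hedged sketch of one way to line the result up with the question's x: reindex the grouped Series by x and fill species that never occur in the data with 0. The small DataFrame here is a stand-in for trees.data, assumed to use the same column names.

```python
import numpy as np
import pandas as pd

# Stand-in for trees.data; the real data is assumed to have these two columns.
df = pd.DataFrame({
    "Tree Species": ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"],
    "Number of Trunks": [1, 1, 1, 0],
})

x = np.array(["Acer rubrum", "Acer saccharum", "Acer saccharinum",
              "Quercus rubra", "Quercus bicolor"])

# GROUP BY species and SUM the trunk counts -> Series indexed by species.
per_species = df.groupby("Tree Species")["Number of Trunks"].sum()

# Reorder the totals to follow x; species missing from the data become 0.
totals = per_species.reindex(x, fill_value=0).to_numpy()
print(totals)  # [1 0 0 0 2]
```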
Answer 1 (score: 1)
You can do it like this:
import pandas as pd
df = pd.DataFrame()
tree_species = ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"]
no_of_trunks = [1,1,1,0]
df["Tree Species"] = tree_species
df["Number of Trunks"] = no_of_trunks
df.groupby('Tree Species').sum() #This will create a pandas dataframe
df.groupby('Tree Species')['Number of Trunks'].sum() #This will create a pandas series.
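The groupby calls above cover the SUM and GROUP BY parts of the question's query. A sketch of how the where clause could be mirrored as well, assuming "where" is meant as "species is one of the values in x": filter with isin first, then group. Passing as_index=False keeps the species as an ordinary column, similar to a SQL result set.

```python
import pandas as pd

df = pd.DataFrame({
    "Tree Species": ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"],
    "Number of Trunks": [1, 1, 1, 0],
})
x = ["Acer rubrum", "Acer saccharum", "Acer saccharinum", "Quercus rubra", "Quercus bicolor"]

# WHERE "Tree Species" IN x: keep only rows whose species appears in x.
filtered = df[df["Tree Species"].isin(x)]

# GROUP BY "Tree Species" with SUM("Number of Trunks"); the result is a
# DataFrame with one row per matching species, sorted by species name.
result = filtered.groupby("Tree Species", as_index=False)["Number of Trunks"].sum()
print(result)
```

Unlike a reindex by x, this only returns species that actually have rows, just as the SQL query would.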
You can also do the same thing with a dictionary:
tree_species = ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"]
no_of_trunks = [1,1,1,0]
d = {}
for key, trunk in zip(tree_species, no_of_trunks):
    # Start a species' running total at 0 the first time it appears.
    if key not in d:
        d[key] = 0
    d[key] += trunk
print(d)
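A variant of the dictionary approach, sketched with collections.defaultdict so the membership check is unnecessary, followed by a lookup that produces one total per element of the question's x (0 for species that never occur). Using .get for the lookup avoids inserting the missing species into the dictionary.

```python
from collections import defaultdict

tree_species = ["Acer rubrum", "Quercus bicolor", "Quercus bicolor", "aabbccdd"]
no_of_trunks = [1, 1, 1, 0]
x = ["Acer rubrum", "Acer saccharum", "Acer saccharinum", "Quercus rubra", "Quercus bicolor"]

# defaultdict(int) creates every new key with the value 0 on first access.
d = defaultdict(int)
for key, trunk in zip(tree_species, no_of_trunks):
    d[key] += trunk

# Look up each species of x in order; .get does not add missing keys.
totals = [d.get(species, 0) for species in x]
print(totals)  # [1, 0, 0, 0, 2]
```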