Question

I have this dataframe:

import pandas as pd

d = {'city': ['Barcelona', 'Madrid', 'Rome', 'Torino', 'London', 'Liverpool', 'Manchester', 'Paris'],
     'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
     'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
     'amount': [8, 7, 6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(d)

I want to get this information for each country:

españa = {'city': ['Barcelona', 'Madrid'],
          'revenue': [1, 2],
          'amount': [8, 7]}
ES = pd.DataFrame(españa)

In the end I would have 4 dataframes: ES, IT, UK and FR.

So far, I have tried:

a = set(df.loc[:]["country"])
for country in a:
    country = df.loc[(df["country"] == country), ['city', 'revenue', 'amount']]

But that only gives me one dataframe, with a single value.

Answer 1

You can use a dictionary comprehension with groupby:

# keyword form: pandas 2.x no longer accepts drop('country', 1) with a positional axis
res = {k: v.drop(columns='country') for k, v in df.groupby('country')}

print(res)

{'ES':    amount       city  revenue
       0       8  Barcelona        1
       1       7     Madrid        2,
 'FR':    amount   city  revenue
       7       1  Paris        8,
 'IT':    amount    city  revenue
       2       6    Rome        3
       3       5  Torino        4,
 'UK':    amount        city  revenue
       4       4      London        5
       5       3   Liverpool        6
       6       2  Manchester        7}
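A usage sketch built on the same comprehension: each value in res is an ordinary DataFrame that keeps the row labels from the original df (Paris is still row 7, for example). The reset_index(drop=True) call is an addition of mine, not part of the answer above; it renumbers each frame from 0 so it matches the ES frame in the question.

```python
import pandas as pd

d = {'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
              'London', 'Liverpool', 'Manchester', 'Paris'],
     'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
     'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
     'amount': [8, 7, 6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(d)

# One DataFrame per country code. The grouping column is dropped because it
# is constant inside each group; reset_index renumbers the rows from 0.
res = {k: v.drop(columns='country').reset_index(drop=True)
       for k, v in df.groupby('country')}

# Look up a single country's frame by its code.
ES = res['ES']
print(ES)
```

Since groupby sorts the keys by default, iterating over res visits ES, FR, IT and UK in that order.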

Answer 2

country is the loop variable, and you are overwriting it.

To produce 4 separate dataframes, try using a generator function:

def country_df_generator(data):
    for country in data['country'].unique():
        yield data.loc[data['country'] == country, ['city', 'revenue', 'amount']]

countries = country_df_generator(df)
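Note that a generator produces its dataframes lazily, one per iteration, and does not record which country each one belongs to. Here is a minimal sketch of one way to keep both, assuming the sample df from the question. The variant country_frames is hypothetical: it yields (code, frame) pairs, and dict() stores all of them in a single pass.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

def country_frames(data):
    """Hypothetical variant: yield (country_code, sub-DataFrame) pairs."""
    # unique() returns the codes in order of first appearance
    for country in data['country'].unique():
        yield country, data.loc[data['country'] == country,
                                ['city', 'revenue', 'amount']]

# Exhaust the generator once, storing each frame under its country code.
frames = dict(country_frames(df))
print(list(frames))  # ['ES', 'IT', 'UK', 'FR']
```

A generator can only be consumed once, so collecting the results into a dict like this is what lets you reach all four frames afterwards.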

Answer 3

The loop does give you all four dataframes, but you throw the first three away.

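You iterate over a with the variable country, but then destroy that value in the very next statement, country = .... Control then returns to the top of the loop, country is reset to the next two-letter code, and the same clash repeats for all four countries.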
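The clash is ordinary Python name rebinding and has nothing to do with pandas. A minimal illustration follows. A list replaces the question's set so the iteration order is predictable, and the placeholder strings stand in for the dataframes.

```python
codes = ['ES', 'IT', 'UK', 'FR']
seen = []

for country in codes:
    # At the top of each pass, the loop rebinds `country` to the next code...
    seen.append(country)
    # ...and this assignment then replaces it, so the previous result is lost.
    country = f"frame for {country}"

print(seen)     # ['ES', 'IT', 'UK', 'FR']
print(country)  # frame for FR
```

The loop still visits every code, but only the value from the final pass survives once the loop ends.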

If you want four dataframes, you need to store each one in a separate place. For example:

a = set(df["country"])
df_dict = {}

for country in a:
    # store each country's rows under its own key instead of overwriting `country`
    df_dict[country] = df.loc[df["country"] == country, ['city', 'revenue', 'amount']]

Now you have a dictionary of four dataframes, each keyed by its country code. Does that help?
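If you specifically want the four names ES, IT, UK and FR from the question, one option is to unpack them from the dictionary by key instead of creating variable names dynamically. This sketch assumes the question's sample df and the city, revenue and amount columns shown in its desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

df_dict = {}
for country in set(df['country']):
    # Each country's rows go under their own key, so nothing is overwritten.
    df_dict[country] = df.loc[df['country'] == country,
                              ['city', 'revenue', 'amount']]

# The key order is spelled out here, so the set's arbitrary
# iteration order above does not matter.
ES, IT, UK, FR = (df_dict[code] for code in ['ES', 'IT', 'UK', 'FR'])
print(UK)
```

In most code it is simpler to keep using df_dict['UK'] directly, which avoids hard-coding the list of countries.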

How to create dataframes by iterating over a set?