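I have a dictionary of DataFrames, Di_N. How can I apply the same function to each DataFrame?
The DataFrame names are generated from the data, so they are not defined in the code.
The code below has been edited to use JPP's answer, "iterate over the dictionary keys and modify each key's DataFrame in turn":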
import pandas as pd
import numpy as np
import copy
# Data
df_1 = pd.DataFrame({'Box' : [1006,1006,1006,1006,1006,1006,1007,1007,1007,1007,1008,1008,1008,1009,1009,1010,1011,1011,1012,1013],
                     'Item': [  40,  41,  42,  43,  44,  45,  40,  43,  44,  45,  43,  44,  45,  40,  41,  40,  44,  45,  44,  45]})
df_Y = pd.DataFrame({'Box' : [1006,1007,1008,1009,1010,1011,1012,1013,1014],
                     'Type': [ 103, 101, 102, 102, 102, 103, 103, 103, 103]})
# Find whether each Box contains each Item
def is_number(s):
    try:
        float(s)
        return 1
    except ValueError:
        return 0
df_1['Thing'] = df_1['Item'].apply(is_number)
# Join
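# Box 1014 is missing from df_N because join() defaults to how='left';
# 'outer' has to be passed to join(how='outer'), not to set_index().
df_N = df_1.set_index('Box').join(df_Y.set_index('Box'))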
# Find how many Boxes there are of each Type
def fun(g):
    try:
        return float(g.shape[0])
    except ZeroDivisionError:
        return np.nan
df_T = df_Y.groupby('Type').apply(fun).to_frame().transpose()
# Map of Box Type
Ma_G = df_N.groupby('Type')
# Group the Boxes by Type
Di_1 = {}
for name, group in Ma_G:
    Di_1[str(name)] = group
Di_2 = copy.deepcopy(Di_1)
Di_3 = {}
# Function to find the Mean of how many times each Item is in a Box
def fun(g):
    try:
        return float(g.shape[0])
    except ZeroDivisionError:
        return np.nan
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
for k in Di_1:
    # Table of which Item is in which Box
    Di_2[k] = pd.pivot_table(Di_1[k], values='Thing', columns='Item', index=['Box'], aggfunc=np.sum).fillna(0)
    # Find the Mean of how many times each Item is in a Box
    Di_3[k] = Di_1[k].groupby('Item').apply(fun).to_frame().transpose()
    Di_3[k] = (Di_3[k].loc[0] / len(Di_1[k].index)).to_frame().transpose()
Di_4 = copy.deepcopy(Di_2)
for k in Di_1:
    # Compare each Box to the Mean - is this valid?
    Di_4[k] = pd.DataFrame(Di_2[k].values - Di_3[k].values, columns=Di_2[k].columns, index=Di_2[k].index)
    for c in [c for c in Di_4[k].columns if Di_4[k][c].dtype in numerics]:
        Di_4[k][c] = Di_4[k][c].abs()
    Di_2[k]['Unusualness'] = Di_4[k].sum(axis=1)
Answer 0 (score: 1)
Simply iterate over the dictionary keys and modify each key's DataFrame in turn. Here is some pseudocode to demonstrate how:
for k in Di_N:
    Di_N[k] = pd.pivot_table(Di_N[k], values='Thing', ...).fillna(0)
    ....
    df_3 = ....
    df_4 = pd.DataFrame(Di_N[k].values - .... )
    Di_N[k]['Unusualness'] = df_4.sum(axis=1)
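You don't need to include some things inside the loop, such as the definitions of fun() and numerics. Put them outside the loop and you can still reference those objects from within it.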
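A minimal runnable sketch of this pattern follows. The dictionary `frames`, its sample rows, and the helper `item_share` are made up for illustration and only stand in for Di_N and fun(); the helper is defined once, before the loop, and the loop body calls it for every key. It also subtracts a Series from the pivot table directly, which lets pandas align on the Item labels instead of relying on `.values`.

```python
import pandas as pd

# Hypothetical stand-in for Di_N: two small DataFrames keyed by Type.
frames = {
    "101": pd.DataFrame({"Box": [1007, 1007], "Item": [40, 43], "Thing": [1, 1]}),
    "102": pd.DataFrame({"Box": [1008, 1009, 1009], "Item": [43, 40, 41], "Thing": [1, 1, 1]}),
}


# Defined once, outside the loop; the loop body can still call it.
def item_share(df):
    """Fraction of the rows in df that belong to each Item."""
    return df.groupby("Item").size() / len(df)


tables = {}
for key, df in frames.items():
    # Box-by-Item table: 1 where the Box holds the Item, 0 otherwise.
    table = pd.pivot_table(
        df, values="Thing", index="Box", columns="Item", aggfunc="sum"
    ).fillna(0)
    # Absolute deviation of every Box row from the per-Item share.
    # The Series index (Item) is aligned with the table's columns.
    deviation = (table - item_share(df)).abs()
    table["Unusualness"] = deviation.sum(axis=1)
    tables[key] = table
```

Writing the results into a new dictionary (`tables` here) rather than overwriting the input keeps the original frames available for later steps; assigning back to the same key inside the loop works as well, since the set of keys does not change.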
In addition, you can use pd.DataFrame.select_dtypes to select the numeric columns:
num_cols = df_4.select_dtypes(include=numerics).columns
df_4[num_cols] = df_4[num_cols].abs()
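As a small self-contained illustration of the same idea, the sketch below uses a made-up frame `df` (not part of the question's data). It passes `include="number"`, which select_dtypes accepts as shorthand for all numeric dtypes, so an explicit list of dtype names is optional.

```python
import pandas as pd

# Hypothetical frame with one text column and two numeric ones.
df = pd.DataFrame({"label": ["a", "b"], "x": [-1, 2], "y": [-0.5, 1.5]})

# include="number" matches every numeric dtype (ints and floats alike).
num_cols = df.select_dtypes(include="number").columns

# Take absolute values of the numeric columns only; "label" is untouched.
df[num_cols] = df[num_cols].abs()
```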