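I need to pass columns of a DataFrame to a function that returns a dictionary, and that dictionary then needs to be appended to two columns of the same DataFrame, result and cost.
For example, the function is: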
def costsplit(acc, srv, owner, cost):
    test = splitter().split(acc, srv, owner, cost)
    return test
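Suppose test is returned as a dictionary, for example test = {'dps':32, 'dd':21, 'ct':92, 'cc':32}.
That is, {'dps':32, 'dd':21, 'ct':92, 'cc':32} is returned by test when acc = dev, srv = instance, owner = dpc, cost = 30 is passed, i.e. row 1 of the DataFrame below. Similarly, some other output {'dps':20, 'dd':21, 'ct':92, 'cc':2} is returned by test when acc = prd, srv = instance, owner = abs, cost = 35 is passed, i.e. row 4. These need to be appended to the result and cost columns of the DataFrame.
The current DataFrame looks like: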
date acc srv owner result cost
2021-03-01 dev bucket dps gcp.dev.dps 177
2021-03-01 prd instance abs gcp.prd.abs 35
2021-03-01 dev spanner cc gcp.dev.cc 98
2021-03-01 prd instance it gcp.prd.it 135
The output DataFrame should now have the dictionary's key-value pairs appended to the result and cost columns.
The output should look like this:
date acc srv owner result cost
2021-03-01 dev bucket dps gcp.dev.dps 177
2021-03-01 prd instance abs gcp.prd.abs 35
2021-03-01 dev spanner cc gcp.dev.cc 98
2021-03-01 prd instance it gcp.prd.it 135
2021-03-01 gcp.dev.dps 32
2021-03-01 gcp.dev.dd 21
2021-03-01 gcp.dev.ct 92
2021-03-01 gcp.dev.cc 32
2021-03-01 gcp.prd.dps 20
2021-03-01 gcp.prd.dd 21
2021-03-01 gcp.prd.ct 92
2021-03-01 gcp.prd.cc 2
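That is, a loop runs over every row of the current DataFrame and passes the acc, srv, owner, cost column data to the costsplit function. Each key of the returned test should be appended to the result column in the form gcp.{acc}.{testkey}, and each value of the returned test should be added to the cost column.
The splitter().split function splits the cost and renames the owner based on each row sent from the DataFrame.
With the command below I can only fill the result column, not the cost column.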
acc['result'] = acc.apply(lambda x: [f'gcp.{acc}.{squ}' for squ, cost in test.items()], axis=1)
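Answer 0 (score: 0)
I am not sure whether I have understood you correctly, but I would like to suggest a work-in-progress solution. Tell me which parts were miscommunicated and we will work it out together. :-)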
import pandas as pd

# Lookup table: service -> account -> owner -> cost
COST_DICT = {
    "bucket": {"dev": {"dps": 177}},
    "spanner": {"dev": {"cc": 98}},
    "instance": {
        "prd": {"dps": 20, "dd": 21, "ct": 92, "cc": 2, "it": 135, "abs": 35},
        "dev": {"dps": 32, "dd": 21, "ct": 92, "cc": 32},
    },
}

def costsplit(acc, srv, owner, previous_cost):
    add_cost = COST_DICT[srv][acc][owner]
    result = f"gcp.{acc}.{owner}"
    cost = previous_cost + add_cost
    # Returning a Series lets apply() expand it into two columns
    return pd.Series({"result": result, "cost": cost})
acc_content = {
    "date": ["2021-03-01", "2021-03-01", "2021-03-01", "2021-03-01"],
    "acc": ["dev", "prd", "dev", "prd"],
    "srv": ["bucket", "instance", "spanner", "instance"],
    "owner": ["dps", "abs", "cc", "it"],
    "prev_cost": [0, 0, 0, 0],
}
acc_first = pd.DataFrame(acc_content)
acc_first[["result", "cost"]] = acc_first.apply(
    lambda row: costsplit(row["acc"], row["srv"], row["owner"], row["prev_cost"]), axis=1
)
# date acc srv owner prev_cost result cost
# 0 2021-03-01 dev bucket dps 0 gcp.dev.dps 177
# 1 2021-03-01 prd instance abs 0 gcp.prd.abs 35
# 2 2021-03-01 dev spanner cc 0 gcp.dev.cc 98
# 3 2021-03-01 prd instance it 0 gcp.prd.it 135
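I don't understand why your output DataFrame is empty in the acc, srv, owner columns. Didn't you say you wanted to iterate over the rows, use these columns to create result, and overwrite cost?
Based on your explanation, this is what I think makes the most sense: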
acc_content = {
    "date": [
        "2021-03-01",
        "2021-03-01",
        "2021-03-01",
        "2021-03-01",
        "2021-03-01",
        "2021-03-01",
        "2021-03-01",
        "2021-03-01",
    ],
    "acc": ["dev", "dev", "dev", "dev", "prd", "prd", "prd", "prd"],
    "srv": ["instance", "instance", "instance", "instance", "instance", "instance", "instance", "instance"],
    "owner": ["dps", "dd", "ct", "cc", "dps", "dd", "ct", "cc"],
    "prev_cost": [0, 0, 0, 0, 0, 0, 0, 0],
}
acc_second = pd.DataFrame(acc_content)
acc_second[["result", "cost"]] = acc_second.apply(
    lambda row: costsplit(row["acc"], row["srv"], row["owner"], row["prev_cost"]), axis=1
)
# date acc srv owner prev_cost result cost
# 0 2021-03-01 dev instance dps 0 gcp.dev.dps 32
# 1 2021-03-01 dev instance dd 0 gcp.dev.dd 21
# 2 2021-03-01 dev instance ct 0 gcp.dev.ct 92
# 3 2021-03-01 dev instance cc 0 gcp.dev.cc 32
# 4 2021-03-01 prd instance dps 0 gcp.prd.dps 20
# 5 2021-03-01 prd instance dd 0 gcp.prd.dd 21
# 6 2021-03-01 prd instance ct 0 gcp.prd.ct 92
# 7 2021-03-01 prd instance cc 0 gcp.prd.cc 2
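The eight rows above were typed in by hand. As a rough sketch only, they could instead be derived from the original rows by collecting (result, cost) pairs per row and then calling DataFrame.explode. Note that fake_split is a hypothetical stand-in for splitter().split, which is not shown in the question, and the two input rows are purely illustrative.

```python
import pandas as pd

# Hypothetical stand-in for splitter().split (the real one is not shown).
def fake_split(acc, srv, owner, cost):
    if acc == "dev":
        return {"dps": 32, "dd": 21, "ct": 92, "cc": 32}
    return {"dps": 20, "dd": 21, "ct": 92, "cc": 2}

# Illustrative input: one dev row and one prd row.
df = pd.DataFrame(
    {
        "date": ["2021-03-01", "2021-03-01"],
        "acc": ["dev", "prd"],
        "srv": ["instance", "instance"],
        "owner": ["dps", "abs"],
        "cost": [30, 35],
    }
)

# For each input row, build a list of (result, cost) pairs from the dictionary.
pairs = [
    [(f"gcp.{acc}.{key}", value) for key, value in fake_split(acc, srv, owner, cost).items()]
    for acc, srv, owner, cost in zip(df["acc"], df["srv"], df["owner"], df["cost"])
]

# Attach the lists, then turn every pair into its own row.
expanded = df[["date"]].assign(pair=pd.Series(pairs, index=df.index))
expanded = expanded.explode("pair").reset_index(drop=True)

# Split each pair into the two target columns, so result and cost stay aligned.
split_cols = pd.DataFrame(
    expanded["pair"].tolist(), columns=["result", "cost"], index=expanded.index
)
expanded = pd.concat([expanded.drop(columns="pair"), split_cols], axis=1)
print(expanded)
```

Keeping each key and its value together in one tuple until after the explode is what fills both result and cost in a single pass.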
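Points for discussion:
- You only have acc=dev and acc=prd; are there other options?
- You only mention srv=instance, but your first df also contains bucket and spanner. I have added that information to the cost dictionary; please check.
- I added the values for abs and it to COST_DICT["instance"]["prd"] to make it work for the first DataFrame.

Answer 1 (score: -1)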
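If I understood your comment correctly, you need to pass acc into the costsplit function to generate the keys. To do that, you can define a new costsplit function that wraps the existing one -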
def new_costsplit(acc, srv, owner, cost):
    test = costsplit(acc, srv, owner, cost)
    return {f'gcp.{acc}.{k}': v for k, v in test.items()}
and use this new function to get your test_returns -
test_returns = new_costsplit(acc, srv, owner, cost)
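To make the wrapper concrete, here is a small self-contained sketch. The costsplit below is a hypothetical stand-in that always returns the dev dictionary from the question; your real function would call splitter().split instead.

```python
# Hypothetical stand-in for the existing costsplit.
def costsplit(acc, srv, owner, cost):
    return {"dps": 32, "dd": 21, "ct": 92, "cc": 32}

def new_costsplit(acc, srv, owner, cost):
    test = costsplit(acc, srv, owner, cost)
    # Prefix every key with gcp.{acc}. so it matches the result column format.
    return {f"gcp.{acc}.{k}": v for k, v in test.items()}

test_returns = new_costsplit("dev", "instance", "dps", 30)
print(test_returns)
# {'gcp.dev.dps': 32, 'gcp.dev.dd': 21, 'gcp.dev.ct': 92, 'gcp.dev.cc': 32}
```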
Then you can convert the output of test_returns into a DataFrame -
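import pandas as pd
# df is your original DataFrame, indexed by date (not defined in this snippet)
test_returns = {'dps':32, 'dd':21, 'ct':92, 'cc':32}
test_returns = {f'gcp.dev.{k}': v for k, v in test_returns.items()}
test_returns_df = pd.DataFrame({'result': list(test_returns.keys()), 'cost': list(test_returns.values())})
# Reuses df's date index; this only works because df also has 4 rows
test_returns_df.index = df.index
test_returns_df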
# result cost
# date
# 2021-03-01 gcp.dev.dps 32
# 2021-03-01 gcp.dev.dd 21
# 2021-03-01 gcp.dev.ct 92
# 2021-03-01 gcp.dev.cc 32
Then append it to your original DataFrame -
df_new = pd.concat([df, test_returns_df], axis=0)
df_new = df_new.fillna("")
df_new
# acc cost owner result srv
#date
#2021-03-01 dev 30 dps gcp.dev.dps bucket
#2021-03-01 prd 35 abs gcp.prd.abs instance
#2021-03-01 dev 98 cc gcp.dev.cc spanner
#2021-03-01 sandbox 94 it gcp.sandbox.it bigdata
#2021-03-01 32 gcp.dev.dps
#2021-03-01 21 gcp.dev.dd
#2021-03-01 92 gcp.dev.ct
#2021-03-01 32 gcp.dev.cc
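The steps above handle a single returned dictionary. A minimal sketch of repeating them for every row might look like the following. It assumes every row should be split, uses a hypothetical stand-in for costsplit (the real one calls splitter().split), and takes only the first two rows of the DataFrame above as illustrative input.

```python
import pandas as pd

# Hypothetical stand-in for the question's costsplit.
def costsplit(acc, srv, owner, cost):
    if acc == "dev":
        return {"dps": 32, "dd": 21, "ct": 92, "cc": 32}
    return {"dps": 20, "dd": 21, "ct": 92, "cc": 2}

# Illustrative input, indexed by date.
df = pd.DataFrame(
    {
        "date": ["2021-03-01", "2021-03-01"],
        "acc": ["dev", "prd"],
        "srv": ["bucket", "instance"],
        "owner": ["dps", "abs"],
        "result": ["gcp.dev.dps", "gcp.prd.abs"],
        "cost": [30, 35],
    }
).set_index("date")

# Collect one new record per dictionary entry, for every row.
new_rows = []
for date, row in df.iterrows():
    split = costsplit(row["acc"], row["srv"], row["owner"], row["cost"])
    for key, value in split.items():
        new_rows.append(
            {"date": date, "result": f"gcp.{row['acc']}.{key}", "cost": value}
        )

new_df = pd.DataFrame(new_rows).set_index("date")

# Stack the new rows under the originals; acc, srv, owner are blank for them.
df_all = pd.concat([df, new_df], axis=0).fillna("")
print(df_all)
```

Building a plain list of records and concatenating once at the end avoids growing the DataFrame inside the loop.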