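I have a CSV file with N key columns and an expression column whose expressions reference 1 to N of the key columns. I want to replace those references with the values in each key column of that row. Hopefully the example below clarifies what I mean.
In the example below, the key columns are A, B and C.
Desired output: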
20_A
20_B
30_A
30_B
40_C_4
40_C_5
My solution:
import pandas as pd

keys = ['Age', 'Type', 'Delay']
df = pd.read_csv(csv_path)
for index, row in df.iterrows():
    key1_list = row[keys[0]].split(",")
    key2_list = row[keys[1]].split(",")
    key3_list = row[keys[2]].split(",")
    expression = row['Expression']
    # Iterate over all combinations of key column values and export a chart for each one
    for KEY1 in key1_list:
        for KEY2 in key2_list:
            for KEY3 in key3_list:
                string = expression
                string = string.replace("<" + keys[0] + ">", KEY1)
                string = string.replace("<" + keys[1] + ">", KEY2)
                string = string.replace("<" + keys[2] + ">", KEY3)
                print(string)
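However, I would like to generalize my code to work with any number of key columns, so that only the keys list at the start needs to be updated. That would require loops nested to a depth of len(keys), but I can't work out how to generalize the loops to arbitrary depth with flat code. I looked at itertools but couldn't find what I needed. I think recursion might work, but I'd rather avoid it.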
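For comparison, the recursive approach mentioned above could look roughly like the following minimal sketch. The function name `expand` and its parameters are hypothetical, chosen only for illustration; each recursion level substitutes one placeholder, so the nesting depth automatically follows the number of keys.

```python
def expand(template, keys, value_lists):
    """Recursively substitute every combination of values into template.

    keys and value_lists are parallel lists (hypothetical interface):
    value_lists[i] holds the candidate values for the placeholder "<keys[i]>".
    """
    # Base case: no keys left, so the template is fully substituted
    if not keys:
        return [template]
    results = []
    # Substitute each value of the first key, then recurse on the remaining keys
    for value in value_lists[0]:
        partial = template.replace("<" + keys[0] + ">", value)
        results.extend(expand(partial, keys[1:], value_lists[1:]))
    return results


print(expand("<Age>_<Type>", ["Age", "Type"], [["20", "30"], ["A", "B"]]))
# ['20_A', '20_B', '30_A', '30_B']
```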
Answer 0 (score: 2)
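Recursion could certainly solve this for you, but before going down that path you should take another look at itertools. What you want is the Cartesian product of the key values, which generates every possible key combination.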
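To see what `itertools.product` does with a list of lists, here is a small sketch; the values are made up for illustration. Unpacking with `*` passes each inner list as a separate argument, so the number of key columns does not have to be known when the code is written.

```python
import itertools

# One inner list of candidate values per key column (illustrative values)
key_list = [["20", "30"], ["A", "B"]]

# product(*key_list) is equivalent to product(["20", "30"], ["A", "B"])
# and yields tuples in the same order as the equivalent nested for-loops
combos = list(itertools.product(*key_list))
print(combos)
# [('20', 'A'), ('20', 'B'), ('30', 'A'), ('30', 'B')]
```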
One way to do this is as follows:
import itertools

import pandas as pd

csv_path = "path/to/file"
# dtype=str keeps every cell a string, so .split(',') also works on single numeric values
df = pd.read_csv(csv_path, dtype=str)
# Find the available keys from the data frame instead of entering them manually:
keys = [col for col in df.columns if col != 'Expression']  # "Expression" is not a key
for index, row in df.iterrows():
    # Collect one list of values per key
    # (a list of lists keeps the values in the same order as keys)
    key_list = []
    for key in keys:
        # ',' separates the values within a cell;
        # in the CSV file such cells must be quoted, e.g. "20,30"
        key_list.append(row[key].split(','))
    expression = row['Expression']
    # All key combinations are then generated with 'itertools.product'
    combos = itertools.product(*key_list)
    # Each combo is then handled separately
    for combo in combos:
        string = expression
        # Replace each key in order; a loop is used because the number of keys is variable
        for key, value in zip(keys, combo):
            string = string.replace('<' + key + '>', value)
        print(string)
Hopefully this code is understandable and does what you want. If not, let me know and I'll try to clarify further.
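A self-contained, runnable version of the same itertools approach is sketched below. It reads a small inline CSV through `io.StringIO` instead of a file; the column names follow the question's code (`Age`, `Type`, `Delay`, `Expression`), but the data rows are invented for illustration, and the generator name `expand_rows` is hypothetical.

```python
import io
import itertools

import pandas as pd

# Invented example data: cells holding several comma-separated values are quoted
csv_text = '''Age,Type,Delay,Expression
"1,2",X,0,<Age>_<Type>
3,"Y,Z","5,6",<Age>_<Type>_<Delay>
'''


def expand_rows(df, expr_col="Expression"):
    """Yield every expression with all key combinations substituted."""
    # Every column except the expression column is treated as a key
    keys = [c for c in df.columns if c != expr_col]
    for _, row in df.iterrows():
        # One list of candidate values per key column, in column order
        key_list = [row[key].split(",") for key in keys]
        for combo in itertools.product(*key_list):
            string = row[expr_col]
            for key, value in zip(keys, combo):
                string = string.replace("<" + key + ">", value)
            yield string


# dtype=str keeps single values such as 3 as strings so .split works
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
results = list(expand_rows(df))
print(results)
# ['1_X', '2_X', '3_Y_5', '3_Y_6', '3_Z_5', '3_Z_6']
```

As with the nested loops in the question, a row whose expression does not reference a multi-valued key will produce repeated strings; if that matters, `list(dict.fromkeys(results))` removes duplicates while keeping order.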