编辑:我现在有一个问题的候选解决方案(请参阅下面的玩具示例) - 如果您能想到更强大的内容,请告诉我。
我刚刚发现了python的patsy包,用于从R风格的公式创建设计矩阵,它看起来很棒。我的问题是:给出一个公式,例如"(x1 + x2 + x3)** 2",是否有一种简单的方法可以创建一个包含与特定变量相关的导数的设计矩阵,例如: " X 1"
这是一个玩具示例:
import numpy as np
import pandas as pd
import patsy
import sympy
import sympy.parsing.sympy_parser as sympy_parser
n_obs = 200
df = pd.DataFrame(np.random.uniform(size=(n_obs, 3)), columns=["x1", "x2", "x3"])
df.describe()
design_matrix = patsy.dmatrix("(I(7*x1) + x2 + x3)**2 + I(x1**2) + I(x1*x2*x3)", df)
design_matrix.design_info.column_names
## ['Intercept', 'I(7 * x1)', 'x2', 'x3', 'I(7 * x1):x2', 'I(7 * x1):x3', 'x2:x3', 'I(x1 ** 2)', 'I(x1 * x2 * x3)']
x1, x2, x3 = sympy.symbols("x1 x2 x3")
def diff_wrt_x1(string):
return str(sympy.diff(sympy_parser.parse_expr(string), x1))
colnames_to_differentiate = [colname.replace(":", "*").replace("Intercept", "1").replace("I", "")
for colname in design_matrix.design_info.column_names]
derivatives_wrt_x1 = [diff_wrt_x1(colname) for colname in colnames_to_differentiate]
def get_column(string):
try:
return float(string) * np.ones((len(df), 1)) # For cases like string == "7"
except ValueError:
return patsy.dmatrix("0 + I(%s)" % string, df) # For cases like string == "x2*x3"
derivative_columns = tuple(get_column(derivative_string) for derivative_string in derivatives_wrt_x1)
design_matrix_derivative = np.hstack(derivative_columns)
design_matrix_derivative[0] # Contains [0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]
design_matrix_derivative_manual = np.zeros_like(design_matrix_derivative)
design_matrix_derivative_manual[:, 1] = 7.0
design_matrix_derivative_manual[:, 4] = 7*df["x2"]
design_matrix_derivative_manual[:, 5] = 7*df["x3"]
design_matrix_derivative_manual[:, 7] = 2*df["x1"]
design_matrix_derivative_manual[:, 8] = df["x2"] * df["x3"]
np.all(np.isclose(design_matrix_derivative, design_matrix_derivative_manual)) # True!
代码生成一个包含[1, 7*x1, x2, x3, 7*x1*x2, 7*x1*x3, x2*x3, x1^2, x1*x2*x3]
列的设计矩阵。
假设我想要一个新的公式,它将design_matrix与x1区分开来。期望的结果是与design_matrix
形状相同的矩阵,但其列为[0, 7, 0, 0, 7*x2, 7*x3, 0, 2*x1, x2*x3]
。有没有一种程序化的方法来做到这一点?我已经尝试过搜索patsy文档以及stackoverflow,但我什么也看不见。当然我可以手动创建衍生矩阵,但是有一个功能可以很好地完成它(当我将公式更改为"(x1 + x2 + x3 + x4)**2 + I(x1**3)"
时,不必更新)。