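I'm writing a script that uses pandas to read values from a CSV into a DataFrame. The values "A" and "B" are the inputs to an equation, which is obtained from the XML output file of an external program. The equation should be evaluated row by row on "A" and "B", and the results put back into the original DataFrame.
If I write a function definition with the equation spelled out explicitly and return its result, everything works. For example: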
import pandas as pd
dataFrame = pd.read_csv() # Reads CSV to "dataFrame"
A = dataFrame['A'] # Column A of "dataFrame"
B = dataFrame['B'] # Column B of "dataFrame"
def Func(a, b):
    P = 2*a + 3*b
    return P
outPut['P'] = Func(A, B) # Computes P for each row of "dataFrame" from its 'A' and 'B' values and stores it in "outPut"
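What I really want to do, however, is "build" the same equation from the XML file rather than typing it in explicitly. So I extract the "terms" and "coefficients" from the XML file and assemble the equation as a string. I then use sympy.sympify() to convert the string into an executable function. For example: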
import pandas as pd
import sympy as sy
import xml.etree.ElementTree as etree
dataFrame = pd.read_csv() # Reads CSV to "dataFrame"
A = dataFrame['A'] # Column A of "dataFrame"
B = dataFrame['B'] # Column B of "dataFrame"
tree = etree.parse('C:\...')
.
..(some XML stuff with etree)
.
equationString = "some code that grabs terms and coefficients from XML file" # Builds equation from XML 'terms' and 'coefficients'
P = sy.sympify(equationString)
def Func(A, B):
    global P
    return P
outPut['P'] = Func(A, B) # Assigns a value to each row in "outPut" for each 'A' and 'B' per row of "dataFrame"
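The result is that when I apply this equation to the DataFrame, the literal equation is copied into the "outPut" DataFrame instead of the row-by-row result for each 'A' and 'B'. I don't understand why Python treats these two code samples differently, or how to get the result I want, which is the one the first example produces. For some reason the result of sympify() is not executable. The same thing seems to happen when I use eval().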
Answer 0 (score: 1)
To elaborate on my comment, here is how to use lambdify:
In [1]: import sympy as sp
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: df = pd.DataFrame(np.random.randn(5,2), columns=['A', 'B'])
In [5]: equationString = "2*A+3*B"
In [7]: expr = sp.S(equationString)
In [8]: expr
Out[8]: 2*A + 3*B
In [10]: f = sp.lambdify(sp.symbols("A B"), expr, modules="numpy")
In [11]: f(df['A'],df['B'])
Out[11]:
0 -2.779739
1 -1.176580
2 3.911066
3 1.888639
4 0.745293
dtype: float64
In [12]: 2*df["A"]+3*df["B"] - f(df["A"],df["B"])
Out[12]:
0 0
1 0
2 0
3 0
4 0
dtype: float64
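The reason the original Func copied the literal equation is that sympify() returns a symbolic expression, not a function, and Func returned that expression without ever using its arguments. lambdify is what turns the expression into a callable. As a rough end-to-end sketch, here is how the string could be built from an XML file and then applied to the DataFrame. The XML layout (`<term>` elements with `coefficient` and `variable` attributes) and the helper names `build_equation_string` and `make_function` are assumptions for illustration only, since the real file's structure isn't shown in the question.

```python
import xml.etree.ElementTree as etree

import pandas as pd
import sympy as sp

# Hypothetical XML layout; the real file's structure is not shown in the question.
XML_TEXT = """
<equation>
  <term coefficient="2" variable="A"/>
  <term coefficient="3" variable="B"/>
</equation>
"""


def build_equation_string(xml_text):
    """Join every <term> element into a string such as '2*A+3*B'."""
    root = etree.fromstring(xml_text)
    return "+".join(
        f"{term.get('coefficient')}*{term.get('variable')}"
        for term in root.iter("term")
    )


def make_function(equation_string, names):
    """Convert the string into a NumPy-backed callable taking `names` in order."""
    expr = sp.sympify(equation_string)
    return sp.lambdify(sp.symbols(names), expr, modules="numpy")


# Small stand-in for the DataFrame read from the CSV.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

equation_string = build_equation_string(XML_TEXT)
f = make_function(equation_string, ["A", "B"])

# The callable receives whole columns, so the result is computed row by row.
df["P"] = f(df["A"], df["B"])
print(df)
```

Passing the symbol names explicitly fixes the argument order of the generated function, so the columns can be handed over in a known order.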
Depending on the expressions encountered in the XML file, sympy may be overkill. Here is how to use eval:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(5, 2), columns=['A', 'B'])
In [4]: equationString = "2*A+3*B"
In [5]: f = eval("lambda A, B: "+equationString)
In [6]: f(df['A'],df['B'])
Out[6]:
0 1.094797
1 -1.942295
2 -5.181502
3 1.888990
4 3.069017
dtype: float64
In [7]: 2*df["A"]+3*df["B"] - f(df["A"],df["B"])
Out[7]:
0 0
1 0
2 0
3 0
4 0
dtype: float64
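One caveat with eval is that it runs whatever Python code the string contains, so it is only appropriate when the XML file comes from a trusted source. An alternative that is not part of the answer above is pandas' own `DataFrame.eval`, which resolves column names inside an expression string directly and accepts only a restricted expression syntax. A minimal sketch, assuming the equation uses nothing beyond column names and basic arithmetic:

```python
import pandas as pd

# Small stand-in for the DataFrame read from the CSV.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# In the real script this string would be assembled from the XML file.
equation_string = "2*A+3*B"

# DataFrame.eval looks up A and B as columns of df and returns a Series.
df["P"] = df.eval(equation_string)
print(df)
```

This skips building a lambda altogether, at the cost of supporting fewer kinds of expressions than sympy or plain eval.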