Question

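I have a CSV file with 80 features (columns) and 4000 instances (rows). I want to map this file into a higher dimension (kernel of degree 3). The new file should have 88560 columns and 4000 rows. I tried a pilot with only 3 values: I first prepared the header with the symbols a, b, c and then tried to substitute the values, but I could not replace the strings with values. The code is below.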
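The target width of 88560 columns is exactly the number of distinct degree-3 monomials in 80 variables. A quick stdlib check, assuming each output column is one monomial such as a0**2*a1 (the stars-and-bars counting argument is an addition, not something from the original post):

```python
from itertools import combinations_with_replacement
from math import comb

N_FEATURES = 80
DEGREE = 3

# Each degree-3 monomial is a multiset of 3 feature indices,
# e.g. (0, 0, 1) stands for a0**2 * a1.
monomials = list(combinations_with_replacement(range(N_FEATURES), DEGREE))

# Stars and bars: multisets of size 3 drawn from 80 items = C(80 + 3 - 1, 3).
expected = comb(N_FEATURES + DEGREE - 1, DEGREE)

print(len(monomials), expected)  # 88560 88560
```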

import sympy

fw = open('polyexp.csv', "w")
a, b, c = sympy.symbols("a b c")
formula = (a + b + c) ** 3
poly = str(formula.expand())
ls = poly.split('+')
print(ls, file=fw)
value1 = ls[0]
value2 = ls[1]
value3 = ls[3]
a = 1
b = 2
c = 3
print(value1, value2, value3)

This gives the output a**3 3*a**2*b 3*a*b**2 instead of 1 6 12.

Is there any other direction I can take?

The modified code I posted works, but is there a more elegant way to do this, in particular the parts that 1) turn the 80 features into symbols and 2) substitute the values, which are currently hard-coded?

Answer 1

In your code, value1, value2, value3 are just strings, because you called str(formula.expand()). You need to sympify them again:

from sympy import sympify

value1 = sympify(ls[0])
value2 = sympify(ls[1])
value3 = sympify(ls[3])

Then you can substitute actual values for the symbols in them to get numeric results:

print(value1.subs([(a, 1), (b, 2), (c, 3)]))
print(value2.subs([(a, 1), (b, 2), (c, 3)]))
print(value3.subs([(a, 1), (b, 2), (c, 3)]))

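Do not assign the values 1, 2, 3 to the symbols a, b, c with statements like a = 1. That way you lose the symbol: a becomes the integer 1 instead of a symbol that you can substitute a value into.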

See also the tutorial.

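Here is a version that generates the 80 symbols automatically, but it is still rather slow. It uses sympy.lambdify(), which according to the documentation is about 50 times faster than subs()/evalf().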

import sympy as sym

fw = open('polyexp.csv', "w")
fr = open('polyval.csv', "r")
flabel = open('polylabel.csv', "r")
N = 80
symbolnames = ['a'+str(i) for i in range(N)]
symbols = [sym.Symbol(name) for name in symbolnames]  # Generate symbols 'a0', 'a1', ...
formula = sym.Add(*symbols) ** 3
poly = str(formula.expand())
terms = [sym.sympify(term) for term in poly.split('+')]
funcs = [sym.lambdify(symbolnames, term, 'math') for term in terms]

for line in fr:
    label = flabel.readline().rstrip()
    values = [float(s) for s in line.split(",")]
    csvrow = [func(*values) for func in funcs]
    print(label + ',' + ','.join(map(str, csvrow)), file=fw)
fw.close()
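If only the numbers are needed, the symbolic step can be skipped entirely. A minimal sketch that swaps SymPy for itertools: each term is precomputed once as a multinomial coefficient plus a tuple of feature indices (the same "prepare once, evaluate per row" idea as lambdify), so neither the 80 symbols nor hard-coded values are needed. The names build_terms and expand_row are hypothetical, and the column order follows itertools, which is not guaranteed to match SymPy's printed order.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def build_terms(n_features, degree=3):
    """Precompute (coefficient, indices) for every term of (x0 + ... + x[n-1]) ** degree."""
    terms = []
    for idx in combinations_with_replacement(range(n_features), degree):
        # Multinomial coefficient degree! / (k0! * k1! * ...), ki = repeats of feature i.
        coef = factorial(degree)
        for k in Counter(idx).values():
            coef //= factorial(k)
        terms.append((coef, idx))
    return terms


def expand_row(values, terms):
    """Evaluate every precomputed term for one row of feature values."""
    return [coef * prod(values[i] for i in idx) for coef, idx in terms]


terms = build_terms(3)  # pilot with the three features a, b, c
row = expand_row([1.0, 2.0, 3.0], terms)
print(row[:4])   # [1.0, 6.0, 9.0, 12.0], i.e. a**3, 3*a**2*b, 3*a**2*c, 3*a*b**2
print(sum(row))  # 216.0, which equals (1 + 2 + 3) ** 3
```

For the real file, build_terms(80) would be built once (88560 terms) and expand_row called for each parsed CSV row; in pure Python this is still slow for 4000 rows, so it mainly illustrates the structure.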

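Finally, here is a very fast version using numpy, in which a single call of a lambdified function computes an entire column of the output file. Because of this column-by-column processing, the whole result array must be held in memory before it is written out (for 4000 rows × 88560 columns of float32 that is about 1.4 GB). If you do not have enough memory, you can write the result columns as rows instead, which gives you a transposed output file.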

import numpy as np
import sympy as sym

fw = open('polyexp.csv', "w")
flabel = open('polylabel.csv', "r")
N = 80
symbolnames = ['a{:02}'.format(i) for i in range(N)]

# Try to read polynomial expansion from cache file (saves time).
polycachefile = 'poly{}.txt'.format(N)
poly = ''
try:
    with open(polycachefile) as f:
        poly = f.readline().strip()
except IOError:
    poly = ''

if poly.count('+') > 0:
    # Use cached polynomial expansion.
    pass
else:
    # Calculate and save polynomial expansion.
    symbols = [sym.Symbol(name) for name in symbolnames]
    formula = sym.Add(*symbols) ** 3
    poly = str(formula.expand())
    with open(polycachefile, 'w') as f:
        f.write(poly)

terms = poly.split('+')
# Read input file into val.
val = np.genfromtxt('polyval.csv', delimiter=',', \
                    autostrip=True, dtype=np.float32)  # <-- adjust data type (i.e. precision) for calculations here!
nrows, ncols = val.shape
assert ncols == N
# Prepare efficient access to columns of val.
columns = val.T[:, :]
colsbyname = dict(zip(symbolnames, columns))
symbolnamesset = set(symbolnames)
# Create result array.
exp = np.zeros([nrows, len(terms)], dtype=val.dtype)

# Calculate result, column by column.
for i, term in enumerate(terms):
    term = term.strip()
    subterms = set(term.split('*'))
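    # A list keeps the argument order explicit; newer SymPy versions deprecate passing a set to lambdify().
    usedsyms = sorted(subterms & symbolnamesset)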

    func = sym.lambdify(usedsyms, term, 'math')
    exp[:, i] = func(*(colsbyname[s] for s in usedsyms))

# Write result file.
rowfmt = ','.join(['%.7g'] * len(terms))  # <-- adjust output precision here!
for row in exp:
    label = flabel.readline().rstrip()
    print(label + ',' + rowfmt % tuple(row), file=fw)
fw.close()
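The memory fallback mentioned above (writing result columns as rows) can be sketched as well. This version assumes numpy, replaces the lambdified term strings with itertools index tuples (a substitution, not what the answer does), and streams each computed column straight to the output as one line, so only one column of nrows values is held at a time. write_transposed is a hypothetical name.

```python
import io
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial

import numpy as np


def write_transposed(val, out, degree=3, fmt="%.7g"):
    """Write the expanded features transposed: one output line per term.

    val has shape (nrows, nfeatures). Only one result column (nrows values)
    is held in memory at a time, instead of the full nrows x nterms array.
    """
    _, nfeatures = val.shape
    cols = np.ascontiguousarray(val.T)  # cols[i] holds feature i for every row
    for idx in combinations_with_replacement(range(nfeatures), degree):
        # Multinomial coefficient degree! / (k0! * k1! * ...) for this term.
        coef = factorial(degree)
        for k in Counter(idx).values():
            coef //= factorial(k)
        # One vectorised product over all rows gives the whole result column.
        column = coef * np.prod(cols[list(idx)], axis=0)
        out.write(",".join(fmt % v for v in column) + "\n")


val = np.array([[1.0, 2.0, 3.0],
                [0.5, 1.0, 2.0]])
buf = io.StringIO()
write_transposed(val, buf)
lines = buf.getvalue().splitlines()
print(lines[:4])  # ['1,0.125', '6,0.75', '9,1.5', '12,1.5']
```

In the transposed file, the labels from polylabel.csv would naturally become a header line rather than a first column.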

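Performance: on my Core i3, the calculation takes 35 seconds, and writing the result file takes 8 minutes with 17-digit precision (or 3 minutes with 7 digits).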
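The two timings differ only in the output format. %.17g writes enough significant digits to reproduce any float64 exactly, while the %.7g used in rowfmt does not; a quick check, using 1/3 purely as an illustrative value:

```python
x = 1 / 3  # illustrative float64 value

short = "%.7g" % x   # the answer's rowfmt uses %.7g
full = "%.17g" % x   # 17 significant digits

print(short)              # 0.3333333
print(full)               # 0.33333333333333331
print(float(short) == x)  # False: 7 digits do not round-trip a float64
print(float(full) == x)   # True: 17 digits always round-trip a float64
```

Since the answer reads the input as float32, which carries only about 7 significant decimal digits, the shorter format loses comparatively little there.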

Efficiently evaluating a polynomial of 80 variables with 80,000 terms
