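I have a CSV file with 80 features (columns) and 4000 instances (rows). I want to map this file to a higher dimension (kernel of degree 3). The new file should have 88560 columns and 4000 rows. I am running a pilot with only 3 values. I first prepared the header with the symbols a, b, c and then tried to fill in the values, but I cannot replace the strings with values. The code is below.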
import csv
import sympy
fw = open('polyexp.csv', "w")
a,b,c= sympy.symbols("a b c")
formula = (a+b+c) ** 3
poly = str(formula.expand())
ls = poly.split('+')
print >> fw, ls
value1 = ls[0]
value2 = ls[1]
value3 = ls[3]
a= 1
b=2
c =3
print value1,value2,value3
This prints a**3 3*a**2*b 3*a*b**2 instead of 1 6 12.
The modified code I posted is working, but is there a more elegant way to do this, especially for 1) converting the 80 features to symbols and 2) substituting the values, which are currently hard-coded?
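For reference, the figure of 88560 columns can be checked by counting: the expansion of a sum of 80 symbols raised to the third power has one term per multiset of three symbols, that is C(80 + 3 - 1, 3) terms. A minimal sketch in Python 3 (math.comb needs Python 3.8 or later; the variable names are illustrative and not taken from the code above):

```python
from math import comb

n_features = 80   # columns in the input file
degree = 3        # power to which the sum of features is raised

# Number of distinct monomials of total degree 3 in 80 variables
# (combinations with repetition).
n_terms = comb(n_features + degree - 1, degree)
print(n_terms)  # 88560

# The three-symbol pilot (a, b, c) has far fewer terms.
pilot_terms = comb(3 + degree - 1, degree)
print(pilot_terms)  # 10
```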
Answer 0 (score: 1)
In your code, value1, value2, value3 are just strings, because you called str(formula.expand()). You need to sympify them again:
value1 = sympy.sympify(ls[0])
value2 = sympy.sympify(ls[1])
value3 = sympy.sympify(ls[3])
Then you can substitute actual values for the symbols to get numeric results:
print value1.subs([(a, 1), (b, 2), (c, 3)])
print value2.subs([(a, 1), (b, 2), (c, 3)])
print value3.subs([(a, 1), (b, 2), (c, 3)])
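Do not assign the values 1, 2, 3 to the symbols a, b, c like this: a = 1. That discards the symbol: a becomes the plain integer 1 rather than a symbol carrying that value.
See also the tutorial.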
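Putting the pieces above together, the three-symbol pilot might look like the sketch below. It is written in Python 3 syntax (print as a function), unlike the Python 2 snippets in this answer, and the names term_strings, terms, values and numbers are made up for the illustration.

```python
import sympy

a, b, c = sympy.symbols("a b c")

# Expanding and converting to str gives plain text, as in the question.
poly = str(((a + b + c) ** 3).expand())
term_strings = [s.strip() for s in poly.split("+")]

# sympify() turns each piece of text back into a SymPy expression.
terms = [sympy.sympify(s) for s in term_strings]

# Keep a, b, c as symbols and pass the numbers to subs() instead.
values = {a: 1, b: 2, c: 3}
numbers = [term.subs(values) for term in terms]

print(term_strings)
print(numbers)
```

The substituted terms add up to (1 + 2 + 3)**3 = 216, which is a quick way to confirm that no term was lost when splitting the string.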
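Here is a version that generates 80 symbols, but it is still fairly slow. It uses sympy.lambdify(), which according to the documentation is 50 times faster than subs()/evalf().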
import sympy as sym
fw = open('polyexp.csv', "w")
fr = open('polyval.csv', "r")
flabel = open('polylabel.csv', "r")
N = 80
symbolnames = ['a'+str(i) for i in range(N)]
symbols = [sym.Symbol(name) for name in symbolnames] # Generate symbols 'a0', 'a1', ...
formula = sym.Add(*symbols) ** 3
poly = str(formula.expand())
terms = [sym.sympify(term) for term in poly.split('+')]
funcs = [sym.lambdify(symbolnames, term, 'math') for term in terms]
for line in fr:
    label = flabel.readline().rstrip()
    values = [float(s) for s in line.split(",")]
    csvrow = [func(*values) for func in funcs]
    print >> fw, label + ',' + ','.join(map(str, csvrow))
fw.close()
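To see on a small scale what lambdify does, here is a sketch with three symbols instead of 80, in Python 3 syntax. The term 3*a**2*b and the names slow, func and fast are chosen for illustration only.

```python
import sympy

a, b, c = sympy.symbols("a b c")
term = 3 * a**2 * b

# subs() substitutes into the symbolic expression on every call.
slow = term.subs({a: 1, b: 2, c: 3})

# lambdify() builds an ordinary Python function once; each call is then
# plain arithmetic on the arguments.
func = sympy.lambdify([a, b, c], term, "math")
fast = func(1, 2, 3)

print(slow, fast)  # 6 6
```

The function is built once per term and then reused for every input row, which is where the time is saved.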
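Finally, here is a very fast version using numpy, in which a single call of a lambda function computes an entire column of the output file. Because of this column-by-column processing, the whole result array has to be held in memory before it is written out. If you do not have enough memory, you can write the result columns as rows, which gives you a transposed output file.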
import numpy as np
import sympy as sym
fw = open('polyexp.csv', "w")
flabel = open('polylabel.csv', "r")
N = 80
symbolnames = ['a{:02}'.format(i) for i in range(N)]
# Try to read polynomial expansion from cache file (saves time).
polycachefile = 'poly{}.txt'.format(N)
poly = ''
try:
    with open(polycachefile) as f:
        poly = f.readline().strip()
except IOError:
    poly = ''
if poly.count('+') > 0:
    # Use cached polynomial expansion.
    pass
else:
    # Calculate and save polynomial expansion.
    symbols = [sym.Symbol(name) for name in symbolnames]
    formula = sym.Add(*symbols) ** 3
    poly = str(formula.expand())
    with open(polycachefile, 'w') as f:
        f.write(poly)
terms = poly.split('+')
# Read input file into val.
val = np.genfromtxt('polyval.csv', delimiter=',', \
autostrip=True, dtype=np.float32) # <-- adjust data type (i.e. precision) for calculations here!
nrows, ncols = val.shape
assert ncols == N
# Prepare efficient access to columns of val.
columns = val.T[:, :]
colsbyname = dict(zip(symbolnames, columns))
symbolnamesset = set(symbolnames)
# Create result array.
exp = np.zeros([nrows, len(terms)], dtype=val.dtype)
# Calculate result, column by column.
for i, term in enumerate(terms):
    term = term.strip()
    subterms = set(term.split('*'))
    usedsyms = sorted(subterms & symbolnamesset)  # list gives a fixed argument order
    func = sym.lambdify(usedsyms, term, 'math')
    exp[:, i] = func(*(colsbyname[s] for s in usedsyms))
# Write result file.
rowfmt = ','.join(['%.7g'] * len(terms)) # <-- adjust output precision here!
for row in exp:
    label = flabel.readline().rstrip()
    print >> fw, label + ',' + rowfmt % tuple(row)
fw.close()
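The transposed variant mentioned above (one output line per polynomial term, so only one column is in memory at a time) could be sketched as follows. Note that this is not the numpy/sympy code from this answer: it swaps in itertools.combinations_with_replacement from the standard library to enumerate the terms, and it is written in Python 3. The function name write_transposed and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def write_transposed(rows, out, degree=3):
    """Write one CSV line per polynomial term (the transposed result)."""
    writer = csv.writer(out)
    nfeatures = len(rows[0])
    for combo in combinations_with_replacement(range(nfeatures), degree):
        # Multinomial coefficient of this term, e.g. 3 for a*a*b, 6 for a*b*c.
        coeff = factorial(degree)
        for count in Counter(combo).values():
            coeff //= factorial(count)
        # Evaluate this single term for every input row.
        column = []
        for row in rows:
            value = coeff
            for i in combo:
                value *= row[i]
            column.append(value)
        writer.writerow(column)  # the column is written and then discarded


rows = [[1, 2, 3], [1, 1, 1]]   # two tiny sample rows with 3 features each
buf = io.StringIO()             # stands in for the real output file
write_transposed(rows, buf)
print(buf.getvalue())
```

For the first sample row the first, second and fourth output lines start with 1, 6 and 12, the same values the three-symbol pilot is expected to give.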
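Performance of the numpy version: on my Core i3, the calculation takes 35 seconds, and writing the result file takes 8 minutes with 17-digit precision (or 3 minutes with 7 digits).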
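The difference between the two precisions comes down to the format string used for each value. A small sketch in Python 3, using an arbitrary sample value; '%.7g' is the format from the code above, while '%.17g' is assumed here to be what the 17-digit run used:

```python
x = 1.0 / 3.0

short = '%.7g' % x    # 7 significant digits
full = '%.17g' % x    # 17 significant digits, enough to round-trip a 64-bit float

print(short)  # 0.3333333
print(full)

# A row format built the same way as rowfmt in the code above.
rowfmt = ','.join(['%.7g'] * 3)
print(rowfmt % (1.0, 2.5, x))  # 1,2.5,0.3333333
```

Since np.float32 carries only about 7 significant decimal digits, writing more digits than that for float32 data makes the file larger without adding information.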