Question

注意：这是Python 2.7，而不是Py3

这是询问先前问题的更新尝试。您请求了我的完整代码，内容说明和示例输出文件。我会尽力把这个格式化好。

此代码用于从荧光测量仪读取输入文件＆＃34;并将读数转换为DNA浓度和质量。然后它生成一个按照8x12平板方案（DNA /分子工作标准）组织的输出文件。行标记为＆＃34; A，B，C，...，H＆＃34;和列标记为1 - 12。

根据用户输入，需要堆叠数组以格式化输出。但是，当数组在UNIX中堆叠（并打印或写入outfile）时，它们仅限于第一个字符。

换句话说，在Windows中，如果数组中的数字是247.5，则会打印完整数字。但是在UNIX环境（Linux / Ubuntu / MacOS）中，它被简单地截断为＆＃34; 2＆＃34;。 -2.7的数字将在Windows中正常打印，但在UNIX中只打印为＆＃34; - ＆＃34;。

完整的代码可以在下面找到; 请注意，最后一个块是代码中最相关的部分：

#!/usr/bin/env python

Usage = """
plate_calc.py - version 1.0

Convert a series of plate fluorescence readings
to total DNA mass per sample and print them to 
a tab-delimited output file.

This program can take multiple files as inputs
(separated by a space) and generates a new
output file for each input file.

NOTE: 

1) Input(s) must be an exported .txt file.
2) Standards must be in columns 1 and 2, or 11 
and 12.
3) The program assumes equal volumes across wells.

Usage:

    plate_calc.py input.txt input2.txt input3.txt
"""

import sys
import numpy as np

if len(sys.argv)<2:
    print Usage
else:
#First, we want to extract the values of interest into a Numpy array
    Filelist = sys.argv[1:]
    input_DNA_vol = raw_input("Volume of sample used for AccuClear reading (uL): ")
    remainder_vol = raw_input("Remaining volume per sample (uL): ")
    orientation = raw_input("Are the standards on the LEFT (col. 1 & 2), or on the RIGHT (col. 11 and 12)? ")
    orientation = orientation.lower()
    for InfileName in Filelist:
        with open(InfileName) as Infile:
            fluor_list = []
            Linenumber = 1
            for line in Infile: #this will extract the relevant information and store as a list of lists
                if Linenumber == 5:
                    line = line.strip('\n').strip('\r').strip('\t').split('\t')
                    fluor_list.append(line[1:])
                elif Linenumber > 5 and Linenumber < 13:
                    line = line.strip('\n').strip('\r').strip('\t').split('\t')
                    fluor_list.append(line)
                Linenumber += 1
            fluor_list = [map(float, x) for x in fluor_list] #converts list items from strings to floats
            fluor_array = np.asarray(fluor_list) #this takes our list of lists and converts it to a numpy array

代码的这一部分（上面）从输入文件（从板读取器获得）中提取感兴趣的值并将它们转换为数组。它还需要用户输入来获取计算和转换的信息，并确定标准所在的列。

最后一部分在数组堆叠时发挥作用 - 这就是问题行为发生的地方。

        #Create conditional statement, depending on where the standards are, to split the array
        if orientation == "right":
            #Next, we want to average the 11th and 12th values of each of the 8 rows in our numpy array 
            stds = fluor_array[:,[10,11]] #Creates a sub-array with the standard values (last two columns, (8x2))
            data = np.delete(fluor_array,(10,11),axis=1) #Creates a sub-array with the data (first 10 columns, (8x10))

        elif orientation == "left":
            #Next, we want to average the 1st and 2nd values of each of the 8 rows in our numpy array   
            stds = fluor_array[:,[0,1]] #Creates a sub-array with the standard values (first two columns, (8x2))
            data = np.delete(fluor_array,(0,1),axis=1) #Creates a sub-array with the data (last 10 columns, (8x10))

        else:
            print "Error: answer must be 'LEFT' or 'RIGHT'"

        std_av = np.mean(stds, axis=1) #creates an array of our averaged std values

        #Then, we want to subtract the average value from row 1 (the BLANK) from each of the 8 averages (above)
        std_av_st = std_av - std_av[0]

        #Run a linear regression on the points in std_av_st against known concentration values (these data = y axis, need x axis)
        x = np.array([0.00, 0.03, 0.10, 0.30, 1.00, 3.00, 10.00, 25.00])*10 #ng/uL*10 = ng/well
        xi = np.vstack([x, np.zeros(len(x))]).T #creates new array of (x, 0) values (for the regression only); also ensures a zero-intercept (when we use (x, 1) values, the y-intercept is not forced to be zero, and the slope is slightly inflated)
        m, c = np.linalg.lstsq(xi, std_av_st)[0] # m = slope for future calculations

        #Now we want to subtract the average value from row 1 of std_av (the average BLANK value) from all data points in "data"
        data_minus_blank = data - std_av[0]

        #Now we want to divide each number in our "data" array by the value "m" derived above (to get total ng/well for each sample; y/m = x)
        ng_per_well = data_minus_blank/m

        #Now we need to account for the volume of sample put in to the AccuClear reading to calculate ng/uL
        ng_per_microliter = ng_per_well/float(input_DNA_vol)

        #Next, we multiply those values by the volume of DNA sample (variable "ng")
        ng_total = ng_per_microliter*float(remainder_vol)

        #Set number of decimal places to 1
        ng_per_microliter = np.around(ng_per_microliter, decimals=1)
        ng_total = np.around(ng_total, decimals=1)

上述代码进行必要的计算，根据DNA标准的线性回归计算出给定样品中DNA的浓度（ng / uL）和总质量（ng），＆＃34;它可以在第1列和第2列（用户输入=＆＃34;左和＃34;）或第11和第12列（用户输入=＆＃34;右和＃34;）。

        #Create a row array (values A-H), and a filler array ('-') to add to existing arrays
        col = [i for i in range(1,13)]
        row = np.asarray(['A','B','C','D','E','F','G','H'])
        filler = np.array(['-','-','-','-','-','-','-','-','-','-','-','-','-','-','-','-',]).reshape((8,2))

上面的代码创建了与原始数组堆叠的数组。＆＃34;填充物＆＃34;数组是根据＆＃34;右＆＃34;的用户输入放置的。或＆＃34;左＆＃34; （堆叠命令，np.c_ []，如下所示）。

        #Create output
        Outfile = open('Total_DNA_{0}'.format(InfileName),"w")
        Outfile.write("DNA concentration (ng/uL):\n\n")
        Outfile.write("\t"+"\t".join([str(n) for n in col])+"\n")
        if orientation == "left": #Add filler to left, then add row to the left of filler
            ng_per_microliter = np.c_[filler,ng_per_microliter]
            ng_per_microliter = np.c_[row,ng_per_microliter]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_per_microliter.tolist()])+"\n\n")
        elif orientation == "right": #Add rows to the left, and filler to the right
            ng_per_microliter = np.c_[row,ng_per_microliter]
            ng_per_microliter = np.c_[ng_per_microliter,filler]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_per_microliter.tolist()])+"\n\n")
        Outfile.write("Total mass of DNA per sample (ng):\n\n")
        Outfile.write("\t"+"\t".join([str(n) for n in col])+"\n")
        if orientation == "left":
            ng_total = np.c_[filler,ng_total]
            ng_total = np.c_[row,ng_total]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_total.tolist()]))
        elif orientation == "right":
            ng_total = np.c_[row,ng_total]
            ng_total = np.c_[ng_total,filler]
            Outfile.write("\n".join(["\t".join([n for n in item]) for item in ng_total.tolist()]))
        Outfile.close

最后，我们生成了输出文件。这就是出现问题行为的地方。

使用简单的打印命令，我发现堆栈命令numpy.c_ []是罪魁祸首（不是数组写入命令）。

所以似乎 numpy.c_ [] 不会在Windows中截断这些数字，但会将这些数字限制为UNIX环境中的第一个字符。

哪些替代方案可能适用于两个平台？如果不存在，我不介意制作特定于UNIX的脚本。

感谢大家的帮助和耐心。很抱歉没有提供所有必要的信息。

图片是显示Windows正确输出的屏幕截图以及我最终在UNIX中获得的内容（我试图为您设置这些格式......但它们是一场噩梦）。当我只是打印数组时，我还包括了终端中获得的输出的截图＆＃34; ng_per_microliter＆＃34;和＆＃34; ng_total。＆＃34;

Answer 1

使用简单的打印命令，我发现堆栈命令numpy.c_ []是罪魁祸首（不是数组写入命令）。

因此，numpy.c_ []似乎不会在Windows中截断这些数字，但会将这些数字限制为UNIX环境中的第一个字符。

在简单的示例中说明这些陈述。 np.c_[]不应做任何不同的事情。

在Py3中，unicode中的默认字符串类型。 numpy 1.12

In [149]: col = [i for i in range(1,13)]
     ...: row = np.asarray(['A','B','C','D','E','F','G','H'])
     ...: filler = np.array(['-','-','-','-','-','-','-','-','-','-','-','-','-','-','-','-',]).reshape((8,2))
     ...: 
In [150]: col
Out[150]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
In [151]: "\t"+"\t".join([str(n) for n in col])+"\n"
Out[151]: '\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\n'
In [152]: filler
Out[152]: 
array([['-', '-'],
       ...
       ['-', '-'],
       ['-', '-']], 
      dtype='<U1')
In [153]: row
Out[153]: 
array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'], 
      dtype='<U1')
In [154]: row.shape
Out[154]: (8,)
In [155]: filler.shape
Out[155]: (8, 2)
In [159]: ng_per_microliter=np.arange(8.)+1.23
In [160]: np.c_[filler,ng_per_microliter]
Out[160]: 
array([['-', '-', '1.23'],
       ['-', '-', '2.23'],
       ['-', '-', '3.23'],
      ...
       ['-', '-', '8.23']], 
      dtype='<U32')
In [161]: np.c_[row,ng_per_microliter]
Out[161]: 
array([['A', '1.23'],
       ['B', '2.23'],
       ['C', '3.23'],
         ....
       ['H', '8.23']], 
      dtype='<U32')

对于早期的numpy版本，U1（或Py2中的S1）数组与数值的连接可能会使dtype保留为U1。在我的例子中，他们已经扩展到U32。

因此，如果您怀疑np.c_，请显示结果（如果需要，请显示repr）

print(repr(np.c_[row,ng_per_microliter]))

并跟踪dtype。

for v 1.12发行说明（可能更早）

如果要转换为的字符串dtype在“安全”转换模式下不足以保存正在转换的整数/浮点数组的最大值，则astype方法现在返回错误。以前，即使结果被截断，也允许施法。

这可能在连接时发挥作用。

Answer 2

在用户hpaulj的帮助下，我发现这不是操作系统和环境之间不同行为的问题。这很可能是因为用户拥有不同版本的numpy。

数组的连接自动将'float64'dtypes转换为'S1'（以匹配“填充”数组（' - '）和“行”数组（'A'，'B'等）。 / p>

较新版本的numpy - 特别是v 1.12.X - 似乎允许在没有这种自动转换的情况下连接数组。

我仍然不确定在旧版本的numpy中解决此问题的方法，但建议人们升级其版本以获得完整性能应该是一件简单的事情。：）

在Numpy中堆叠数组：UNIX和Windows之间的不同行为

2 个答案: