Storing the upper/lower half of a correlation matrix

Time: 2019-03-20 06:07:24

Tags: python matrix

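What is the most (1) memory-efficient, (2) time-efficient, and (3) easy-to-access way to store the upper/lower half of a correlation matrix to a file in Python?
(By "easy to access", I mean being able to read it back from the file and plot the correlation matrix with matplotlib / seaborn.) For example, given the following correlation matrix: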

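    C1   C2   C3   C4
C1  1.0  0.6  0.7  0.5
C2  0.6  1.0  0.4  0.9
C3  0.7  0.4  1.0  0.3
C4  0.5  0.9  0.3  1.0

I want to store only the following numbers in a file:

    C2   C3   C4
C1  0.6  0.7  0.5
C2       0.4  0.9
C3            0.3

OR

    C1   C2   C3
C2  0.6
C3  0.7  0.4
C4  0.5  0.9  0.3

(I thought of storing it as a csv / tsv file, but that would still take up space for the blank characters that stand in for the other half of the matrix.)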

2 answers:

Answer 0: (score: 1)

You need something like this:

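import numpy as np

matrix = np.array([[1, 0.6, 0.7, 0.5],
                   [0.6, 1, 0.4, 0.9],
                   [0.7, 0.4, 1, 0.3],
                   [0.5, 0.9, 0.3, 1]])

# Boolean masks for the strict upper and lower triangles (diagonal excluded).
upper = np.triu(np.ones_like(matrix, dtype=bool), k=1)
lower = np.tril(np.ones_like(matrix, dtype=bool), k=-1)

# Replace everything outside each triangle with NaN. Masking by position,
# rather than testing for 0, keeps genuine zero correlations intact.
ut = np.where(upper, matrix, np.nan)
lt = np.where(lower, matrix, np.nan)

# Each file still holds an n x n grid; the unused cells are written as nan.
np.savetxt("upper.csv", ut, delimiter=",")
np.savetxt("lower.csv", lt, delimiter=",")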
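A possible variation on the approach above, not part of the original answer: a NaN-padded CSV still writes all n x n cells, so a minimal sketch that keeps only the n(n-1)/2 values above the diagonal may be closer to the memory goal. It uses `np.triu_indices` to pick those values and assumes the diagonal is always 1.0 and that the variable labels are known or stored separately. The helper names `pack_upper` and `unpack_upper` and the file name `upper_values.npy` are hypothetical.

```python
import os
import tempfile

import numpy as np


def pack_upper(matrix):
    """Return the values strictly above the diagonal as a flat 1-D array."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def unpack_upper(values, n):
    """Rebuild the full symmetric matrix, assuming a unit diagonal."""
    full = np.eye(n)
    rows, cols = np.triu_indices(n, k=1)
    full[rows, cols] = values
    full[cols, rows] = values  # mirror into the lower triangle
    return full


matrix = np.array([[1.0, 0.6, 0.7, 0.5],
                   [0.6, 1.0, 0.4, 0.9],
                   [0.7, 0.4, 1.0, 0.3],
                   [0.5, 0.9, 0.3, 1.0]])

# Hypothetical file name; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "upper_values.npy")
np.save(path, pack_upper(matrix))  # 6 values are stored instead of 16

restored = unpack_upper(np.load(path), n=4)
```

The restored array can then be handed to a plotting call such as `seaborn.heatmap`. The binary `.npy` format is not human-readable, so this trades inspectability for size.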

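Answer 1: (score: 1)

Use the second representation. It is just the transpose of the first, and it does not need any blank characters stored for the other half. If you are worried about blank characters, write a custom file writer/reader for your matrix.

Example: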

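# Lower triangle as a ragged list: a header row with the column labels,
# then one row per variable holding only the values left of the diagonal.
mat = []

mat.append(["C1", "C2", "C3"])
mat.append(["C2", 0.6])
mat.append(["C3", 0.7, 0.4])
mat.append(["C4", 0.5, 0.9, 0.3])

print(mat)

# Write one tab-separated line per row. Shorter rows simply end early,
# so no blank placeholders are written for the other half.
with open("correlation.txt", "w") as _file:
    for row in mat:
        _file.write("\t".join(str(val) for val in row))
        _file.write("\n")

# Read the file back and print the number of fields on each line.
with open("correlation.txt", "r") as _file:
    for line in _file:
        print(len(line.split()))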

Result:

[['C1', 'C2', 'C3'], ['C2', 0.6], ['C3', 0.7, 0.4], ['C4', 0.5, 0.9, 0.3]]
3
2
3
4
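
The read loop above only prints the number of fields per line. For the "easy to access" requirement, the file would also have to be turned back into a full matrix. A minimal sketch of such a reader follows, assuming the same tab-separated layout (first line holds the column labels, every later line starts with its row label), at least two variables, and a diagonal of 1.0. The helper name `read_lower_triangle` is hypothetical.

```python
import os
import tempfile


def read_lower_triangle(path):
    """Rebuild (labels, full symmetric matrix) from the ragged lower-triangle file."""
    with open(path) as handle:
        rows = [line.split("\t") for line in handle.read().splitlines() if line]

    header, body = rows[0], rows[1:]
    # The column labels plus the last row label name every variable once.
    labels = header + [body[-1][0]]
    n = len(labels)

    # Start from the identity: a correlation matrix has 1.0 on its diagonal.
    full = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, row in enumerate(body, start=1):
        for j, text in enumerate(row[1:]):
            # Fill the stored cell and its mirror image across the diagonal.
            full[i][j] = full[j][i] = float(text)
    return labels, full


# Write the same layout as the example above to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "correlation.txt")
with open(path, "w") as handle:
    handle.write("C1\tC2\tC3\nC2\t0.6\nC3\t0.7\t0.4\nC4\t0.5\t0.9\t0.3\n")

labels, full = read_lower_triangle(path)
```

The nested list and its labels can be wrapped in a `pandas.DataFrame` or passed straight to a heatmap function for plotting.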