Python3 - Loop through rows, then through some columns, to print text

Time: 2016-10-14 19:24:38

Tags: python pandas

I have a pandas DataFrame like this:

            0                   1                   2              3  \
0  UICEX_0001   path/to/bam_T.bam   path/to/bam_N.bam     chr1:10000   
1  UICEX_0002  path/to/bam_T2.bam  path/to/bam_N2.bam  chr54:4958392   

              4  
0  chr4:4958392  
1           NaN 

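I am trying to loop through each row and print text that will serve as input for another program. I need to print the first three columns (along with some other text), then go through the remaining columns and print something different depending on whether or not they are NaN.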

This mostly works:

Current code

def CreateIGVBatchScript(x):
    for row in x.iterrows():
        print("\nnew")
        sample = x[0]
        bamt = x[1]
        bamn = x[2]
        print("\nload", bamt.to_string(index=False), "\nload", bamn.to_string(index=False))
        for col in range(3, len(x.columns)):
            position = x[col]
            if position.isnull().values.any():
                print("\n")
            else:
                position = position.to_string(index=False)
                print("\ngoto ", position, "\ncollapse\nsnapshot ", sample.to_string(index=False), "_", position,".png\n")

CreateIGVBatchScript(data)

But the output looks like this:

Actual output

new
load path/to/bam_T.bam
path/to/bam_T2.bam 
load path/to/bam_N.bam
path/to/bam_N2.bam

goto  chr1:10000
chr54:4958392 
collapse
snapshot  UICEX_0001 **<-- ISSUE: it's printing both rows at the same time**
UICEX_0002 _ chr1:10000
chr54:4958392 .png

new

load path/to/bam_T.bam
path/to/bam_T2.bam 
load path/to/bam_N.bam
path/to/bam_N2.bam

goto  chr1:10000
chr54:4958392 
collapse
snapshot  UICEX_0001   **<-- ISSUE: it's printing both rows at the same time**
UICEX_0002 _ chr1:10000
chr54:4958392 .png

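The first part seems fine, but once I start iterating over the columns, every row gets printed at once. I can't figure out how to fix this. Here is what I would like one of the sections to look like:

Partial desired output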

goto chr1:10000
collapse
snapshot UICEX_0001_chr1:10000.png
goto chr54:4958392
collapse
snapshot UICEX_0001_chr54:4958392.png

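Extra information: by the way, I am actually trying to adapt this from an R script in order to learn Python better. Here is the R code, in case it helps: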

CreateIGVBatchScript <- function(x){
     for(i in 1:nrow(x)){
          cat("\nnew")
          sample = as.character(x[i, 1])
          bamt = as.character(x[i, 2])
          bamn = as.character(x[i, 3])
          cat("\nload",bamt,"\nload",bamn)
          for(j in 4:ncol(x)){
               if(x[i, j] == "" | is.na(x[i, j])){ cat("\n") }
               else{
                    cat("\ngoto ", as.character(x[i, j]),"\ncollapse\nsnapshot ", sample, "_", x[i,j],".png\n", sep = "")
               }
          }
     }
     cat("\nexit")
}
CreateIGVBatchScript(data)

1 answer:

Answer 0 (score: 0)

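I figured out the answer. There were a couple of issues here:

  1. I was using iterrows() incorrectly.
     iterrows() yields (index, row) pairs, where row is a Series holding the values of that single row. You can then index into that Series to get individual values.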

    for index, row in x.iterrows():
        sample = row[0]
    

    will store the value from column 0 of that row.
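To make the difference concrete, here is a minimal sketch. It assumes a small frame built by hand to mirror the one in the question; the names `samples_from_rows` and `whole_column` are only for illustration. Indexing the frame itself returns the whole column, while indexing the `row` yielded by `iterrows()` returns a single value:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame mirroring the question's data (integer column labels 0..4).
data = pd.DataFrame([
    ["UICEX_0001", "path/to/bam_T.bam", "path/to/bam_N.bam", "chr1:10000", "chr4:4958392"],
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", np.nan],
])

samples_from_rows = []
for index, row in data.iterrows():
    # row is a Series for this one row, so row[0] is a single string
    samples_from_rows.append(row[0])

# Indexing the frame instead returns a Series with every row's value,
# which is why the original code printed both rows at once.
whole_column = data[0]
```

This is why `x[0]` inside the loop in the question kept printing both samples: it never looked at `row` at all.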

  2. Iterating over the columns.
     At this point you can use a simple for loop over the column positions, as I was already doing.

      for col in range(3, len(data.columns)):
          position = row[col]
      

      lets you store the value from that column of the current row.
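As a side note, the missing values can also be detected directly with `pd.isna` rather than by replacing them with a sentinel first. The following is only a sketch under that assumption, using a hand-built `row` Series that mirrors the second row of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical row mirroring the second row of the question's frame.
row = pd.Series(["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", np.nan])

positions = []
for col in range(3, len(row)):
    position = row[col]
    # pd.isna works on scalars, so NaN cells can be skipped without a sentinel value
    if pd.isna(position):
        continue
    positions.append(position)
```

Here `positions` ends up holding only the non-missing position strings for that row.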

      The final Python code is:

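      def CreateIGVBatchScript(x):
          # Replace NaN with a sentinel value so missing positions can be detected below
          x = x.fillna(value=999)
          for index, row in x.iterrows():
              # row is a Series holding the values of the current row only
              print("\nnew", sep="")
              sample = row[0]
              bamt = row[1]
              bamn = row[2]
              print("\nload ", bamt, "\nload ", bamn, sep="")
              # Columns 3 onward hold the positions; use the argument x, not the global data
              for col in range(3, len(x.columns)):
                  position = row[col]
                  if position == 999:
                      print("\n")
                  else:
                      print("\ngoto ", position, "\ncollapse\nsnapshot ", sample, "_", position, ".png\n", sep="")
      
      CreateIGVBatchScript(data)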
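For anyone who would rather collect the commands than print them, here is one possible variant. It is a sketch, not the code I actually ran: the function name `build_igv_batch_script` and the hand-built `data` frame are assumptions for illustration, it uses `pd.isna` instead of the 999 sentinel, and it appends the final `exit` line that the R version prints.

```python
import numpy as np
import pandas as pd

def build_igv_batch_script(frame):
    """Return the batch commands as one newline-joined string (hypothetical helper)."""
    lines = []
    for _, row in frame.iterrows():
        sample, bamt, bamn = row[0], row[1], row[2]
        lines += ["new", f"load {bamt}", f"load {bamn}"]
        # Columns 3 onward hold positions; skip cells that are NaN or empty
        for col in range(3, len(frame.columns)):
            position = row[col]
            if pd.isna(position) or position == "":
                continue
            lines += [f"goto {position}", "collapse", f"snapshot {sample}_{position}.png"]
    lines.append("exit")  # mirrors the cat("\nexit") in the R version
    return "\n".join(lines)

# Hypothetical frame mirroring the question's data.
data = pd.DataFrame([
    ["UICEX_0001", "path/to/bam_T.bam", "path/to/bam_N.bam", "chr1:10000", "chr4:4958392"],
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", np.nan],
])
script = build_igv_batch_script(data)
```

Returning a string makes it straightforward to write the result to a file or inspect it in a test, instead of capturing printed output.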
      

      This answer was guided by the following posts: