Question

I have a pandas DataFrame like this:

            0                   1                   2              3  \
0  UICEX_0001   path/to/bam_T.bam   path/to/bam_N.bam     chr1:10000   
1  UICEX_0002  path/to/bam_T2.bam  path/to/bam_N2.bam  chr54:4958392   

              4  
0  chr4:4958392  
1           NaN

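I'm trying to iterate over each row and print text to feed into another program. I need to print the first three columns (along with some other text), then go through the remaining columns and print different things depending on whether they are NaN.

This mostly works:

Current code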

def CreateIGVBatchScript(x):
    for row in x.iterrows():
        print("\nnew")
        sample = x[0]
        bamt = x[1]
        bamn = x[2]
        print("\nload", bamt.to_string(index=False), "\nload", bamn.to_string(index=False))
        for col in range(3, len(x.columns)):
            position = x[col]
            if position.isnull().values.any():
                print("\n")
            else:
                position = position.to_string(index=False)
                print("\ngoto ", position, "\ncollapse\nsnapshot ", sample.to_string(index=False), "_", position,".png\n")

CreateIGVBatchScript(data)

But the output looks like this:

Actual output

new
load path/to/bam_T.bam
path/to/bam_T2.bam 
load path/to/bam_N.bam
path/to/bam_N2.bam

goto  chr1:10000
chr54:4958392 
collapse
snapshot  UICEX_0001 **<-- ISSUE: it's printing both rows at the same time**
UICEX_0002 _ chr1:10000
chr54:4958392 .png

new

load path/to/bam_T.bam
path/to/bam_T2.bam 
load path/to/bam_N.bam
path/to/bam_N2.bam

goto  chr1:10000
chr54:4958392 
collapse
snapshot  UICEX_0001   **<-- ISSUE: it's printing both rows at the same time**
UICEX_0002 _ chr1:10000
chr54:4958392 .png

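The first part seems fine, but once I start iterating over the columns, all rows get printed. I can't figure out how to fix this. Here is what I'd like one of the sections to look like:

Partial desired output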

goto chr1:10000
collapse
snapshot UICEX_0001_chr1:10000.png
goto chr54:4958392
collapse
snapshot UICEX_0001_chr54:495832.png

Extra info: By the way, I'm actually trying to adapt this from an R script in order to learn Python better. Here is the R code, in case it helps:

CreateIGVBatchScript <- function(x){
     for(i in 1:nrow(x)){
          cat("\nnew")
          sample = as.character(x[i, 1])
          bamt = as.character(x[i, 2])
          bamn = as.character(x[i, 3])
          cat("\nload",bamt,"\nload",bamn)
          for(j in 4:ncol(x)){
               if(x[i, j] == "" | is.na(x[i, j])){ cat("\n") }
               else{
                    cat("\ngoto ", as.character(x[i, j]),"\ncollapse\nsnapshot ", sample, "_", x[i,j],".png\n", sep = "")
               }
          }
     }
     cat("\nexit")
}
CreateIGVBatchScript(data)

Answer 1

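I figured out the answer. There were a few issues:

I was using iterrows() incorrectly.

iterrows() yields (index, row) pairs, where row is a Series holding that row's values. You can then index into that Series to pull out individual values.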

for index, row in x.iterrows():
    sample = row[0]

This saves the value in column 0 for that row.
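To see why the original loop printed both rows at once, it can help to compare what `x[0]` returns with what `iterrows()` yields. This is a minimal sketch, assuming a two-row frame rebuilt by hand to resemble the one in the question (the `data`, `whole_column`, and `samples` names here are illustrative):

```python
import pandas as pd

# Hand-built stand-in for the frame in the question (integer column labels 0..4).
data = pd.DataFrame([
    ["UICEX_0001", "path/to/bam_T.bam", "path/to/bam_N.bam", "chr1:10000", "chr4:4958392"],
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", None],
])

# Indexing the DataFrame selects the whole column: one value per row,
# no matter which loop iteration we are in.
whole_column = data[0]

# iterrows() yields (index, row) pairs; row is a Series with one row's values.
samples = []
for index, row in data.iterrows():
    samples.append(row[0])  # label-based lookup: column 0 of this row only

print(len(whole_column))
print(samples)
```

In the question's code, `sample = x[0]` corresponds to `whole_column` here, which is why every iteration printed the values from all rows.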

Iterating over columns

At this point, you can use a simple for loop to iterate over the columns, as I was already doing.

for col in range(3, len(x.columns)):
    position = row[col]

This saves that row's value in the given column.
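Once each cell is a single value rather than a whole column, the missing-value check can be done per cell with `pd.isna`. The sketch below assumes the position columns start at positional index 3, as in the question; `positions_for_row` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def positions_for_row(row, first_position_col=3):
    """Return the non-missing position values of one row as strings (hypothetical helper)."""
    positions = []
    # .iloc slices by position, so this works whatever the column labels are.
    for value in row.iloc[first_position_col:]:
        if pd.isna(value):
            continue  # skip NaN / None cells
        positions.append(str(value))
    return positions

# One row shaped like the second row of the question's frame.
row = pd.Series(
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", float("nan")]
)
print(positions_for_row(row))
```

Unlike `position.isnull().values.any()` in the question, which tests an entire column, this tests one cell at a time.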

The final Python code is:

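def CreateIGVBatchScript(x):
    # Replace NaN with a sentinel value so missing positions can be detected below.
    x = x.fillna(value=999)
    # iterrows() yields (index, row); row is a Series holding one row's values.
    for index, row in x.iterrows():
        print("\nnew", sep="")
        sample = row[0]
        bamt = row[1]
        bamn = row[2]
        print("\nload ", bamt, "\nload ", bamn, sep="")
        # Columns 3 onward hold the positions; use x (the argument), not the global data.
        for col in range(3, len(x.columns)):
            position = row[col]
            if position == 999:
                # Missing position: print only blank lines.
                print("\n")
            else:
                print("\ngoto ", position, "\ncollapse\nsnapshot ", sample, "_", position, ".png\n", sep="")

CreateIGVBatchScript(data)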
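As a possible variation, the sentinel value can be avoided by testing each cell with `pd.isna`, and the lines can be collected in a list instead of printed directly. This is only a sketch under a few assumptions: `build_igv_batch_script` is a hypothetical name, missing positions are skipped rather than printed as blank lines, empty strings are treated as missing (as in the R version), and the trailing `exit` line is taken from the R version rather than from the Python code above.

```python
import pandas as pd

def build_igv_batch_script(df):
    """Return the batch-script text as a list of lines (hypothetical variant)."""
    lines = []
    for _, row in df.iterrows():
        # First three columns by position: sample name, tumor BAM, normal BAM.
        sample, bamt, bamn = row.iloc[0], row.iloc[1], row.iloc[2]
        lines += ["new", f"load {bamt}", f"load {bamn}"]
        # Remaining columns hold positions; skip missing or empty cells.
        for position in row.iloc[3:]:
            if pd.isna(position) or position == "":
                continue
            lines += [f"goto {position}", "collapse", f"snapshot {sample}_{position}.png"]
    lines.append("exit")  # mirrors cat("\nexit") in the R version
    return lines

# Hand-built stand-in for the frame in the question.
data = pd.DataFrame([
    ["UICEX_0001", "path/to/bam_T.bam", "path/to/bam_N.bam", "chr1:10000", "chr4:4958392"],
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", None],
])
print("\n".join(build_igv_batch_script(data)))
```

Returning the lines rather than printing them makes it straightforward to write the script to a file or inspect it before use.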

The answer was guided by the following post:

Python3 - loop through rows, then some columns to print text
