I have a pandas DataFrame that looks like this:
            0                   1                   2              3             4
0  UICEX_0001   path/to/bam_T.bam   path/to/bam_N.bam     chr1:10000  chr4:4958392
1  UICEX_0002  path/to/bam_T2.bam  path/to/bam_N2.bam  chr54:4958392           NaN
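I am trying to iterate over each row and print text to feed to another program. I need to print the first three columns (along with some other text), then go through the remaining columns and print something different depending on whether or not they are NaN.
This mostly works:
Current code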
def CreateIGVBatchScript(x):
    for row in x.iterrows():
        print("\nnew")
        sample = x[0]
        bamt = x[1]
        bamn = x[2]
        print("\nload", bamt.to_string(index=False), "\nload", bamn.to_string(index=False))
        for col in range(3, len(x.columns)):
            position = x[col]
            if position.isnull().values.any():
                print("\n")
            else:
                position = position.to_string(index=False)
                print("\ngoto ", position, "\ncollapse\nsnapshot ", sample.to_string(index=False), "_", position,".png\n")

CreateIGVBatchScript(data)
But the output looks like this:
Actual output
new
load path/to/bam_T.bam
path/to/bam_T2.bam
load path/to/bam_N.bam
path/to/bam_N2.bam
goto chr1:10000
chr54:4958392
collapse
snapshot UICEX_0001 **<-- ISSUE: it's printing both rows at the same time**
UICEX_0002 _ chr1:10000
chr54:4958392 .png
new
load path/to/bam_T.bam
path/to/bam_T2.bam
load path/to/bam_N.bam
path/to/bam_N2.bam
goto chr1:10000
chr54:4958392
collapse
snapshot UICEX_0001 **<-- ISSUE: it's printing both rows at the same time**
UICEX_0002 _ chr1:10000
chr54:4958392 .png
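The first part seems fine, but once I start iterating over the columns, all rows get printed at once. I can't figure out how to fix this. Here is what I'd like one of the sections to look like:
Partial desired output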
goto chr1:10000
collapse
snapshot UICEX_0001_chr1:10000.png
goto chr54:4958392
collapse
snapshot UICEX_0001_chr54:495832.png
Extra info
By the way, I'm actually adapting this from an R script in order to learn Python better. Here is the R code, in case it helps:
CreateIGVBatchScript <- function(x){
  for(i in 1:nrow(x)){
    cat("\nnew")
    sample = as.character(x[i, 1])
    bamt = as.character(x[i, 2])
    bamn = as.character(x[i, 3])
    cat("\nload",bamt,"\nload",bamn)
    for(j in 4:ncol(x)){
      if(x[i, j] == "" | is.na(x[i, j])){ cat("\n") }
      else{
        cat("\ngoto ", as.character(x[i, j]),"\ncollapse\nsnapshot ", sample, "_", x[i,j],".png\n", sep = "")
      }
    }
  }
  cat("\nexit")
}

CreateIGVBatchScript(data)
Answer 0 (score: 0)
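I figured out the answer. There were a couple of issues here.
The main one was how I used iterrows(). It yields an (index, row) pair for each row, where row is a Series holding that row's values. Inside the loop I was indexing the whole DataFrame (x[0]), which returns the entire column, instead of indexing the row. Indexing the row Series gives the individual values:
    for index, row in x.iterrows():
        sample = row[0]
will save the value for that row in column 0.
At that point, you can use a simple for loop to iterate over the remaining columns, as I was already doing:
        for col in range(3, len(x.columns)):
            position = row[col]
which lets you save the value in that column for the current row.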
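To make the difference concrete, here is a minimal sketch that contrasts indexing the DataFrame with indexing the row inside the loop. The two-row frame is a hypothetical stand-in built to mirror the layout in the question, and the variable names `whole_column` and `first_items` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the question's data (integer column labels 0..4)
data = pd.DataFrame([
    ["UICEX_0001", "path/to/bam_T.bam", "path/to/bam_N.bam", "chr1:10000", "chr4:4958392"],
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", None],
])

# Indexing the DataFrame returns the entire column, one entry per row
whole_column = data[0]

# Indexing the row Series returns a single value for the current row
first_items = []
for index, row in data.iterrows():
    first_items.append(row[0])

print(len(whole_column))
print(first_items)
```

Because `data[0]` is a Series covering every row, calling `to_string()` on it prints all sample names at once, which matches the doubled-up lines in the actual output above.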
The final Python code is:
def CreateIGVBatchScript(x):
    # Replace NaN with a sentinel value so missing positions can be detected below
    x = x.fillna(value=999)
    for index, row in x.iterrows():
        print("\nnew", sep="")
        sample = row[0]
        bamt = row[1]
        bamn = row[2]
        print("\nload ", bamt, "\nload ", bamn, sep="")
        # The remaining columns hold the positions to snapshot
        for col in range(3, len(x.columns)):
            position = row[col]
            if position == 999:
                print("\n")
            else:
                print("\ngoto ", position, "\ncollapse\nsnapshot ", sample, "_", position, ".png\n", sep="")

CreateIGVBatchScript(data)
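As a possible variation on the code above, the sentinel value can be avoided by testing each cell with `pd.isna`, and the commands can be collected in a list rather than printed directly. This is only a sketch under a few assumptions: the helper name `build_igv_batch_script` and the sample frame are hypothetical, and unlike the code above it skips missing positions instead of printing blank lines for them.

```python
import pandas as pd

def build_igv_batch_script(frame):
    """Return IGV batch commands as a list of lines (hypothetical helper)."""
    lines = []
    for _, row in frame.iterrows():
        # The first three columns are the sample name, tumor BAM and normal BAM
        sample, bam_t, bam_n = row.iloc[:3]
        lines.append("new")
        lines.append(f"load {bam_t}")
        lines.append(f"load {bam_n}")
        # Every remaining column is a position; skip missing or empty cells
        for position in row.iloc[3:]:
            if pd.isna(position) or position == "":
                continue
            lines.append(f"goto {position}")
            lines.append("collapse")
            lines.append(f"snapshot {sample}_{position}.png")
    return lines

# Hypothetical stand-in for the question's data
data = pd.DataFrame([
    ["UICEX_0001", "path/to/bam_T.bam", "path/to/bam_N.bam", "chr1:10000", "chr4:4958392"],
    ["UICEX_0002", "path/to/bam_T2.bam", "path/to/bam_N2.bam", "chr54:4958392", None],
])

script = build_igv_batch_script(data)
print("\n".join(script))
```

Returning a list keeps the formatting separate from the output step, so the same lines could be printed or written to a file without changing the loop.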
The answer was guided by the following posts: