Question

I have the following program:

library(shiny)
library(ggplot2)
library(plotly)

ui <- fluidPage(

  sidebarLayout(

    sidebarPanel(
      selectInput("Var1",
                  label = "Variable", # DATA CHOICE 1
                  selected = 10,
                  choices = c(10:100)),

      # The server reads input$Var2, so this input needs its own id
      selectInput("Var2",
                  label = "Variable2", # DATA CHOICE 2
                  selected = 10,
                  choices = c(10:100))
    ),

    # Show a plot of the generated distribution
    mainPanel(
      plotlyOutput('plot') # Draw figure
    )
  )
)


server <- function(input, output) {

  out <- reactive({

    data.frame(x = rnorm(input$Var1), #Build data set 1
               y  = 1:input$Var1)

  })

  out2 <- reactive({
    data.frame(x = rnorm(input$Var2), #Build data set 2
               y  = 1:input$Var2)
  })

  output$plot <- renderPlotly({
    p <- ggplot() +
      geom_line(data = out(), aes(x = x, y = y)) + # Add both data sets in one ggplot
      geom_line(data = out2(), aes(x = x, y = y), color = "red")

    ggplotly(p)
  })

}

# Run the application 
shinyApp(ui = ui, server = server)

It uses the following input file:

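from collections import Counter
counter = 0
lst = list()
fhandle = open('DNAInput.txt', 'r')
for line in fhandle:
    if line.startswith('>'):
        continue  # skip header lines
    else:
        lst.append(line.strip())  # drop the trailing newline
fhandle.close()
# Assumes every sequence is as long as the first one
while counter != len(lst[0]):
    lst2 = list()
    for word in lst:
        lst2.append(word[counter])  # letter at this column
    mc = Counter(lst2).most_common(5)  # tally the column, not the whole lines
    counter = counter + 1
    print(mc)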

and prints the most repeated letter in each column. How can I write exactly the same program without "from collections import Counter"?

Answer 1

If I understand what you are trying to do, which is to find the most common character in each column (?), you can do it as follows:

def most_common(col, exclude_char='N'):
    col = list(filter((exclude_char).__ne__, col))
    return max(set(col), key=col.count)

sequences = []
with open('DNAInput.txt', 'r') as file:
    for line in file:
        if line[0] == '>':
            continue
        else:
            sequences.append(line.strip())

m = max([len(v) for v in sequences])
matrix = [list(v) for v in sequences]
for seq in matrix:
    seq.extend(list('N' * (m - len(seq))))
transposed_matrix = [[matrix[j][i] for j in range(len(matrix))] for i in range(m)] 

for column in transposed_matrix:
    print(most_common(column))

How this approach works:

Open the file and read it into a list, like this:

# This is the `sequences` list
['GATCA', 'AATC', 'AATA', 'ACTA']

Get the length of the longest DNA sequence:

# m = max([len(v) for v in sequences])
5

Create a matrix (a list of lists) from the sequences:

# matrix = [list(v) for v in sequences]
[['G', 'A', 'T', 'C', 'A'],
 ['A', 'A', 'T', 'C'],
 ['A', 'A', 'T', 'A'],
 ['A', 'C', 'T', 'A']]

Pad the matrix so that all sequences have the same length:

# for seq in matrix:
#     seq.extend(list('N' * (m - len(seq))))
[['G', 'A', 'T', 'C', 'A'],
 ['A', 'A', 'T', 'C', 'N'],
 ['A', 'A', 'T', 'A', 'N'],
 ['A', 'C', 'T', 'A', 'N']]
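
The same padding can be applied to the strings before they are split into lists. A small sketch using the built-in str.ljust, assuming 'N' is an acceptable filler as in the answer's code; the sample sequences are the ones from this walkthrough:

```python
# Sample data from the walkthrough above.
sequences = ['GATCA', 'AATC', 'AATA', 'ACTA']

# Length of the longest sequence.
m = max(len(s) for s in sequences)

# str.ljust pads on the right with the fill character up to width m.
padded = [s.ljust(m, 'N') for s in sequences]

print(padded)  # ['GATCA', 'AATCN', 'AATAN', 'ACTAN']
```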

Transpose the matrix so that the columns run top -> bottom (instead of left -> right). This puts all characters from the same position into one list.

# [[matrix[j][i] for j in range(len(matrix))] for i in range(m)]
[['G', 'A', 'A', 'A'],
 ['A', 'A', 'A', 'C'],
 ['T', 'T', 'T', 'T'],
 ['C', 'C', 'A', 'A'],
 ['A', 'N', 'N', 'N']]
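
For rows of equal length, the built-in zip performs the same transposition without index arithmetic. This is a sketch on the padded matrix shown earlier, not part of the original answer; zip stops at the shortest row, which is why the padding has to happen first:

```python
matrix = [['G', 'A', 'T', 'C', 'A'],
          ['A', 'A', 'T', 'C', 'N'],
          ['A', 'A', 'T', 'A', 'N'],
          ['A', 'C', 'T', 'A', 'N']]

# zip(*matrix) yields one tuple per column; convert each to a list.
transposed = [list(column) for column in zip(*matrix)]

for column in transposed:
    print(column)
```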

Finally, iterate over each list in the transposed matrix and call most_common with the sublist as input:

# for column in transposed_matrix:
#     print(most_common(column))
A
A
T
C
A

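There are some caveats with this approach. First, if several nucleotides are equally frequent at a position, the most_common function I included returns only one of them (see position 4, which could have been either A or C). Second, most_common can be much slower than Counter from collections, because it calls col.count once for every distinct character in the column.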
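
Both caveats can be softened without importing anything by tallying each column once in a plain dict. The sketch below is one assumed way to do it, not part of the original answer; the name most_common_with_ties is made up for illustration. It returns every character tied for first place, in order of first appearance:

```python
def most_common_with_ties(col, exclude_char='N'):
    """Return all characters tied for the highest count in col."""
    counts = {}
    for ch in col:
        if ch == exclude_char:
            continue  # ignore padding
        counts[ch] = counts.get(ch, 0) + 1  # single pass over the column
    if not counts:
        return []  # the column held only padding
    best = max(counts.values())
    # dicts keep insertion order (Python 3.7+), so ties come out
    # in the order the characters first appeared in the column.
    return [ch for ch, n in counts.items() if n == best]


print(most_common_with_ties(['C', 'C', 'A', 'A']))  # ['C', 'A']
print(most_common_with_ties(['A', 'N', 'N', 'N']))  # ['A']
```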

For these reasons, I strongly recommend using Counter from collections instead, since collections is included with Python on install.
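
For comparison, a Counter-based column tally might look like the sketch below. It is not the answerer's own script; the sample columns come from the walkthrough, and the 'N' filtering mirrors most_common. Counter.most_common(1) returns a list holding a single (element, count) pair:

```python
from collections import Counter

columns = [['G', 'A', 'A', 'A'],
           ['A', 'A', 'A', 'C'],
           ['T', 'T', 'T', 'T'],
           ['C', 'C', 'A', 'A'],
           ['A', 'N', 'N', 'N']]

result = []
for column in columns:
    # Drop the padding character before counting.
    tally = Counter(ch for ch in column if ch != 'N')
    letter, count = tally.most_common(1)[0]
    result.append(letter)

print(''.join(result))
```

Position 4 is a tie between C and A, so which letter appears there depends on how Counter orders equal counts.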

Answer 2

You would have to go to the collections module, which in my case is at:

C:\Python27\Lib\collections.py

Take the parts you need and copy them into your script, in case you need the Counter class.

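This can get complicated if the Counter class pulls in other things from that script or from other imported modules. You can go to those imported modules and copy their code into your script too, but they may in turn reference still more modules.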
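
Rather than copying the standard-library source verbatim, a reduced stand-in is often enough. The class below is a hypothetical MiniCounter written for illustration (Python 3 syntax), not code taken from collections.py; it covers only counting an iterable and most_common(n):

```python
class MiniCounter(dict):
    """Tiny stand-in for collections.Counter (illustrative only)."""

    def __init__(self, iterable=()):
        super().__init__()
        for item in iterable:
            self[item] = self.get(item, 0) + 1

    def most_common(self, n=None):
        # sorted() is stable, so equal counts keep first-seen order.
        ranked = sorted(self.items(), key=lambda pair: pair[1], reverse=True)
        return ranked if n is None else ranked[:n]


print(MiniCounter('GAAA').most_common(5))  # [('A', 3), ('G', 1)]
```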

What is your reason for not wanting to import modules in your script? There may be a better solution than importing nothing at all.

Program without modules