Question

My text file data looks like this (protein-protein interaction data, tab-separated):

transcription_factor	protein
Myc	Rilpl1
Mycn	Rilpl1
Mycn	"Wdhd1,Socs4"
Sox2	Rilpl1
Sox2	"Wdhd1,Socs4"
Nanog	"Wdhd1,Socs4"

I want it to look like this (to see which transcription factors interact with each protein):

protein	transcription_factor
Rilpl1	Myc, Mycn, Sox2
Wdhd1	Mycn, Sox2, Nanog
Socs4	Mycn, Sox2, Nanog

After running my code, this is what I get. How do I get rid of the double quotes and split the two proteins onto separate lines?

protein	transcription_factor
Rilpl1	Myc, Mycn, Sox2
"Wdhd1,Socs4"	Mycn, Nanog, Sox2

Here is my code:

input_file = ARGV[0]
hash = {}
File.readlines(input_file, "\r").each do |line|
  transcription_factor, protein = line.chomp.split("\t")

  if hash.has_key? protein
    hash[protein] << transcription_factor
  else
    hash[protein] = [transcription_factor]
  end
end

hash.each do |key, value|
  if value.count > 2
    string = value.join(', ')
    puts "#{key}\t#{string}"
  end
end

Answer 1

Here is a quick way to fix the problem:

...
transcription_factor, proteins = line.chomp.split("\t")
proteins.to_s.gsub(/"/,'').split(',').each do |protein|
  if hash.has_key? protein
    hash[protein] << transcription_factor
  else
    hash[protein] = [transcription_factor]
  end
end
...

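The snippet above strips the double quotes from the protein field, if there are any, splits the field on commas, and then runs the logic you had already written once for each protein it finds.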
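To see what that gsub/split chain does in isolation, here is a minimal sketch. The helper name split_proteins is hypothetical and only introduced for illustration. It also strips surrounding whitespace from each name, which the snippet above does not do.

```ruby
# Hypothetical helper: turn one protein field into a list of protein names.
def split_proteins(field)
  # to_s guards against nil: a line without a tab yields no second field.
  # gsub removes the double quotes, split breaks the field on commas,
  # and strip (an addition here) trims stray spaces around each name.
  field.to_s.gsub(/"/, '').split(',').map(&:strip)
end

p split_proteins('"Wdhd1,Socs4"') # quoted, comma-separated field
p split_proteins('Rilpl1')        # plain field with a single protein
p split_proteins(nil)             # missing field
```

The three calls print ["Wdhd1", "Socs4"], ["Rilpl1"] and [], so a line with a missing protein column is simply skipped by the each loop instead of raising an error.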

Also, if you want to get rid of the if/else, you can define the hash like this:

hash = Hash.new {|hash,key| hash[key]= []}

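This means that the first time a key is looked up, the block stores a new empty array under that key and returns it. So now you can skip the if/else and just write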

hash[protein] << transcription_factor
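Putting the two suggestions together, the whole script might look roughly like the sketch below. It makes a few assumptions that are not in the question: the sample data is inlined in a heredoc rather than read from ARGV[0], the header row is skipped with drop(1), the method name group_by_protein is made up for illustration, and the value.count > 2 filter from the question is left out so that every protein is printed.

```ruby
# Sample data inlined for illustration; the question reads it from a file.
# "\t" inside the heredoc is expanded to a real tab character.
sample = <<~DATA
  transcription_factor\tprotein
  Myc\tRilpl1
  Mycn\tRilpl1
  Mycn\t"Wdhd1,Socs4"
  Sox2\tRilpl1
  Sox2\t"Wdhd1,Socs4"
  Nanog\t"Wdhd1,Socs4"
DATA

# Hypothetical method name: builds protein => [transcription factors].
def group_by_protein(lines)
  # Default block: a missing key gets a fresh array on first access.
  hash = Hash.new { |h, key| h[key] = [] }
  lines.each do |line|
    transcription_factor, proteins = line.chomp.split("\t")
    # Strip the quotes, then handle each comma-separated protein separately.
    proteins.to_s.gsub(/"/, '').split(',').each do |protein|
      hash[protein] << transcription_factor
    end
  end
  hash
end

# drop(1) skips the header row (an assumption about the file layout).
interactions = group_by_protein(sample.lines.drop(1))

interactions.each do |protein, factors|
  puts "#{protein}\t#{factors.join(', ')}"
end
```

With the sample data this prints one line each for Rilpl1, Wdhd1 and Socs4, in that order, since Ruby hashes keep insertion order.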
