Question

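I have a text file containing about 1000 lines like the ones below, each with a category description followed by its keyword in parentheses: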

Chemicals (chem) 
Electrical (elec)

I need to convert these lines to comma-separated values, like this:

Chemicals, chem
Electrical, elec

This is what I am using at the moment:

lines = line.gsub!('(', ',').gsub!(')', '').split(',')

I would like to know whether there is a better way to do this.

For posterity, here is the complete code (based on the answers):

require 'rubygems'
require 'csv'

csvfile = CSV.open('output.csv', 'w')
File.open('c:/categories.txt') do |f|
  f.readlines.each do |line|
    (desc, cat) = line.split('(')
    desc.strip!
    cat.strip!
    csvfile << [desc, cat[0,cat.length-1]]
  end
end
csvfile.close # flush and close the CSV output

Answer 1

Try something like this:

line.sub!(/ \((\w+)\)$/, ', \1')

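\1 is replaced with the first capture group of the regular expression (in this case it will always be the category keyword). So it basically changes " (chem)" into ", chem".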
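To see the backreference at work without a file, here is a minimal sketch on in-memory strings. The sample strings and the `PATTERN` constant are illustrative, not part of the original answer. Note that `\w+` only matches a keyword made of word characters, and a line that does not match comes back unchanged.

```ruby
# Pattern from the answer: a space, then a parenthesised keyword at the end of the line.
PATTERN = / \((\w+)\)$/

samples = ["Chemicals (chem)", "Electrical (elec)", "No keyword here"]

converted = samples.map do |s|
  # Non-destructive sub returns a new string; '\1' inserts the first capture group.
  s.sub(PATTERN, ', \1')
end

puts converted
```

The non-destructive `sub` is used here on purpose: `sub!` returns `nil` when nothing was replaced, which makes it awkward to chain or collect.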

Let's build an example that reads from a text file:

lines = []
File.open('categories.txt', 'r') do |file|
  while line = file.gets 
    lines << line.sub(/ \((\w+)\)$/, ', \1')
  end
end

Following the update to the question, I can suggest this:

require 'csv'

csv_file = CSV.open('output.csv', 'w')

File.open('c:/categories.txt') do |f| 
  f.each_line { |c| csv_file << c.scan(/^(.+) \((\w+)\)$/).first } # scan returns [[desc, tag]]; take the inner row
end

csv_file.close
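The same idea can be sketched with the block form of `CSV.open`, which closes the output file automatically. The `convert` helper and its arguments are hypothetical names chosen for illustration, and skipping lines that do not match is an assumption about the desired behaviour.

```ruby
require 'csv'

# Hypothetical helper: reads "Description (tag)" lines from src
# and writes one CSV row per matching line to dest.
def convert(src, dest)
  CSV.open(dest, 'w') do |csv|
    File.foreach(src) do |line|
      m = line.match(/^(.+) \((\w+)\)$/)
      csv << [m[1], m[2]] if m # skip lines that do not fit the pattern
    end
  end
end
```

It could be called as `convert('categories.txt', 'output.csv')`.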

Answer 2

As of Ruby 1.9, you can do this in a single method call:

str = "Chemicals (chem)\n"
mapping = { ' (' => ', ',
            ')'  => ''}

str.gsub(/ \(|\)/, mapping)  #=> "Chemicals, chem\n"

Answer 3

In Ruby, a cleaner and more efficient way would be:

# Assumes `line` has been chomped and the description contains no spaces.
# split(' ', 2) returns a two-element array: everything up to the first space,
# and everything after it. Multiple assignment puts each element in its own variable.
description, tag = line.split(' ', 2)
tag = tag[1, tag.length - 2]           # drop the first and last characters (the parentheses)
new_line = description << ", " << tag  # join the parts into a new string

This should be computationally faster (if you have many lines) because it uses direct string operations instead of regular expressions.
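Splitting on the first space breaks once a description has more than one word. A regex-free sketch that tolerates this could use `String#rpartition` instead, which is a swapped-in technique and not what the answer itself uses. The `to_csv_line` helper is a hypothetical name, and `delete_suffix` assumes Ruby 2.5 or later.

```ruby
# Hypothetical helper: converts "Description (tag)" to "Description, tag" without a regex.
def to_csv_line(line)
  text = line.chomp
  # rpartition splits around the LAST " (", so spaces inside the description survive.
  description, separator, tag = text.rpartition(' (')
  return text if separator.empty? # no " (" found: leave the line as it is

  "#{description}, #{tag.delete_suffix(')')}"
end

puts to_csv_line("Chemicals (chem)\n")            # prints: Chemicals, chem
puts to_csv_line("Dyes & Intermediates (dyes)\n") # prints: Dyes & Intermediates, dyes
```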

Answer 4

There is no need to manipulate the string. Just grab the data and write it to the CSV file. Assuming your data looks something like this:

Chemicals (chem)

Electrical (elec)

Dyes & Intermediates (dyes)

This should work:

File.open('categories.txt', 'r') do |file|
  file.each_line do |line|
    csvfile << line.match(/^(.+)\s\((.+)\)$/) { |m| [m[1], m[2]] }
  end
end
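One reason to hand the fields to the CSV library rather than building the line by hand is that it quotes fields when needed. The sketch below uses `CSV.generate` to build the output in memory. The sample lines are illustrative, and ignoring lines that do not match is an assumption.

```ruby
require 'csv'

# Sample input lines (illustrative; the last description contains a comma).
lines = [
  "Chemicals (chem)\n",
  "Dyes & Intermediates (dyes)\n",
  "Paints, Coatings (paint)\n"
]

output = CSV.generate do |csv|
  lines.each do |line|
    m = line.match(/^(.+)\s\((.+)\)$/)
    csv << [m[1], m[2]] if m # ignore lines without a parenthesised keyword
  end
end

puts output
```

The third row comes out as `"Paints, Coatings",paint`, with the description quoted so that its embedded comma is not read as a field separator.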

Answer 5

Benchmarks related to the discussion under @hundredwatt's answer:

require 'benchmark'

line = "Chemicals (chem)"

# @hundredwatt
puts Benchmark.measure {
  100000.times do
    description, tag = line.split(' ', 2)
    tag = tag[1, (tag.length - 1) - 1]
    new_line = description << ", " << tag
  end
} # => 0.18

# NeX
puts Benchmark.measure {
  100000.times do
    line.sub!(/ \((\w+)\)$/, ', \1')
  end
} # => 0.08

# steenslag
mapping = { ' (' => ', ',
  ')'  => ''}
puts Benchmark.measure {
  100000.times do
    line.gsub(/ \(|\)/, mapping)
  end
} # => 0.08
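One caveat when reading these timings: `sub!` rewrites `line` in place, so after the first iteration there is nothing left to substitute, and the later blocks see the already converted string. A sketch of a variant in which every candidate works on the same untouched input might look like the following. The constant names and the lambdas are illustrative, and the results will depend on the Ruby version and the machine.

```ruby
require 'benchmark'

LINE = "Chemicals (chem)".freeze # frozen, so no candidate can modify it by accident
MAPPING = { ' (' => ', ', ')' => '' }.freeze
N = 100_000

# Each candidate takes a line and returns a new string, leaving the input untouched.
CANDIDATES = {
  'split' => ->(l) { d, t = l.split(' ', 2); "#{d}, #{t[1, t.length - 2]}" },
  'sub'   => ->(l) { l.sub(/ \((\w+)\)$/, ', \1') },
  'gsub'  => ->(l) { l.gsub(/ \(|\)/, MAPPING) }
}.freeze

Benchmark.bm(6) do |x|
  CANDIDATES.each do |name, fn|
    x.report(name) { N.times { fn.call(LINE) } }
  end
end
```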

Answer 6

I don't know anything about Ruby, but it is easy in PHP:

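 // preg_match fills $m with strings; preg_match_all would give arrays, which cannot be concatenated
 preg_match('~(.+) \((.+)\)~', 'Chemicals (chem)', $m);

$result = $m[1] . ', ' . $m[2];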

A better way to parse "Description (tag)" into "Description, tag"
