Question

我正在开发一个程序，该程序读取包含有关手机的不同类型信息的文件，然后提取并存储每条信息。例如，下面是文件中的两行：

12hrs,Smartphone,2015WB0126A,used,Apple,2000$,{Bluetooth,Water resistant,fingerprint reader,16GB},white,2016
Used,Smartwatch,Samsung,{activity tracker,Bluetooth,water resistant},2017,250$,black,3947t4f,9hrs

在上面的第一行中，我要提取2016作为年份而不是2015和2000。我想提取2015WB0126A作为模型（字母和数字的任意组合），而不要提取12hrs和2000$。有人可以帮我弄这个吗？非常感谢。

f = File.open("listings.txt", "r")
f.each_line do |line|
  puts line
  year=line[/20+[0-9]+[0-9]/]
  puts "made in #{year}"
end

对于示例中的第一行，我希望年份等于2016，模型应该为2015WB0126A。

Answer 1

要处理此问题，我们必须先定义模式。

从您提供的2行中，我们知道这些是手机的信息。那么我们可以假设：

以'，'分隔的字段
今年是本世纪，20xx是一个很好的假设

def extract(str)
  fields = str.split(",")
  year = fields.find { |f| f.match /^20\d\d$/}
  model = fields.find do |f|
    f.match /\d/ and f.match /[a-zA-Z]/ and !f.match /\d+(hrs|hr|hour|hours|gb)/i
  end
  return year, model
end

在代码中，我假设模型包含数字和字母。我也排除了小时数和大小（gb）。我们还可以建立单词列表。因为该信息是关于手机的，所以我认为列表不长。

Answer 2

f.each_line do |line|
  # find 20xx proceeded by line start or a comma,
  # and followed line end or a comma.
  # ?: makes the group non-capturing
  year = line.match(/(?:^|,)(20\d{2})(?:$|,)/)
  year = year[1] if year

  model = line.split(',').select do |s|
    # 7-30 word characters in length
    s =~ /^\w{7,30}$/ &&
    # at least 5 digits anywhere in the word
    s =~ /(\d.*){5}/
  end

  puts "#{model.first} made in #{year}"
end

希望该模型有一些合理的限定符可以处理您的其余数据，因为它们很幼稚。

https://regex101.com/可以对任何正则表达式进行详细说明，如果您需要更多有关它们如何工作的详细信息。您还可以使用https://rubular.com/来测试ruby的正则表达式的确切风格。

如何从文件的每一行中提取整数（一年）以及字母和数字的组合

2 个答案: