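Extracting emails from a comma-separated file using Ruby

Time: 2015-06-20 15:58:42

Tags: ruby-on-rails ruby email

So I bought an email list in a txt file. It contains a bunch of email addresses, of course, but they are mixed in with other text that I don't really care about. In the end, I just want to extract the email addresses and save them to a new file. How can I achieve this with Ruby?

I'm way off, but here is what I've tried: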

VALID_EMAIL_REGEX = /\A([\w+\-].?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

emails = "id,pwsid,pid,age,sex,domain,orderamount,first_order_amount,cobrand_id,show_lang,profile_type,handle,email
374380696,310579607_70200,g1067409-pct.subregmem,27,1,gmail.com,0,0,0,english,0,parineeti,rishav.kr2055@gmail.com
374380707,310579618_50472,g1067409-pct.subregmem,27,1,gmail.com,0,0,0,english,0,rajuhalchal,hopowertuls@gmail.com
374380708,310579619_86273,g1227112-pct.subposhgay,45,1,mail.com,0,0,21194,english,0,hertsmale2012,herstmale@mail.com
374380712,310579622_52452,p1911455.sub213,46,1,gmail.com,0,0,31384,english,0,anchezchris0360,Sanchezchris03@gmail.com"

emails_split = emails.split(/,/)

def keep_only_email(email)
  email =~ VALID_EMAIL_REGEX
end

keep_only_email(emails_split)

Please help,

Cheers! AP

3 Answers:

Answer 0: (score: 2)

This looks like a CSV file, so you can parse it like this.

require 'csv'

csv_text = File.read('input.csv')
csv = CSV.parse(csv_text, headers: true)
# The block form of File.open closes (and flushes) the file automatically.
File.open('output.csv', 'w') do |file|
  csv.each do |row|
    file.write("#{row['email']}\n")
  end
end
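
For a large list, the same idea can be streamed row by row instead of loading the whole file into memory. The following is a minimal sketch, assuming the header column is named `email` as in the sample data; the method name `extract_email_column` and the file names are made up for illustration.

```ruby
require 'csv'

# Hypothetical helper: copies the "email" column of a CSV file to a new file,
# one address per line, and returns the number of addresses written.
def extract_email_column(input_path, output_path)
  count = 0
  File.open(output_path, 'w') do |out|
    # CSV.foreach reads one row at a time rather than the whole file.
    CSV.foreach(input_path, headers: true) do |row|
      email = row['email']
      next if email.nil? || email.strip.empty? # skip rows without an address
      out.puts(email.strip)
      count += 1
    end
  end
  count
end
```

It could be called as `extract_email_column('input.csv', 'emails.txt')`.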

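Answer 1: (score: 1)

This can be done with CSV, which is part of the Ruby standard library. You are basically reading in the file, getting the values from the column you are looking for, and writing them to a new csv.

require 'csv'

CSV.open('output.csv', 'w', headers: ['email'], write_headers: true) do |csv|
  CSV.read('input.csv', headers: true).values_at('email').each do |row|
    csv << row
  end
end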


Answer 2: (score: 0)

If the data (rows) are always in the same format, I would use something like this:

    VALID_EMAIL_REGEX = /\A([\w+\-].?)+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i

    lines = "id,pwsid,pid,age,sex,domain,orderamount,first_order_amount,cobrand_id,show_lang,profile_type,handle,email
    374380696,310579607_70200,g1067409-pct.subregmem,27,1,gmail.com,0,0,0,english,0,parineeti,rishav.kr2055@gmail.com
    374380707,310579618_50472,g1067409-pct.subregmem,27,1,gmail.com,0,0,0,english,0,rajuhalchal,hopowertuls@gmail.com
    374380708,310579619_86273,g1227112-pct.subposhgay,45,1,mail.com,0,0,21194,english,0,hertsmale2012,herstmale@mail.com
    374380712,310579622_52452,p1911455.sub213,46,1,gmail.com,0,0,31384,english,0,anchezchris0360,Sanchezchris03@gmail.com"

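    emails = []
    lines.split("\n").each do |line|
      email = line.strip.split(',')[12] # the email is the 13th column
      # Skip short rows (nil) as well as the header row, which fails the regex.
      emails << email if email && email.match(VALID_EMAIL_REGEX)
    end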

The emails array will then contain all of the email addresses.
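
If the column position cannot be relied on, one alternative is to scan the raw text for anything shaped like an address and then write the results out, which also covers the "save to a new file" part of the question. This is a rough sketch, not a full validator: `EMAIL_PATTERN` is a deliberately loose, unanchored pattern chosen for illustration (the `\A` and `\z` anchors in `VALID_EMAIL_REGEX` only match when the whole string is an address), and the names `scan_emails` and `save_emails` are assumptions.

```ruby
# Loose, unanchored pattern (an assumption for illustration, not RFC-complete).
EMAIL_PATTERN = /[\w+\-.]+@[a-z\d\-]+(?:\.[a-z\d\-]+)*\.[a-z]+/i

# Hypothetical helper: returns every address-like substring in the text,
# in order of first appearance and without duplicates.
def scan_emails(text)
  text.scan(EMAIL_PATTERN).uniq
end

# Hypothetical helper: writes one address per line to the given path.
def save_emails(emails, path)
  File.write(path, emails.map { |email| "#{email}\n" }.join)
end
```

For example, `save_emails(scan_emails(File.read('input.txt')), 'emails.txt')` would read the purchased list and write only the addresses; both file names are placeholders.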