How do I perform a string replace_at in Elixir or Erlang?
For example, given this fixed-width file:
EmployeeFundMappingID EmployeeID FundID IsActive EntryDate ExitDate ExitTypeID DateCreated CreatedByID DateModified ModifiedByID ConfirmedBy DateConfirmed GUID IsPooled DatePooled
1 1118544 1 1 2009-04-20 00:00:00.000 NULL NULL 2014-05-17 08:46:48.020 1 2014-10-30 13:34:47.177 NULL 1 2009-04-20 17:48:12.067 NULL NULL NULL
2 1027350 1 1 2008-03-03 00:00:00.000 NULL NULL 2014-05-17 08:46:48.020 1 2014-10-30 13:34:47.177 NULL 1 2008-05-04 15:13:30.303 NULL NULL NULL
3 1024795 1 1 2008-02-29 00:00:00.000 NULL NULL 2014-05-17 08:46:48.020 1 2014-10-30 13:34:47.177 NULL 1 2008-05-04 15:13:30.303 NULL NULL NULL
4 1116497 1 1 2009-03-24 00:00:00.000 NULL NULL 2014-05-17 08:46:48.020 1 2014-10-30 13:34:47.177 NULL 1 2009-03-24 13:00:15.277 NULL NULL NULL
5 1116569 1 1 2009-03-24 00:00:00.000 NULL NULL 2014-05-17 08:46:48.020 1 2014-10-30 13:34:47.177 NULL 1 2009-03-24 14:43:08.280 NULL NULL NULL
6 1116920 1 1 2009-03-27 00:00:00.000 NULL NULL 2014-05-17 08:46:48.020 1 2014-10-30 13:34:47.177 NULL 1 2009-03-27 17:16:35.073 NULL NULL NULL
Column positions:
[0, 22, 34, 46, 55, 79, 103, 115, 139, 151, 175, 188, 200, 224, 265, 274]
How can we replace \s with \t at each column position?
I am trying to convert a fixed-width file to CSV.
Answer 0 (score: 3)
I would reduce the original line over a list of functions, each of which changes a single position in the string.
    funs =
      [22, 34, 46, 55, 79, 103, 115, 139, 151, 175, 188, 200, 224, 265, 274]
      |> Enum.map(& &1 - 1)
      |> Enum.map(fn len ->
        fn <<s :: binary-size(len), " ", rest :: binary>> ->
          s <> "\t" <> rest
        end
      end)

    input
    |> String.trim
    |> String.split("\n")
    |> Enum.map(fn line ->
      Enum.reduce(funs, line, fn fun, acc -> fun.(acc) end)
    end)
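This could probably be done more elegantly with generated macros, one per position, and a recursive call, but reducing over a list of functions seems more straightforward to me.
The advantage of this approach is that it fails immediately on any inconsistent data, which ensures (more or less) that if it passes, the conversion was done correctly, unlike all the other, shorter solutions.
It is also much faster than any Regex solution.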
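To illustrate the fail-fast behaviour, here is a minimal sketch on a made-up three-column layout (the positions [3, 7] and the sample strings are assumptions for illustration, not taken from the question). A line whose separator is not where the layout expects it matches no function clause, so the anonymous function raises instead of silently producing shifted output:

```elixir
# Same construction as above, on a tiny hypothetical layout:
# columns start at offsets 0, 3 and 7.
funs =
  [3, 7]
  |> Enum.map(&(&1 - 1))
  |> Enum.map(fn len ->
    fn <<s::binary-size(len), " ", rest::binary>> ->
      s <> "\t" <> rest
    end
  end)

convert = fn line ->
  Enum.reduce(funs, line, fn fun, acc -> fun.(acc) end)
end

# Well-formed line: spaces at byte offsets 2 and 6 become tabs.
good = convert.("ab cde f")

# Malformed line: byte offset 2 holds "c", not a space, so no clause matches.
bad =
  try do
    convert.("abc de f")
  rescue
    _ in FunctionClauseError -> :error
  end
```

Here good is "ab\tcde\tf" and bad is :error. Because a tab is one byte, just like the space it replaces, the later offsets stay valid after each replacement.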
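Since this is going to be applied to 16M lines, here is what is probably the most performant version, which matches the whole line at once: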
    input
    |> String.trim
    |> String.split("\n")
    |> Enum.map(
      # [22, 34, 46, 55, 79, 103,
      #  115, 139, 151, 175, 188,
      #  200, 224, 265, 274]
      # note: this assumes the listed positions above are 1-based
      fn <<
           c1 :: binary-size(21),
           " ",
           c2 :: binary-size(11),
           " ",
           c3 :: binary-size(11),
           " ",
           c4 :: binary-size(8),
           " ",
           c5 :: binary-size(23),
           " ",
           c6 :: binary-size(23),
           " ",
           c7 :: binary-size(11),
           " ",
           c8 :: binary-size(23),
           " ",
           c9 :: binary-size(11),
           " ",
           c10 :: binary-size(23),
           " ",
           c11 :: binary-size(12),
           " ",
           c12 :: binary-size(11),
           " ",
           c13 :: binary-size(23),
           " ",
           c14 :: binary-size(40),
           " ",
           c15 :: binary-size(8),
           " ",
           c16 :: binary
         >> ->
        c1 <> "\t" <>
          c2 <> "\t" <>
          c3 <> "\t" <>
          c4 <> "\t" <>
          c5 <> "\t" <>
          c6 <> "\t" <>
          c7 <> "\t" <>
          c8 <> "\t" <>
          c9 <> "\t" <>
          c10 <> "\t" <>
          c11 <> "\t" <>
          c12 <> "\t" <>
          c13 <> "\t" <>
          c14 <> "\t" <>
          c15 <> "\t" <>
          c16
      end)
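The binary-size values in the pattern above follow from the column positions: each field is as wide as the distance between two consecutive positions, minus one byte for the separating space. As a minimal sketch (FixedWidth, widths/1 and tabify/2 are hypothetical names, not part of this answer), the widths could be derived instead of typed by hand:

```elixir
defmodule FixedWidth do
  # Hypothetical helper module for illustration only.

  # Turns column start positions into field widths, reserving one byte
  # between neighbouring fields for the separating space.
  def widths(positions) do
    positions
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [from, to] -> to - from - 1 end)
  end

  # Walks the line field by field and joins the fields with tabs.
  # Raises MatchError when the line is too short or when a separator
  # byte is not a space. The last field is whatever remains.
  def tabify(line, []), do: line

  def tabify(line, [w | ws]) do
    <<field::binary-size(w), " ", rest::binary>> = line
    field <> "\t" <> tabify(rest, ws)
  end
end

positions = [0, 22, 34, 46, 55, 79, 103, 115, 139, 151, 175, 188, 200, 224, 265, 274]
widths = FixedWidth.widths(positions)
```

For the positions in the question, widths is [21, 11, 11, 8, 23, 23, 11, 23, 11, 23, 12, 11, 23, 40, 8], which agrees with the sizes in the hand-written pattern. This generic version has not been benchmarked against it.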
Answer 1 (score: 1)
What you can do is first join each date and time into a single token, then replace all spaces with commas, and finally restore the datetimes to their original format.
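The join, replace, restore idea described in this answer might be sketched in Elixir as follows. SpaceSplit and to_csv_line/1 are hypothetical names, and the sketch assumes that datetimes always look like 2009-04-20 00:00:00.000, that "T" is safe to use as a temporary joiner, and that no field is blank (a blank field would shift every later column):

```elixir
defmodule SpaceSplit do
  # Hypothetical module for illustration only.

  def to_csv_line(line) do
    # 1. Join date and time with a placeholder so the inner space survives.
    joined =
      Regex.replace(
        ~r/(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3})/,
        String.trim_trailing(line),
        fn _whole, date, time -> date <> "T" <> time end
      )

    # 2. Collapse every run of padding spaces into a single comma.
    with_commas = Regex.replace(~r/ +/, joined, ",")

    # 3. Put the original space back between date and time.
    Regex.replace(
      ~r/(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2}\.\d{3})/,
      with_commas,
      fn _whole, date, time -> date <> " " <> time end
    )
  end
end

csv = SpaceSplit.to_csv_line("1   1118544 1 2009-04-20 00:00:00.000 NULL  \n")
```

Here csv is "1,1118544,1,2009-04-20 00:00:00.000,NULL". Unlike the pattern-matching approach, this does not check the column layout at all, so malformed lines pass through silently.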
Answer 2 (score: 0)
Comparing the two implementations on a dataset of more than 16M lines:
    # These functions live inside a module that calls `require Logger`.

    # Variant 1: one function per column position, reduced over each line.
    def flat2csv1(src, dst) do
      Logger.info("START")
      t = System.system_time(:millisecond)

      funs =
        [12, 52, 76]
        |> Enum.map(&(&1 - 1))
        |> Enum.map(fn len ->
          fn <<s::binary-size(len), " ", rest::binary>> ->
            s <> "\t" <> rest
          end
        end)

      File.stream!(src)
      |> Enum.map(fn line ->
        Enum.reduce(funs, line, fn fun, acc -> fun.(acc) end)
      end)
      |> write(dst)

      log_elapsed("DONE", t)
    end

    # Variant 0: a single pattern that matches the whole line at once.
    def flat2csv0(src, dst) do
      Logger.info("START")
      t = System.system_time(:millisecond)

      File.stream!(src)
      |> Enum.map(fn <<
                       c1::binary-size(11),
                       " ",
                       c2::binary-size(39),
                       " ",
                       c3::binary-size(23),
                       " ",
                       ce::binary
                     >> ->
        c1 <> "\t" <> c2 <> "\t" <> c3 <> "\t" <> ce
      end)
      |> write(dst)

      log_elapsed("DONE", t)
    end

    defp log_elapsed(s, t) do
      t = System.system_time(:millisecond) - t
      Logger.debug("#{s}: #{t} ms")
    end

    # The converted lines form a list of binaries, which is valid iodata.
    defp write(s, dst) do
      File.write!(dst, s, [:append])
    end
Results:
# flat2csv0
11:40:25.055 [info] START
11:42:26.028 [info] DONE: 120969 ms
# flat2csv1
11:45:17.521 [info] START
11:48:25.433 [info] DONE: 187906 ms
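Both functions use Enum.map, so the whole converted file is held in memory before it is written. A lazy variant of the single-pattern version could look like the sketch below. Flat2CsvStream and run/2 are hypothetical names, the variant was not part of the timings above and has not been benchmarked, and unlike the :append write it overwrites the destination file:

```elixir
defmodule Flat2CsvStream do
  # Hypothetical module for illustration only. It uses the same line
  # pattern as the single-pattern version, but processes lines lazily.
  def run(src, dst) do
    src
    |> File.stream!()
    |> Stream.map(fn <<
                       c1::binary-size(11),
                       " ",
                       c2::binary-size(39),
                       " ",
                       c3::binary-size(23),
                       " ",
                       ce::binary
                     >> ->
      c1 <> "\t" <> c2 <> "\t" <> c3 <> "\t" <> ce
    end)
    # File.stream!/1 is also a Collectable, so lines are written as they arrive.
    |> Stream.into(File.stream!(dst))
    |> Stream.run()
  end
end
```

Each line keeps its trailing newline, because File.stream!/1 yields lines including the line break and the final ce::binary segment captures it.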