I have some text files containing many lines like the following:
2011/01/01 13:13:13,<AB>, Some Certain Text,=,
[
certain text
[
0: 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0 0
16: 0 0 0 9 343 3938 9433 8756
24: 6270 4472 3182 2503 1768 1140 836 496
32: 326 273 349 269 144 121 94 82
40: 64 80 66 59 56 47 50 46
48: 64 35 42 53 42 40 41 34
56: 35 41 39 39 47 30 30 39
Total count: 12345
]
certain text
]
some text
2011/01/01 14:14:14,<AB>, Some Certain Text,=,
[
certain text
[
0: 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0 0
16: 0 0 0 4 212 3079 8890 8941
24: 6177 4359 3625 2420 1639 974 594 438
32: 323 286 318 296 206 132 96 85
40: 65 73 62 53 47 55 49 52
48: 29 44 44 41 43 36 50 36
56: 40 30 29 40 35 30 25 31
64: 47 31 25 29 24 30 35 31
72: 28 31 17 37 35 30 20 33
80: 28 20 37 25 21 23 25 36
88: 27 35 22 23 15 24 34 28
Total count: 123456
]
certain text
some text
]
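There are variable-length blocks like these in between the text. I want to read all the numbers after the ':' and store each block's numbers in its own array. In this case there would be two arrays: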
array1 = {0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 343 3938 9433 8756 6270 4472 3182 2503 1768 1140 836 496 326 273 349 269 144 121 94 82 64 80 66 59 56 47 50 46 64 35 42 53 42 40 41 34 35 41 39 39 47 30 30 39 12345}
array2 = {0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 212 3079 8890 8941 6177 4359 3625 2420 1639 974 594 438 323 286 318 296 206 132 96 85 65 73 62 53 47 55 49 52 29 44 44 41 43 36 50 36 40 30 29 40 35 30 25 31 47 31 25 29 24 30 35 31 28 31 17 37 35 30 20 33 28 20 37 25 21 23 25 36 27 35 22 23 15 24 34 28 123456}
I found that LPeg might be a lightweight way to do this, but I am completely new to PEGs and LPeg. Please help!
Answer 0 (score: 5)
An LPeg version:
local lpeg = require "lpeg"
local lpegmatch = lpeg.match
local C, Ct, P, R, S = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S
local Cg = lpeg.Cg
local data_to_arrays
do
local colon = P":"
local lbrak = P"["
local rbrak = P"]"
local digits = R"09"^1
local eol = P"\n\r" + P"\r\n" + P"\n" + P"\r"
local ws = S" \t\v"
local optws = ws^0
local getnum = C(digits) / tonumber * optws
local start = lbrak * optws * eol
local stop = optws * rbrak
local line = optws * digits * colon * optws
* getnum * getnum * getnum * getnum
* getnum * getnum * getnum * getnum
* eol
local count = optws * P"Total count:" * optws * getnum * eol
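local inner = Ct(line^1 * count^-1) -- one or more rows, then an optional count line
-- alternative: store the total under the key "count" instead of appending it
--local inner = Ct(line^1 * Cg(count, "count")^-1)
local array = start * inner * stop
local extract = Ct((array + 1)^0) -- try a block here; if that fails, skip one character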
data_to_arrays = function (data)
return lpegmatch (extract, data)
end
end
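This really only works if there are exactly eight integers on
each line of a data block.
Depending on how well-formed your input is, this may be a curse or a
blessing ;-)
Test file: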
data = [[
some text
[
some text
[
0: 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0 0
16: 0 0 0 9 343 3938 9433 8756
24: 6270 4472 3182 2503 1768 1140 836 496
32: 326 273 349 269 144 121 94 82
40: 64 80 66 59 56 47 50 46
48: 64 35 42 53 42 40 41 34
56: 35 41 39 39 47 30 30 39
Total count: 12345
]
some text
]
some text
[
some text
[
0: 0 0 0 0 0 0 0 0
8: 0 0 0 0 0 0 0 0
16: 0 0 0 4 212 3079 8890 8941
24: 6177 4359 3625 2420 1639 974 594 438
32: 323 286 318 296 206 132 96 85
40: 65 73 62 53 47 55 49 52
48: 29 44 44 41 43 36 50 36
56: 40 30 29 40 35 30 25 31
64: 47 31 25 29 24 30 35 31
72: 28 31 17 37 35 30 20 33
80: 28 20 37 25 21 23 25 36
88: 27 35 22 23 15 24 34 28
]
some text
some text
]
]]
local arrays = data_to_arrays (data)
for n = 1, #arrays do
local ar = arrays[n]
local size = #ar
io.write (string.format ("[%d] = { --[[size: %d items]]\n ", n, size))
for i = 1, size do
io.write (string.format ("%d,%s", ar[i], (i % 5 == 0) and "\n " or " "))
end
if ar.count ~= nil then
io.write (string.format ("\n [\"count\"] = %d,", ar.count))
end
io.write (string.format ("\n}\n"))
end
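For readers new to PEGs, the key line above is extract = Ct((array + 1)^0): at each position LPeg tries array, and if that fails the 1 consumes a single character so the next position can be tried, which turns the grammar into a search. The sketch below hand-rolls that loop with the plain Lua string library instead of LPeg, so it illustrates the idea rather than porting the answer; try_block and extract_blocks are hypothetical names. Unlike the LPeg grammar, its row check accepts any number of integers per line; in the LPeg version the same relaxation would probably amount to replacing the eight getnum factors in line with getnum^1.

```lua
-- Hand-rolled analogue of the LPeg search idiom Ct((array + 1)^0):
-- at each position try to match a data block; on failure skip one character.
-- try_block and extract_blocks are hypothetical names used for illustration.

-- Try to match one data block starting exactly at pos. A block is '[' at the
-- end of a line, then rows "N: v1 v2 ..." (any count of integers), an optional
-- "Total count: X" line, and ']'. Returns the numbers and the position just
-- after ']', or nil when no data block starts at pos.
local function try_block(s, pos)
  -- '^' anchors the match at pos; the empty capture () records the position
  -- right after the closing ']'. [^%[%]]* refuses to cross another bracket.
  local body, after = s:match("^%[[ \t]*\r?\n([^%[%]]*)%]()", pos)
  if not body then
    return nil
  end
  local values = {}
  for line in body:gmatch("[^\r\n]+") do
    local rest = line:match("^%s*%d+:(.*)$")  -- data row: drop the "N:" label
      or line:match("^%s*Total count:(.*)$")  -- optional count line
    if rest then
      for num in rest:gmatch("%d+") do
        values[#values + 1] = tonumber(num)
      end
    elseif line:find("%S") then
      return nil  -- any other non-blank line means this is not a data block
    end
  end
  if #values == 0 then
    return nil
  end
  return values, after
end

local function extract_blocks(s)
  local arrays, pos = {}, 1
  while pos <= #s do
    local values, after = try_block(s, pos)
    if values then
      arrays[#arrays + 1] = values
      pos = after          -- resume right after the matched block
    else
      pos = pos + 1        -- the "+ 1" alternative: consume one character
    end
  end
  return arrays
end

local sample = "some text\n[\nsome text\n[\n0: 0 0 0 9\n8: 343 3938\n"
  .. "Total count: 12345\n]\nsome text\n]\n[\n[\n0: 4 212 3079\n]\n]\n"
local arrays = extract_blocks(sample)
print(#arrays, #arrays[1], #arrays[2])  -- prints: 2 7 3 (tab-separated)
```

The outer '[' of each block fails in try_block because the text after it reaches another '[' before any ']', so the loop simply steps past it, exactly as the + 1 branch does in the LPeg version.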
Answer 1 (score: 3)
My pure Lua string-library solution would look something like this:
-- file_content is assumed to hold the whole text of the file
local bracket_pattern = "%b[]" --pattern for getting into brackets (a balanced [...] block)
local number_pattern = "(%d+)%s+" --pattern for parsing numbers (digits followed by whitespace, so "N:" labels are skipped)
local output_array = {} --output 2-dimensional array
local i = 1
for tmp_sub_str in file_content:gmatch(bracket_pattern) do --iterating through [string]
table.insert(output_array, i, {}) --adding new [string] group
for tmp_number in tmp_sub_str:gmatch(number_pattern) do --iterating through numberWHITESPACE
table.insert(output_array[i], tonumber(tmp_number)) --adding [string] group element (number)
end
i = i + 1
end
Edit: This also works with the uploaded file format.
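A sketch that wraps this answer's two patterns in reusable functions, assuming the whole file fits in memory; extract_arrays and read_file are hypothetical names. The sample input is made up to show two properties of the patterns: because %b[] returns the outermost block, a number on a free-text line inside the outer brackets (the 7) is collected too, and because (%d+)%s+ needs whitespace after the digits, a number written directly before ] (the 6) is skipped.

```lua
-- Answer 1's idea wrapped in reusable functions. extract_arrays and read_file
-- are hypothetical names; the answer's file_content corresponds to text here.
local function extract_arrays(text)
  local arrays = {}
  -- "%b[]" matches a balanced [...] pair. gmatch continues after each match,
  -- so an inner block is returned only as part of its enclosing outer block.
  for block in text:gmatch("%b[]") do
    local numbers = {}
    -- A digit run followed by whitespace; row labels like "16:" fail here
    -- because ':' follows the digits.
    for num in block:gmatch("(%d+)%s+") do
      numbers[#numbers + 1] = tonumber(num)
    end
    arrays[#arrays + 1] = numbers
  end
  return arrays
end

-- Read a whole file into a string ("*a" is accepted by Lua 5.1 through 5.4).
local function read_file(path)
  local f = assert(io.open(path, "r"))
  local content = f:read("*a")
  f:close()
  return content
end

local sample = "head\n[\ntext 7\n[\n0: 1 2\n8: 3 4\nTotal count: 4\n]\n]\n[\n[\n0: 5 6]\n]\n"
local arrays = extract_arrays(sample)
print(#arrays, #arrays[1], #arrays[2])  -- prints: 2 6 1 (tab-separated)
```

With a real file, the call would look like extract_arrays(read_file("data.txt")), where the file name is only a placeholder.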
Answer 2 (score: 3)
Try this code, which does not use LPeg:
-- assume T contains the text
local a={}
local i=0
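for b in T:gmatch("%b[]") do -- each balanced [...] block
b=b:gsub("%d+:","") -- strip the "N:" row labels; gsub's second result (the count) is discarded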
i=i+1
local t={}
local j=0
for n in b:gmatch("%d+") do
j=j+1; t[j]=tonumber(n)
end
a[i]=t
end
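A small check of why the gsub line matters, using a shortened block made up for illustration: the inner gmatch("%d+") matches every run of digits, so without the gsub the row labels 0 and 8 would be counted as data.

```lua
-- Why answer 2 removes the "N:" labels first: "%d+" matches every digit run,
-- so without the gsub the row labels 0, 8, 16, ... would end up in the array.
local block = "[\n0: 0 0 9\n8: 343 3938 9433\nTotal count: 12345\n]"

-- Collect every digit run in s as a number (the inner loop of answer 2).
local function collect(s)
  local t = {}
  for n in s:gmatch("%d+") do
    t[#t + 1] = tonumber(n)
  end
  return t
end

local with_labels = collect(block)
-- gsub returns the new string and a count; the extra parentheses keep only
-- the string (answer 2 gets the same effect by assigning to a single variable).
local without_labels = collect((block:gsub("%d+:", "")))

print(#with_labels, #without_labels)  -- prints: 9 7 (tab-separated)
```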
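Answer 3 (score: 2)
phg has already given a nice LPeg solution to your problem, but here is another solution using LPeg's re module. Its syntax is closer to BNF and its operators are more like regular expressions, so this solution may be easier to understand.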
local re = require 're'
local function dump(t)
io.write '{'
for _, v in ipairs(t) do
io.write(v, ',')
end
io.write '}\n'
end
local textformat = [[
data_in <- block+
block <- text '[' block_content ']'
block_content <- {| data_arr |} / (block / text)*
data_arr <- (text ':' nums whitesp)+
text <- whitesp [%w' ']+ whitesp
nums <- (' '+ {digits} -> tonumber)+
digits <- %d+
whitesp <- %s*
]]
local parser = re.compile(textformat, {tonumber = tonumber})
local arr1, arr2 = parser:match(data)
dump(arr1)
dump(arr2)
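Each data block is captured into its own table and returned as one of the results of match.
If data is the same as the input above, two blocks are matched and captured, so two tables are returned. Dumping the two tables gives: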
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,343,3938,9433,8756,6270,4472,3182,2503,1768,1140,836,496,326,273,349,269,144,121,94,82,64,80,66,59,56,47,50,46,64,35,42,53,42,40,41,34,35,41,39,39,47,30,30,39,12345,}
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,212,3079,8890,8941,6177,4359,3625,2420,1639,974,594,438,323,286,318,296,206,132,96,85,65,73,62,53,47,55,49,52,29,44,44,41,43,36,50,36,40,30,29,40,35,30,25,31,47,31,25,29,24,30,35,31,28,31,17,37,35,30,20,33,28,20,37,25,21,23,25,36,27,35,22,23,15,24,34,28,}
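The re grammar builds a table only for the block that directly contains the N: rows, not for the enclosing block. A rough pure-Lua analogue (not what this answer does, and assuming a data block never contains brackets itself) is to match only innermost bracket pairs and keep those that contain a row label; innermost_arrays is a hypothetical name. In the made-up sample, the 7 on the outer block's text line is therefore not collected, unlike with the %b[] approach of answer 1.

```lua
-- Innermost-block sketch (assumption: a data block is a [...] pair with no
-- brackets inside it). innermost_arrays is a hypothetical name.
local function innermost_arrays(text)
  local arrays = {}
  -- "[^%[%]]*" refuses to cross another '[' or ']', so only innermost pairs match.
  for body in text:gmatch("%[([^%[%]]*)%]") do
    if body:find("%d+:") then                -- keep blocks that contain "N:" rows
      local stripped = body:gsub("%d+:", "")  -- a single local keeps only gsub's string
      local t = {}
      for n in stripped:gmatch("%d+") do
        t[#t + 1] = tonumber(n)
      end
      arrays[#arrays + 1] = t
    end
  end
  return arrays
end

local sample = "2011/01/01 13:13:13,<AB>, Some Certain Text,=,\n[\ncertain text 7\n"
  .. "[\n0: 0 0 9\n8: 343 3938\nTotal count: 12345\n]\ncertain text\n]\n"
  .. "some text\n[\n[\n0: 4 212\n]\n]\n"
local arrays = innermost_arrays(sample)
print(#arrays, #arrays[1], #arrays[2])  -- prints: 2 6 2 (tab-separated)
```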
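Answer 4 (score: 1)
I know this is a late reply, but it needs less grammar: the following pattern finds the opening [ and captures every number that is not followed by :, until it reaches the closing ]. The whole block is then repeated until there is no further match.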
local re = require 're'
local patt = re.compile([=[
data <- {| block |}+
block <- ('[' ((%d+ ':') / { %d+ } -> int / [^]%d]+)+ ']') / ([^[]+ block)
]=], { int = tonumber })
You can capture all of the recovered arrays into a table in one go:
local a = { patt:match[=[ ... ]=] }
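The skip-the-label idea behind the re pattern's (%d+ ':') / { %d+ } alternation can also be sketched with the plain string library: capture each digit run together with an optional following colon, and keep the run only when no colon was found. This is a swapped-in technique rather than the re module, arrays_from_text is a hypothetical name, and %b[] only approximates where the re pattern's blocks begin and end. Unlike the (%d+)%s+ pattern of answer 1, it also keeps a number written directly before ] (the 10 and the 6 in the made-up sample).

```lua
-- Pure-Lua analogue of the re pattern's (%d+ ':') / { %d+ } alternation:
-- a digit run directly followed by ':' is a row label and is skipped, every
-- other digit run is kept. Blocks are located with %b[] here, which only
-- approximates where the re pattern's blocks start and end.
-- arrays_from_text is a hypothetical name.
local function arrays_from_text(text)
  local arrays = {}
  for block in text:gmatch("%b[]") do
    local t = {}
    -- "%d+" is greedy and ":?" always succeeds, so each whole digit run is
    -- captured once, together with the colon after it if there is one.
    for num, colon in block:gmatch("(%d+)(:?)") do
      if colon == "" then
        t[#t + 1] = tonumber(num)
      end
    end
    arrays[#arrays + 1] = t
  end
  return arrays
end

local sample = "x\n[\n[\n0: 1 2\n8: 3 4\nTotal count: 10]\n]\n[\n0: 5 6]\n"
local a = arrays_from_text(sample)
print(#a, #a[1], #a[2])  -- prints: 2 5 2 (tab-separated)
```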