I have a simple receipt, and I need to be able to read the items purchased from it. A sample receipt is below.
Tim Hortons
Alwasy Fresh
1 Brek Wrap Combo /A ($0.76)
1 Bacon-wrap $3.79
1 Grilled $0.00
1 5 Pieces Bacon-wrap $0.00
1 Orange $1.40
1 Deposit $0.10
Subtotal: $55.84
GST: $0.29
Debit: $55.84
Take out
Thanks for stopping by!!
Tell us how we did
I came up with the following regex to find the items.
\d(\s){1,10}(.)*\s{1,}\$\d\.[0-9]{2}
It works for the most part, but it also produces a few incorrect matches, such as
4
GST: $0.29
Can anyone come up with a better pattern?
Answer 0 (score: 1)
Here is my attempt:
^(\d+)\s+(.*)\s+\(?(\$.+)\)?$
Remember to turn on the multiline option. Components:
^ - beginning of line
(\d+) - capture the quantity at the beginning of each line item
\s+ - one or more whitespace characters
(.*) - capture the item description
\s+ - one or more whitespace characters
\(? - optional open bracket `(` character
(\$.+) - capture the dollar sign and everything after it
\)? - optional close bracket `)` character
$ - end of line
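To see why the `^` anchor and the multiline option matter, here is a minimal sketch that runs both the question's pattern and this answer's pattern over the sample receipt. The receipt is simplified (no leading spaces or blank lines), and the variable names are illustrative, not from the original posts.

```javascript
// Sample receipt from the question, one entry per line.
const receipt = [
  'Tim Hortons',
  'Alwasy Fresh',
  '1 Brek Wrap Combo /A ($0.76)',
  '1 Bacon-wrap $3.79',
  '1 Grilled $0.00',
  '1 5 Pieces Bacon-wrap $0.00',
  '1 Orange $1.40',
  '1 Deposit $0.10',
  'Subtotal: $55.84',
  'GST: $0.29',
  'Debit: $55.84',
  'Take out',
].join('\n');

// The question's pattern: unanchored, and \s is free to match a newline,
// so the trailing "4" of "$55.84" can pair up with the next line.
const original = /\d(\s){1,10}(.)*\s{1,}\$\d\.[0-9]{2}/g;

// This answer's pattern with the g and m flags, so ^ and $ work per line.
const anchored = /^(\d+)\s+(.*)\s+\(?(\$.+)\)?$/gm;

const originalMatches = receipt.match(original);
const anchoredMatches = receipt.match(anchored);

console.log(originalMatches.includes('4\nGST: $0.29'));            // true
console.log(anchoredMatches.some((line) => line.includes('GST'))); // false
console.log(anchoredMatches.length);                               // 6
```

The anchored pattern only starts a match where a line begins with digits, which is what rules out the stray `4` / `GST` match.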
Answer 1 (score: 0)
You can use
^(\d+)\s+(.*?)\s+\(?\$(\d+\.\d+)
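See the regex demo.
This regex should be used with the /m modifier so that it matches data on separate lines. In JS, the /g modifier is also needed.
Explanation:
^ - start of line
(\d+) - Group 1 captures one or more digits
\s+ - one or more whitespace characters
(.*?) - Group 2 captures zero or more characters other than a newline, as few as possible
\s+ - one or more whitespace characters
\(? - an optional ( (as on the first item line)
\$ - a literal $
(\d+\.\d+) - Group 3 captures one or more digits, followed by . and one or more digits
JS demo: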
var re = /^(\d+)\s+(.*?)\s+\(?\$(\d+\.\d+)/gm;
var str = ' Tim Hortons\n Alwasy Fresh\n\n1 Brek Wrap Combo /A ($0.76)\n1 Bacon-wrap $3.79\n1 Grilled $0.00\n1 5 Pieces Bacon-wrap $0.00\n1 Orange $1.40\n1 Deposit $0.10\nSubtotal: $55.84\nGST: $0.29\nDebit: $55.84\nTake out\n\n Thanks for stopping by!!\n Tell us how we did';
var m; // declare the match variable instead of leaking an implicit global
while ((m = re.exec(str)) !== null) {
document.body.innerHTML += "Pcs: <b>" + m[1] + "</b>, item: <b>" + m[2] + "</b>, paid: <b>" + m[3] + "</b><br/>";
}
Answer 2 (score: 0)
I see quite a few problems with the original regex:
\d(\s){1,10}(.)*\s{1,}\$\d\.[0-9]{2}
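First, parentheses both group and capture, but when you quantify a group, only the last iteration is captured. A group like (.)* therefore stores only the last character; you want (.*). Because the match is greedy, that last character will be the one just before the space that precedes the dollar sign, since your data always has a space there. Likewise, you quantify a group at the start with (\s){1,10}, which captures only the last whitespace character. You don't need that group at all, because \s is already a single whitespace character, so you can simply use \s{1,10}.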
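A small sketch of that difference, using one line from the sample receipt. The variable names are illustrative only.

```javascript
const line = '1 Bacon-wrap $3.79';

// Quantified group: (.) is overwritten on every iteration,
// so only the last character it matched survives.
const perChar = /\d\s{1,10}(.)*\s{1,}\$\d\.[0-9]{2}/.exec(line);

// Quantifier inside the group: the whole run is captured.
const whole = /\d\s{1,10}(.*)\s{1,}\$\d\.[0-9]{2}/.exec(line);

console.log(perChar[1]); // "p"
console.log(whole[1]);   // "Bacon-wrap"
```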
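Here is a piece-by-piece explanation of the regex.
The following regex captures the quantity ($1), the item description ($2), whether the price is parenthesized ($3), and the price ($4):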
^\s*(\d+)\s+(.*\S)\s+(\(?)\$([0-9.]+)\)?\s*$
It is explained and matched against your sample at regex101.
Broken apart and annotated (assuming the /x flag is supported):
/ # begin regex
^\s* # start of line, ignore leading spaces if present
(\d+) # $1 = quantity
\s+ # spacing as a delimiter
(.*\S) # $2 = item: contains anything, must end in a non-space char
\s+ # spacing as a delimiter
(\(?) # $3 = negation, an optional open parenthesis
\$ # dollar sign
([0-9.]+) # $4 = price
\)?\s*$ # trailing characters: optional end-paren and space(s)
/x # end regex; the x flag permits the whitespace and comments used here
Sample perl code, run from the command line:
perl -ne '
my ($quantity, $item, $neg, $price)
= /^\s*(\d+)\s+(.*\S)\s+(\(?)\$([0-9.]+)\)?\s*$/;
if ($item) {
if ($neg) { $price *= -1; }
print "<$quantity><$item><$price>\n"
}' RECEIPT_FILE
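(If you would rather have this as a perl script, wrap the code in while(<>) { } and you're done.)
This assigns the variables $quantity, $item, and $price from each itemized line of the receipt. I'm assuming that parenthesized items are to be subtracted (though I can't verify this, since the totals are nonsense), so $neg records the presence of the parenthesis and lets $price be negated.
I set the output to use angle brackets (< and >) to show what each variable holds.
The output for your sample receipt is therefore: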
<1><Brek Wrap Combo /A><-0.76>
<1><Bacon-wrap><3.79>
<1><Grilled><0.00>
<1><5 Pieces Bacon-wrap><0.00>
<1><Orange><1.40>
<1><Deposit><0.10>
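For readers working in JavaScript rather than perl, the same regex can be applied roughly as follows. This is only a sketch: `parseReceipt` and the shape of the returned objects are hypothetical, and the negation of parenthesized prices rests on the same unverified assumption as the perl version.

```javascript
// Pattern from this answer; the m flag lets ^ and $ anchor on each line,
// and the g flag is required by String.prototype.matchAll.
const itemPattern = /^\s*(\d+)\s+(.*\S)\s+(\(?)\$([0-9.]+)\)?\s*$/gm;

// Hypothetical helper: returns one object per itemized line.
function parseReceipt(text) {
  const items = [];
  for (const [, quantity, item, paren, price] of text.matchAll(itemPattern)) {
    // Assumption: a parenthesized price is a deduction, so negate it.
    const sign = paren === '(' ? -1 : 1;
    items.push({ quantity: Number(quantity), item, price: sign * Number(price) });
  }
  return items;
}

const sample = [
  'Tim Hortons',
  '1 Brek Wrap Combo /A ($0.76)',
  '1 5 Pieces Bacon-wrap $0.00',
  'Subtotal: $55.84',
  'GST: $0.29',
].join('\n');

for (const { quantity, item, price } of parseReceipt(sample)) {
  console.log(`<${quantity}><${item}><${price}>`);
}
```

Note that prices come back as numbers here, so `0.00` prints as `0` unless you format it yourself, for example with `toFixed(2)`.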
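You didn't say exactly what you want to match. If you only care about the prices and there are no negative values, you don't need capture groups at all, as long as you have a lookbehind or \K: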
grep -Po '^\s*[0-9].*\$\K[0-9.]+' RECEIPT_FILE
grep's -P flag invokes libpcre (which may be unavailable on older or embedded systems), and -o shows only the matching text. \K marks where the reported match starts. If you also want to capture the dollar sign, move the \K so that it comes before the \$. (See also the regex101 description and matches.)
Output of that grep command:
0.76
3.79
0.00
0.00
1.40
0.10
Prices only, using awk
There is no good way to process this regex efficiently, and you will feel the pain if you are processing a lot of content. Here is a solution using awk, which should be noticeably faster. (With small inputs the difference won't be noticeable.)
awk '$1 / 1 > 0 && $NF ~ /\$/ { gsub(/[()]/, "", $0); print $NF; }' RECEIPT_FILE
Annotated version:
awk '
  # if the quantity is indeed a number and the last field has a dollar sign
  $1 / 1 > 0 && $NF ~ /\$/ {
    gsub(/[()]/, "", $NF); # remove all parentheses from the last field
    print $NF; # print the contents of the last field
  }' RECEIPT_FILE
Prices only, using awk, with support for negative prices
awk '
  # if the quantity is indeed a number and the last field has a dollar sign
  $1 / 1 > 0 && $NF ~ /\$/ {
    neg = 1;
    if ( $NF ~ /\(/ ) { # the last field has an open parenthesis
      neg = -1;
    }
    # strip parentheses and the dollar sign; awk would otherwise read "$3.79" as 0
    gsub(/[()$]/, "", $NF);
    print $NF * neg; # print the last field, negated if parenthesized
  }' RECEIPT_FILE
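The field-based idea behind the awk versions (check the first field, then read the last one) can also be sketched in JavaScript without a whole-line regex. `pricesByFields` is a hypothetical helper, and `Number()` is stricter than awk's numeric conversion (for example, awk reads `5abc` as 5 while `Number('5abc')` is `NaN`), so treat this as an approximation of the awk logic rather than an exact port.

```javascript
// Hypothetical helper: returns the prices of itemized lines as numbers.
function pricesByFields(text) {
  const prices = [];
  for (const line of text.split('\n')) {
    const fields = line.trim().split(/\s+/);
    const last = fields[fields.length - 1];
    // Keep only lines whose first field is a positive number and whose
    // last field carries a dollar sign.
    if (!(Number(fields[0]) > 0) || !last.includes('$')) continue;
    // Assumption: a parenthesized price is a deduction, so negate it.
    const sign = last.includes('(') ? -1 : 1;
    // Drop parentheses and the dollar sign before converting to a number.
    prices.push(sign * Number(last.replace(/[()$]/g, '')));
  }
  return prices;
}

const sample = '1 Brek Wrap Combo /A ($0.76)\n1 Bacon-wrap $3.79\nSubtotal: $55.84\nTake out';
console.log(pricesByFields(sample)); // [ -0.76, 3.79 ]
```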