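I have a .csv file in which the numbers are formatted according to the da_DK
locale (i.e. a comma is used instead of a period as the decimal separator, and so on), so it looks like this: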
"5000","0,00","5,25", ....
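I would like to use a command-line application to convert all numbers in the file in one go, so that the output is in the "C" (or POSIX) locale (i.e. a dot/period is used as the decimal separator):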
"5000","0.00","5.25", ....
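...keeping the decimal places (i.e. "0,00" should be converted to "0.00", not truncated to "0") and leaving all other data/formatting unchanged.
I know there is numfmt, which should allow something like this: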
$ LC_ALL=en_DK.utf8 numfmt --from=iec --grouping 22123,11
22.123,11
...however, numfmt can only convert between units, not between locales (once LC_ALL is specified, the input number must conform to it as well, just like the output).
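Ultimately I would like something CSV-agnostic: that is, something that can parse through a text file, find all substrings that match the number format of the given input locale (i.e. from a string like "5000","0,00","5,25","hello".... the program would deduce the three locale-specific number substrings 5000, 0,00 and 5,25), convert and replace those substrings, and leave everything else untouched. As an alternative, I would also like to know about a CSV-aware approach (i.e. parse all fields line by line, then check whether the content of each field matches a locale-specific number string).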
Answer 0 (score: 0)
Update: this converts numbers.numbers to numbersnumbers, and numbers,numbers to numbers.numbers, anywhere in the text:
sed -e 's/\([0-9]\+\)\.\([0-9]\+\)/\1\2/g' -e 's/\([0-9]\+\),\([0-9]\+\)/\1.\2/g'
(Same input/output example as in the OP's answer:)
Orig string: "AO900-020","Hello","World","5000","0,00","5,25","stk","","1","0,00","Test 2","42.234,12","","","0,00","","","","5,25"
Conv string: "AO900-020","Hello","World","5000","0.00","5.25","stk","","1","0.00","Test 2","42234.12","","","0.00","","","","5.25"
Note: this will go badly wrong if any fields in your CSV are unquoted, because the comma separating two adjacent unquoted numeric fields would then be rewritten as a decimal point.
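One way to reduce that risk, offered only as a sketch and not as part of the answer above, is to rewrite a number only when it fills an entire double-quoted field. The function name convert_da_dk is hypothetical, and the sketch assumes a sed that supports -E (extended regular expressions), as GNU and BSD sed do. The first expression loops, removing one thousands-separator dot per pass from quoted fields of the form digits, dot, three digits; the second turns the decimal comma into a period.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the answers): convert
# da_DK-style numbers only when they occupy a whole double-quoted field.
convert_da_dk() {
  sed -E \
    -e ':a' \
    -e 's/"([0-9]+)\.([0-9]{3})((\.[0-9]{3})*)(,[0-9]+)?"/"\1\2\3\5"/' \
    -e 'ta' \
    -e 's/"([0-9]+),([0-9]+)"/"\1.\2"/g'
}

# Example: quoted numbers are converted; labels such as "v1.2" are not.
printf '%s\n' '"AO900-020","5000","0,00","42.234,12","v1.2"' | convert_da_dk
```

For the example line this prints "AO900-020","5000","0.00","42234.12","v1.2". Unquoted fields are left untouched rather than merged, so they would still need separate handling; negative numbers are not matched either.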
Answer 1 (score: 0)
OK, I did find a way to do this in Perl, and it is not exactly trivial; an example (CSV-agnostic) script that converts a test string is pasted below. In the end it prints:
Orig string: "AO900-020","Hello","World","5000","0,00","5,25","stk","","1","0,00","Test 2","42.234,12","","","0,00","","","","5,25"
Conv string: "AO900-020","Hello","World","5000","0.00","5.25","stk","","1","0.00","Test 2","42234.12","","","0.00","","","","5.25"
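...which is basically what I wanted to achieve; but there may be edge cases here where this behavior is undesirable. It is probably better to combine something like this with a tool such as csvfix or csvtool, or to use a Perl CSV library directly in the code.
Still, here is the code: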
#!/usr/bin/env perl
use warnings;
use strict;
use locale;
use POSIX qw(setlocale locale_h LC_ALL);
use utf8;
use Number::Format qw(:subs); # sudo perl -MCPAN -e 'install Number::Format'
use Data::Dumper;
use Scalar::Util::Numeric qw(isint); # sudo perl -MCPAN -e 'install Scalar::Util::Numeric'
my $old_locale;
# query and save the old locale
$old_locale = setlocale(LC_ALL);
# list of (installed) locales: bash$ locale -a
setlocale(LC_ALL, "POSIX");
# localeconv() returns "a reference to a hash of locale-dependent info"
# dereference here:
#%posixlocalesettings = %{localeconv()};
#print Dumper(\%posixlocalesettings);
# or without dereference:
my $posixlocalesettings = localeconv();
# the $posixlocalesettings has only 'decimal_point' => '.';
# force also thousands_sep to '', else it will be comma later on, and grouping will be made regardless
$posixlocalesettings->{'thousands_sep'} = '';
print Dumper($posixlocalesettings);
#~ my $posixNumFormatter = new Number::Format %args;
# thankfully, Number::Format seems to accept as argument same kind of hash that localeconv() returns:
my $posixNumFormatter = Number::Format->new(%{$posixlocalesettings});
print Dumper($posixNumFormatter);
setlocale(LC_ALL, "en_DK.utf8");
my $dklocalesettings = localeconv();
print Dumper($dklocalesettings);
# Get some of locale's numeric formatting parameters
my ($thousands_sep, $decimal_point, $grouping) =
# @{localeconv()}{'thousands_sep', 'decimal_point', 'grouping'};
@{$dklocalesettings}{'thousands_sep', 'decimal_point', 'grouping'};
# grouping and mon_grouping are packed lists
# of small integers (characters) telling the
# grouping (thousand_seps and mon_thousand_seps
# being the group dividers) of numbers and
# monetary quantities. The integers’ meanings:
# 255 means no more grouping, 0 means repeat
# the previous grouping, 1-254 means use that
# as the current grouping. Grouping goes from
# right to left (low to high digits). In the
# below we cheat slightly by never using anything
# else than the first grouping (whatever that is).
my @grouping = unpack("C*", $grouping);
print "en_DK.utf8: thousands_sep $thousands_sep; decimal_point $decimal_point; grouping " .join(", ", @grouping). "\n";
my $inputCSVString = '"AO900-020","Hello","World","5000","0,00","5,25","stk","","1","0,00","Test 2","42.234,12","","","0,00","","","","5,25"';
# Character set modifiers
# /d, /u , /a , and /l , available starting in 5.14, are called the character set modifiers;
# /l sets the character set to that of whatever Locale is in effect at the time of the execution of the pattern match.
while ($inputCSVString =~ m/[[:digit:]]+/gl) { # doesn't take locale in account
print "A Found '$&'. Next attempt at character " . (pos($inputCSVString)+1) . "\n";
}
print "----------\n";
#~ while ($inputCSVString =~ m/(\d{$grouping[0]}($|$thousands_sep))+/gl) {
#~ while ($inputCSVString =~ m/(\d)(\d{$grouping[0]}($|$thousands_sep))+/gl) {
# match a string that starts with digit, and contains only digits, thousands separators and decimal points
# note - it will NOT match negative numbers
while ($inputCSVString =~ m/\d[\d$thousands_sep$decimal_point]+/gl) {
my $numstrmatch = $&;
my $unnumstr = unformat_number($numstrmatch); # should unformat according to current locale ()
my $posixnumstr = $posixNumFormatter->format_number($unnumstr);
print "B Found '$numstrmatch' (unf: '$unnumstr', form: '$posixnumstr'). Next attempt at character " . (pos($inputCSVString)+1) . "\n";
}
sub convertNumStr{
my $numstrmatch = $_[0];
my $unnumstr = unformat_number($numstrmatch);
# if an integer, return as is so it doesn't change trailing zeroes, if the number is a label
# ($decimal_point is the en_DK decimal separator read from localeconv() above; \Q..\E quotes it for the regex)
if ( isint($unnumstr) && ( $numstrmatch !~ m/\Q$decimal_point\E/ ) ) { return $numstrmatch; }
#~ print "--- $unnumstr\n";
# find the length of the string after the decimal point - the precision
my $precision_strlen = length( substr( $numstrmatch, index($numstrmatch, $decimal_point)+1 ) );
# must manually spec precision and trailing zeroes here:
my $posixnumstr = $posixNumFormatter->format_number($unnumstr, $precision_strlen, 1);
return $posixnumstr;
}
# e modifier to evaluate perl Code
(my $replaceString = $inputCSVString) =~ s/(\d[\d$thousands_sep$decimal_point]+)/"".convertNumStr($1).""/gle;
print "Orig string: " . $inputCSVString . "\n";
print "Conv string: " . $replaceString . "\n";