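I have a table whose columns are separated by spaces and tabs, and I need the fields to be separated by semicolons instead. I already tried doing this directly with awk, but it did not work. Since I could not get that far on my own, I am trying to reuse a Perl script that was written to handle ASCII-style tables drawn with pipes, where the fields had to be split and contained underscores.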
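using System;
using System.Text.RegularExpressions;

string str;
bool over = false;
// Dots are escaped so they match a literal '.'; assumed intent: name@host.com or name@host.co.il
string pattern = @"^[a-z]{1,500}@[a-z]{1,500}\.(com|co\.il)$";
Regex Filter = new Regex(pattern);
while (!over)
{
    Console.WriteLine("please enter a string and we will tell you if its good or not");
    str = Console.ReadLine() ?? "over"; // treat end of input as a request to stop
    // Compare the input (not the pattern) with "over" to leave the loop
    if (str == "over")
        over = true;
    else
        Console.WriteLine(Filter.IsMatch(str).ToString());
}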
Sample input:
Name full       CI        FG         AG    DG  Date (UTC)
Virnia Ray      34842865  093161455  -     -   2019-07-12T12:09:31.378Z
Vitoxia Sureez  40151215  094063155  36.3  -   2019-07-14T13:18:11.733Z
I already tried this to delete all the spaces:
sed -e 's/^[ \t]*//' -e 's/ /\;/g'
I also tried a Perl script written by L. Scott, originally meant for converting ASCII-style tables:
while(<>) {
    @vals = split / /; # split the fields into the vals array, using a space as the separator
    $size = @vals;
    for( $i = 0 ; $i < $size ; $i++ )
    {
        # clean up the values: remove underscores and extra spaces in the fields and remove possible semicolons there
        $vals[$i] =~ s/_/ /g;
        $vals[$i] =~ s/;/ /g;
        $vals[$i] =~ s/^ *//;
        $vals[$i] =~ s/ *$//;
        # append the value to the data record for this field
        $data[$i] .= $vals[$i];
        # special handling for the first field: use spaces when joining
        $data[$i] .= " " if ($i==0); # not sure this is still needed for the new requirement, since more than the first field contains spaces
    }
    if(/\R/) # take a line break as the end of the record
    {
        # clean up the first record; trim spaces
        $data[0] =~ s/^ *//;
        $data[0] =~ s/ *$//;
        $data[3] =~ s/\..*//; # remove the point and the decimals from field four
        # join the records with semicolons
        $line = join (";", @data);
        # collapse multiple spaces
        $line =~ s/ +/ /g;
        # print this line and start over
        print "$line\n" unless ($line eq '');
        @data = ();
    }
}
Expected output:
Name full;CI;FG;AG;DG;Date (UTC)
Virnia Ray;34842865;093161455;-;-;2019-07-12T12:09:31.378Z
Vitoxia Sureez;40151215;094063155;36;-;2019-07-14T13:18:11.733Z
Current output. In some cases the first field comes out split like this:
Name;full;;;;;;;;;;;;;;;;;;CI;;;;;;;FG;;;AG;DG;Date;(UTC)
Virnia;Ray;;;;;;;;;;;;;;;;;;;34842865;093161455;-;;;;-;;;;;2019-07-1T12:09:31.378Z
Vitoxia;Sureez;;;;;;;;;;;;;;;;;;40151215;094063155;36;;;-;;;;;2019-07-14T13:18:11.733Z
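In the first field I only need the data before the comma; everything after it should be deleted. As you can see, the rest of the data in a row is not on the same "line" as the start of the name.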
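The "keep only what comes before the comma" requirement can be sketched as a single substitution on a record that has already been joined with semicolons. This is a minimal sketch, not part of the original scripts: the helper name `trim_name_field` and the sample record are hypothetical, and it assumes the comma of interest always sits inside the first field.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the text before the first comma of the
# first semicolon-separated field; all other fields are left untouched.
sub trim_name_field {
    my ($record) = @_;
    # ([^;,]*?) is the part of field 1 that is kept,
    # \s*,[^;]* is the comma and the rest of field 1, which is dropped.
    $record =~ s/^([^;,]*?)\s*,[^;]*/$1/;
    return $record;
}

# Made-up record in the shape of the question's data.
print trim_name_field(
    "Virnia Ray, some trailing text;34842865;093161455;-;-;2019-07-12T12:09:31.378Z"
), "\n";
```

A record whose first field has no comma is returned unchanged, because the pattern is not allowed to cross the first semicolon.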
The original data comes from HTML that was parsed with html2text; the raw text looks like this:
Mar▒a Xatia Mecrdiz
M▒ndrz, yrcr▒a
cdcsurtmz at ruy opdx
lxtrb mxs2axs rl tsactfg
re xorts tdz drfod t 33743642 095518568 41 - 2019-06-12T13:48:40.200Z
zude def rtexetggacvc
opyxo ae f▒xuda tcso
dxzdtctfgs ti x9mdfggfhh
sx 7dfgab, asvro oi sz op
dgeto jxgdmszdd.
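Maybe there is some utility that could be used instead of html2text, one that does this job in better shape by working directly on the rendered table.
Here is the HTML table, with more records.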
<b>Mon Jul 05 2019</b><hr><table style="border: 1px solid
#dddddd;border-collapse: collapse;text-align: left;"><tr><th style="padding: 8px;background-color: #cce6ff">Name Full</th><th style="padding: 8px;background-color: #cce6ff">FG</th><th style="padding: 8px;background-color: #cce6ff">CG</th><th style="padding: 8px;background-color: #cce6ff">AG</th><th style="padding: 8px;background-color: #cce6ff">MG</th><th style="padding: 8px;background-color: #cce6ff">Date (UTC)</th><tr><th style="padding: 8px;background-color: #dddddd">Mrída Xatia Mecrdiz Míndrz, yrcrría cdcsurtmz at ruy opdxlxtrb mxs2axs rl tsactfgre xorts tdz drfod t zude def rtexetggacvcopyxo ae féxuda tcsodxzdtctfgs ti x9mdfggfhhsx 7dfgab, asvro oi sz op
dgeto jxgdmszdd.</th><th style="padding: 8px;background-color: #dddddd">33743642</th><th style="padding: 8px;background-color: #dddddd">095518568</th><th style="padding: 8px;background-color: #dddddd">41</th><th style="padding: 8px;background-color: #dddddd">-</th><th style="padding: 8px;background-color: #dddddd">2019-05-12T13:48:40.200Z</th></tr><tr><th style="padding: 8px;">Cdlga foxa</th><th style="padding: 8px;">45285726</th><th style="padding: 8px;">092641968</th><th style="padding: 8px;">28</th><th style="padding: 8px;">-</th><th style="padding: 8px;">2019-06-11T13:50:52.091Z</th></tr></table>
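One way to avoid the html2text step is to read the cells straight from the markup. The following is only a sketch under stated assumptions: the helper name `table_to_lines` is hypothetical, the HTML is assumed to be machine-generated and as regular as the sample above (one table, every cell a `<th>` or `<td>`), and HTML entities are not decoded. A real HTML parser (for example the CPAN module HTML::TableExtract) would be more robust than regular expressions.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the rows of one HTML table into
# semicolon-separated lines.
sub table_to_lines {
    my ($html) = @_;
    my @lines;
    # The header row in the sample is never closed, so split on the opening
    # <tr> tag instead of matching <tr>...</tr> pairs.
    my @rows = split /<tr\b[^>]*>/i, $html;
    shift @rows;    # discard everything before the first <tr>
    for my $row (@rows) {
        my @cells = $row =~ m{<t[hd]\b[^>]*>(.*?)</t[hd]>}gis;
        next unless @cells;
        for (@cells) {
            s/<[^>]+>//g;    # strip any nested tags
            s/\s+/ /g;       # fold newlines and runs of whitespace
            s/^ | $//g;      # trim
            s/;/ /g;         # keep the delimiter unambiguous
        }
        $cells[0] =~ s/\s*,.*//s;                      # name: keep the text before the first comma
        $cells[3] =~ s/\..*// if defined $cells[3];    # fourth field: drop the decimals
        push @lines, join ';', @cells;
    }
    return @lines;
}

# Shortened, made-up markup in the shape of the sample table.
my $html = '<table><tr><th>Name Full</th><th>FG</th><th>CG</th><th>AG</th><th>MG</th><th>Date (UTC)</th>'
         . '<tr><th style="padding: 8px;">Cdlga foxa, xyz</th><th>45285726</th><th>092641968</th>'
         . '<th>28.5</th><th>-</th><th>2019-06-11T13:50:52.091Z</th></tr></table>';
print "$_\n" for table_to_lines($html);
```

Because each cell is taken whole, a name that html2text would wrap over several lines stays in one field, and the comma rule from the question can be applied to it directly.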
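Answer 0 (score: 1)
I still don't have enough rep to comment, so I am assuming that the name consists of a first name and a last name (the header says full name) and that it is never blank.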
use strict;
use warnings;

# Output delimiter
my $delimiter = ";";

while (<>) {
    # Remove the trailing newline
    chomp;
    # Clean up the text: remove _ and ;
    s![_;]!!g;
    # Remove leading and trailing whitespace
    s!^\s+|\s+$!!g;
    # Replace each run of whitespace (you mentioned spaces and tabs) with the delimiter
    s/\s+/$delimiter/g;
    # Remove the delimiter from "Date (UTC)"; /i ignores case
    s/Date.+\(UTC\)/Date (UTC)/i;
    # Remove the delimiter inside the full name (assumes a two-word name)
    s/^([^;]+);([^;]+)/$1 $2/;
    # Remove the decimals from field 4
    s/^([^;]+;[^;]+;[^;]+;)([^;.]+)\.?[^;]*/$1$2/;
    print "$_\n";
}
Output
Name full;CI;FG;AG;DG;Date (UTC)
Virnia Ray;34842865;093161455;-;-;2019-07-12T12:09:31.378Z
Vitoxia Sureez;40151215;094063155;36;-;2019-07-14T13:18:11.733Z
Note
I only used ! instead of / as the delimiter in some of the regexes to fix the syntax highlighting.
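The two-word-name assumption can be relaxed by splitting each row from the right. This is a sketch, not the answer's method: the helper name `row_to_record` is hypothetical, and it assumes every row ends with exactly five columns that contain no whitespace (CI, FG, AG, DG and the date, as in the sample), with "Date (UTC)" in the header being the only exception.

```perl
use strict;
use warnings;

# Hypothetical helper: the last five whitespace-separated tokens are taken
# as CI, FG, AG, DG and the date; whatever precedes them is the name.
sub row_to_record {
    my ($line) = @_;
    $line =~ s/[_;]//g;                        # same clean-up as in the answer
    $line =~ s/Date\s+\(UTC\)/Date_(UTC)/i;    # keep the header's last column as one token
    my @tokens = split ' ', $line;             # awk-style split on spaces and tabs
    return if @tokens < 6;                     # not a complete row
    my @tail = splice @tokens, -5;             # CI, FG, AG, DG, date
    $tail[2] =~ s/\..*//;                      # AG: drop the decimals
    $tail[4] =~ s/_/ /;                        # restore "Date (UTC)" in the header
    return join ';', join(' ', @tokens), @tail;
}

for my $line (
    "Name full CI FG AG DG Date (UTC)",
    "Vitoxia Sureez\t40151215 094063155 36.3 - 2019-07-14T13:18:11.733Z",
    "Ana Maria Lopez 40151215 094063155 28 - 2019-07-14T13:18:11.733Z",    # made-up three-word name
) {
    my $record = row_to_record($line);
    print "$record\n" if defined $record;
}
```

Counting columns from the right keeps the name intact however many words it has, at the cost of breaking if one of the trailing columns is ever empty.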