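Multi-line regex replacement when a question has more than one correct answer

Date: 2012-04-14 21:15:55

Tags: regex perl sed awk multiline

I am struggling with the following:

I need to import a Word file with questions and answers into Moodle (an online quiz site) in a specific format. Everything is black except the correct answers, which are green. The starting format is as follows: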

1. Question example

a. Wrong

b. Wrong

C. Wrong

D. Right

The output should become:

:Question example

:Question example

{

~ Wrong

~ Wrong

~ Wrong

= Right

}

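I open the file and replace all red paragraph marks with * (I can't replace using groups). After that, I export the .docx file as text, open it on my Linux machine, and throw the following regexes at it.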

sed -i -e 's/^\r/\n/g' tmp #OS X blank line replacement
sed -i -e 's/\r//g' tmp #remove carriage returns
sed -i -e 's:^[a-z]\.:~:' tmp #replace leading answer letters with tilde
sed -i -e 's/\(^[0-9]*\.\ \)\(.*\)/}\n::\2\n::\2\n{/' tmp #regenerate title
sed -i -n '${p;q};N;/\n\*/{s/"\?\n//p;b};P;D' tmp #if the next line starts with *, append it to the current line
sed -i -e 's:^~\(.*\)\(\*.*\)$:=\1:' tmp #move * from the back to the front as =
sed -i -e 's:^\*:=:' tmp #replace any remaining leading * with =
sed '/^$/d' tmp #delete any remaining blank lines

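It isn't pretty, but it works well enough. The questions were written by hand and contain many errors, so I still have to go through them manually. The hard part is when a question has multiple correct answers. The output should then look like this: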

:Question example

:Question example

{

~%-100% Wrong

~%-100% Wrong

~%50% Right

~%50% Right

}

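Ideally I would have a sed or Perl regex that counts the number of = between { and }, replaces each of them with ~%50%, and turns every ~ into ~%-100%. I could then use the same code for three correct answers, where each correct answer becomes ~%33%.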
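To make the arithmetic concrete, here is a minimal Python sketch of the rule described above (not sed or Perl, and not taken from the thread). The function name `answer_prefixes` and the boolean-list input are assumptions made for illustration; rounding down to a whole percent is also an assumption.

```python
def answer_prefixes(flags):
    """Return one prefix per answer.

    flags: list of booleans, True where the answer is correct
    (hypothetical input shape, chosen for this sketch).
    """
    num_right = sum(flags)
    if num_right == 1:
        # Single correct answer: plain '=' / '~' markers.
        return ["=" if ok else "~" for ok in flags]
    # Several correct answers: split 100% evenly, rounding down.
    share = 100 // num_right if num_right else 0
    return [f"~%{share}%" if ok else "~%-100%" for ok in flags]


print(answer_prefixes([False, False, True, True]))
```

With two correct answers out of four, each correct answer gets `~%50%` and each wrong one `~%-100%`; with three correct answers the share becomes 33.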

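Is this feasible? I have over 1000 questions, so automating this would certainly help. Multi-line replacement with sed is already a bit tricky for two lines, so I guess four or more lines calls for Perl? I have no experience with Perl.

Can someone help me with this? Please excuse my poor English; I am not a native speaker.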

4 answers:

Answer 0: (score: 1)

use strict;
use warnings;

# Slurp the whole input, then split it in front of each "N." question number.
my $file = do { local $/; <> };
my @questions = split /^(?=[0-9]+\.)/m, $file;
for (@questions) {
   # Drop blank lines and trailing whitespace (including any \r).
   my @lines = grep { /\S/ } split /^/m;
   s/\s+\z// for @lines;
   next if !@lines;

   # Replace the leading "N." with a colon.
   my $title = shift(@lines);
   $title =~ s/^\S+\s*/:/;

   my $num_right = grep { /Right/ } @lines;

   # Each correct answer earns an equal share of 100%.
   my $right_pct = $num_right ? sprintf('%.0f', 100/$num_right) : 0;
   my $right_prefix = $num_right == 1 ? "=" : "~%$right_pct%";
   my $wrong_prefix = $num_right == 1 ? "~" : "~%-100%";

   for (@lines) {
      if    (/Right/) { s/^\S+/$right_prefix/; }
      elsif (/Wrong/) { s/^\S+/$wrong_prefix/; }
   }

   print(
      "$title\n",
      "$title\n",
      "{\n",
      map("$_\n", @lines),
      "}\n",
   );
}

Replace /Right/ and /Wrong/ with whatever patterns actually identify the correct and wrong answers in your data.
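As one hedged illustration of what such a pattern could key on: the question says correct answers are tagged with `*` before the export. The Python sketch below assumes the `*` ends up at the end of a correct answer line (for example `d. Some text*`); that position, the regex, and the name `parse_answer` are assumptions, not something the thread confirms.

```python
import re

# Assumption: a correct answer line ends with '*', e.g. "d. Some text*";
# any other answer line is treated as wrong.
ANSWER = re.compile(r"^[A-Za-z]\.\s*(.*?)\s*(\*?)\s*$")


def parse_answer(line):
    """Return (text, is_correct), or None if the line is not an answer."""
    m = ANSWER.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2) == "*"
```

Lines that start with a question number rather than a letter return `None`, so the same loop can tell questions and answers apart.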

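Answer 1: (score: 1)

The program below works according to my best guess at what you need. It reads all the information into an array and then formats it.

For now, the data is embedded in the source and read from the DATA file handle. Changing the loop to while (<>) { ... } will let you specify the data file on the command line.

If I have guessed wrong, you will have to correct me.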

use strict;
use warnings;

my @questions;

while (<DATA>) {
  next unless /\S/;
  s/\s+$//;
  if (/^\d+\.\s*(.+)/) {
    push @questions, [$1];
  }
  elsif (/^[A-Za-z]\.\s*(.+)/i) {
    push @{$questions[-1]}, $1;
  }
}

for my $question (@questions) {

  my ($text, @answers) = @$question;

  print "::$text\n" for 1, 2;

  my $correct = grep /right/i, @answers;

  print "{\n";

  if ($correct == 1) {
    printf "%s %s\n", /right/i ? '=' : '~', $_ for @answers;
  }
  else {
    # Split 100% evenly between the correct answers (guarding against none).
    my $percent = $correct ? int(100/$correct) : 0;
    printf "~%%%d%% %s\n", /right/i ? $percent : -100, $_ for @answers;
  }

  print "}\n";
}

__DATA__
1. Question one

a. Wrong

b. Wrong

c. Right

d. Wrong

2. Question two

a. Right

b. Wrong

c. Right

d. Wrong

3. Question three

a. Right

b. Right

c. Wrong

d. Right

Output

::Question one
::Question one
{
~ Wrong
~ Wrong
= Right
~ Wrong
}
::Question two
::Question two
{
~%50% Right
~%-100% Wrong
~%50% Right
~%-100% Wrong
}
::Question three
::Question three
{
~%33% Right
~%33% Right
~%-100% Wrong
~%33% Right
}

Answer 2: (score: 1)

This might work for you:

cat <<\! >file.sed
> # On encountering a digit in the first character position
> /^[0-9]/{
>   # Create a label to cater for last line processing
>   :end
>   # Swap to hold space
>   x
>   # Check hold space for contents.
>   # If none delete it and begin a new cycle
>   # This is to cater for the first question line
>   /./!d
>   # Remove any carriage returns
>   s/\r//g
>   # Remove any blank lines
>   s/\n\n*/\n/g
>   # Double the question line, replacing the question number by a ':'
>   # Also append a { followed by a newline
>   s/^[0-9]*\.\([^\n]*\n\)/:\1:\1{\n/
>   # Coalesce lines beginning with a * and remove optional preceding "
>   s/"\?\n\*/*/g
>   # Replace the wrong answers a,b,c...  with ~%-100%
>   s/\n[a-zA-Z]*\. \(Wrong\)/\n~%-100% \1/g
>   # Replace the right answers a,B,c... with ~%100%
>   s/\n[a-zA-Z]*\. \(Right\)/\n~%100% \1/g
>   # Assuming no more than 4 answers:
>   # Replace 4 correct answers prefix with ~%25%
>   s/\(~%100%\)\(.*\)\1\(.*\)\1\(.*\)\1/~%25%\2~%25%\3~%25%\4~%25%/
>   # Replace 3 correct answers prefix with ~%33%
>   s/\(~%100%\)\(.*\)\1\(.*\)\1/~%33%\2~%33%\3~%33%/
>   # Replace 2 correct answers prefix with ~%50%
>   s/\(~%100%\)\(.*\)\1/~%50%\2~%50%/
>   # Append a newline and a }
>   s/$/\n}/
>   # Break and so print newly formatted string
>   b
>   }
> # Append pattern space to hold space
> H
> # On last line jump to end label
> $b end
> # Delete all lines from pattern space
> d
> !

Then run:

sed -f file.sed file

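Answer 3: (score: 0)

Your example does not match this documentation: http://docs.moodle.org/22/en/GIFT. The question title and the question are each introduced by two colons, not one: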

//Comment line 
::Question title 
:: Question {
=A correct answer
~Wrong answer1
#A response to wrong answer1
~Wrong answer2
#A response to wrong answer2
~Wrong answer3
#A response to wrong answer3
~Wrong answer4
#A response to wrong answer4
}
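For illustration, a small Python sketch that emits one question in the same two-colon layout as the example above. The function name `gift_question` and the `(prefix, text)` pair format are assumptions made for this sketch, and it does not escape any GIFT control characters that may occur in the text.

```python
def gift_question(title, text, answers):
    """Format one question in the layout of the example above.

    answers: list of (prefix, answer_text) pairs, e.g. ("=", "Right").
    """
    lines = [f"::{title}", f":: {text} {{"]
    lines += [f"{prefix}{answer}" for prefix, answer in answers]
    lines.append("}")
    return "\n".join(lines)


print(gift_question("Question example", "Question example",
                    [("~", "Wrong"), ("=", "Right")]))
```

Keeping the prefix separate from the answer text means the same function works for plain `=` / `~` markers and for weighted prefixes such as `~%50%`.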

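Some people naively gave you answers based on your example instead of looking up the real specification. Oops.

Your question cannot be answered as asked, because your format does not show which answers are correct. That is: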

1. Question

a. Is this right?

b. Or this?

c. Or this?

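You said these are identified by color in the original Word document, and that you perform some substitution on it to preserve that information; however, you did not show an example of that! Oops...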