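I have a file whose text is delimited by begin (<BD>) and end (<ED>) delimiters, with nesting allowed. I want to change these delimiters so that they uniquely identify each span of text between them. The delimiters can be arbitrary strings. For example: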
%{ # Begin delimiter <BD>
}% # End delimiter <ED>
I want to replace the delimiters with uniquely numbered markers:
<BM><UniqueNumber><BM> # <BD> is replaced by <BM>i<BM>
<EM><UniqueNumber><EM> # <ED> is replaced by <EM>i<EM>
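<BM> and <EM> are strings of arbitrary length, possibly binary, that do not occur in the file being processed. For example, in most text files, $'\x01' could be used for <BM> and $'\x02' for <EM>.
For example, a file contains delimited spans of text, including nested spans:
A %{ B
C %{ D
E }% F %{ G }% H }% I
J %{ K }% L
where the letters A..L can be any text. The transformation produces:
A <BM>0<BM> B
C <BM>1<BM> D
E <EM>1<EM> F <BM>2<BM> G <EM>2<EM> H <EM>0<EM> I
J <BM>3<BM> K <EM>3<EM> L
Note: I am not looking for the numbering to indicate nesting level; I am looking for each matching <BM>i<BM>...<EM>i<EM> span of text to be marked with a unique integer, counting up from 0.
I would also like to be able to store N, the number of spans generated (markers are numbered 0..N-1). I am thinking of a Bash function:
ChangeMarkup()
{
local InputFile="$1"
local OutputFile="$2"
local BD="$3" # Begin delimiter
local ED="$4" # End delimiter
local BM="$5" # Begin unique numbered marker
local EM="$6" # End unique numbered marker
local -i N=0
# ... convert InputFile to OutputFile, incrementing N for each span
echo "$N" # Echo the number of spans
}
# Example invocation:
NSpans=$(ChangeMarkup infile outfile '%{' '}%' $'\x01' $'\x02')
I think the solution would be:
Set N=0.
For each <BD>: push N onto a stack, replace <BD> with <BM>$N<BM>, and increment N.
For each <ED>: replace it with <EM><pop stack><EM>.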
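I think some awk in a Bash script might do the trick; I believe this is beyond what sed can do. I am also open to python, or to anything else that can be scripted from Bash, limited to the packages available in the CentOS 7 Minimal ISO. Unfortunately, that rules out perl.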
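To make the stack idea described in the question concrete, here is a minimal Python sketch of those steps. It is only an illustration under stated assumptions: the name change_markup is hypothetical, the input is assumed to be well formed, and the whole file is assumed to fit in memory as one string. It sticks to syntax that should run on both Python 2.7 and Python 3.

```python
import re


def change_markup(text, bd, ed, bm, em):
    """Return (converted_text, n_spans); hypothetical helper, not from the thread."""
    stack = []     # numbers of spans that are opened but not yet closed
    counter = [0]  # next unused span number (a list so replace() can update it)

    def replace(match):
        if match.group(0) == bd:
            n = counter[0]
            counter[0] += 1
            stack.append(n)
            return "%s%d%s" % (bm, n, bm)
        # End delimiter: close the most recently opened span.
        # Assumes well-formed input; pop() raises IndexError otherwise.
        return "%s%d%s" % (em, stack.pop(), em)

    # re.escape makes the delimiters literal, so they may contain
    # characters that are special in regular expressions.
    pattern = re.compile("|".join(re.escape(d) for d in (bd, ed)))
    return pattern.sub(replace, text), counter[0]


sample = "A %{ B\nC %{ D\nE }% F %{ G }% H }% I\nJ %{ K }% L\n"
converted, n_spans = change_markup(sample, "%{", "}%", "<BM>", "<EM>")
```

The second return value plays the role of N in the ChangeMarkup sketch: a Bash wrapper could print it after writing the converted text to the output file.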
Answer 0 (score: 2)
If gnu-awk is available, you can use its RT special variable:
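awk -v BD='%{' -v ED='}%' -v BM='<BM>' -v EM='<EM>' '
# i = current nesting depth (-1 outside all spans), c = last span number issued
# RS is treated as a regex, so delimiters with regex metacharacters need escaping
BEGIN{i=c=-1; RS=BD"|"ED}
# RT holds the text that matched RS, i.e. the delimiter ending this record
RT==BD {++i; ++c; d[i]=c; tag=BM}
RT==ED {tag=EM}
# d[i] is the number of the innermost open span
{printf "%s%s%s%s",$0,tag,d[i],tag}
RT==ED{--i; if(i==-1) tag=""}
' file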
you get:
A <BM>0<BM> B
C <BM>1<BM> D
E <EM>1<EM> F <BM>2<BM> G <EM>2<EM> H <EM>0<EM> I
J <BM>3<BM> K <EM>3<EM> L
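Edit: requirement (2)
Can the script return an error code if improper nesting is detected? For example, in %{A}%}% the second }% has no matching %{.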
awk -v BD='%{' -v ED='}%' -v BM='<BM>' -v EM='<EM>' '
BEGIN{i=c=-1; RS=BD"|"ED}
RT==BD {++i; ++c; d[i]=c; tag=BM}
RT==ED {tag=EM}
{
if(i<0 && tag!=""){
print "Error <ED> without opener" > "/dev/stderr"
exit 1
}
printf "%s%s%s%s",$0,tag,d[i],tag
}
RT==ED{--i; if(i==-1) tag=""}
END{
if(i!=-1){
print "Error <BD> without closer" > "/dev/stderr"
exit 1
}
}
' file
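For comparison, the same nesting check can be sketched in Python by tracking only the current depth. This is an illustrative sketch, not part of the answer: the function name nesting_error is hypothetical, and the messages simply reuse the wording of the awk script above.

```python
import re


def nesting_error(text, bd, ed):
    """Return None if the delimiters balance, else a short error message."""
    depth = 0  # number of spans currently open
    pattern = re.compile("|".join(re.escape(d) for d in (bd, ed)))
    for match in pattern.finditer(text):
        if match.group(0) == bd:
            depth += 1
        elif depth == 0:
            # An end delimiter appeared while no span was open.
            return "Error <ED> without opener"
        else:
            depth -= 1
    if depth:
        # At least one span was still open at end of input.
        return "Error <BD> without closer"
    return None
```

A caller could write the message to standard error and exit with status 1 whenever the result is not None, which would match the exit code used by the awk version.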
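Edit: requirement (1)
Can <BD> and <ED> be escaped? That is, if a delimiter is preceded by a backslash, it should not be treated as a delimiter.
With %{ and }% as delimiters, the escaped forms are \%{ and \}%, for example: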
awk -v BD='%{' -v ED='}%' -v BM='<BM>' -v EM='<EM>' '
BEGIN{i=c=-1; RS="\\\\"BD"|\\\\"ED"|"BD"|"ED}
RT==BD {++i; ++c; d[i]=c; tag=BM}
RT==ED {tag=EM}
RT~/^\\/{printf "%s%s",$0,RT; next}
{
if(i<0 && tag!=""){
print "Error <ED> without opener" > "/dev/stderr"
exit 1
}
printf "%s%s%s%s",$0,tag,d[i],tag
}
RT==ED{--i; if(i==-1) tag=""}
END{
if(i!=-1){
print "Error <BD> without closer" > "/dev/stderr"
exit 1
}
}
' file
With the input file
A %{ B
C %{ D
E }% F %{ G }% H }% I
J %{ K }% L\%{ M\}%O
you get:
A <BM>0<BM> B
C <BM>1<BM> D
E <EM>1<EM> F <BM>2<BM> G <EM>2<EM> H <EM>0<EM> I
J <BM>3<BM> K <EM>3<EM> L\%{ M\}%O
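The escape handling relies on listing the backslash-prefixed forms ahead of the bare delimiters in the pattern. A minimal Python sketch of that idea follows. The name change_markup_escaped is hypothetical, and the sketch assumes well-formed nesting and delimiters that do not themselves begin with a backslash.

```python
import re


def change_markup_escaped(text, bd, ed, bm, em):
    """Number spans, copying backslash-escaped delimiters through unchanged."""
    stack = []     # numbers of spans that are opened but not yet closed
    counter = [0]  # next unused span number

    # Escaped forms come first so that they win over the bare delimiters,
    # mirroring the order of alternatives in the awk RS above.
    alternatives = ["\\" + bd, "\\" + ed, bd, ed]
    pattern = re.compile("|".join(re.escape(a) for a in alternatives))

    def replace(match):
        token = match.group(0)
        if token.startswith("\\"):
            return token  # escaped delimiter: not a span boundary
        if token == bd:
            n = counter[0]
            counter[0] += 1
            stack.append(n)
            return "%s%d%s" % (bm, n, bm)
        # Assumes well-formed input; pop() raises IndexError otherwise.
        return "%s%d%s" % (em, stack.pop(), em)

    return pattern.sub(replace, text), counter[0]
```

As in the awk version, the backslash itself is kept in the output, so a later step would have to strip it if the unescaped text is wanted.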