Question

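I've read that bold and italic text can be written in Markdown as **bold_text** and *italic_text* respectively. To make text both bold and italic, you can wrap it in 4 asterisks for bold and 2 underscores for italics (or vice versa).

I'd like to write a bash script that determines the number of bold and italic words. I guess this boils down to counting the double asterisks, single asterisks, double underscores and single underscores. My question is how to count the occurrences of a specific string in a file, such as "**" or "__", so that I can tell how many bold and italic words there are.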

#!/bin/bash

if [ -z "$1" ]; then
    echo "No input file specified."
else 
    ls $1 > /dev/null 2> /dev/null && 
    echo $(cat $1 | grep -o '\<**>\' | wc -c) || echo "File $1 does not exist."
fi

Sample input file:

**This is bold and _italic_** text.

Expected output:

Bold words: 5
Italic words: 1
Bold and italic words: 1

Answer 1

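The simple approach

Some assumptions:

Bold uses __ and italics use * (even though it could also be ** and _)
No "funny stuff" such as (inline) code containing these characters, escaped _ or *, or list items with a leading * to throw off our count

Now, to count bold words, we can use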

grep -Po '__.*?__' infile.md | grep -o '[^[:space:]]\+' | wc -l

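This looks for anything between two pairs of __. I use the Perl regex engine (-P) to enable non-greedy matching (.*?); otherwise, something like __bold__ not bold __bold__ would be just one match. -o returns only the matched parts, one per line.

The second grep matches words, meaning any sequence of one or more non-space characters, and wc -l counts the output lines.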
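To make the greedy versus non-greedy difference concrete, here is a minimal sketch. It assumes GNU grep built with PCRE support (for -P); the sample line and the variable names are only illustrative.

```bash
#!/bin/bash
# Illustrative sample line: two bold spans with plain text in between.
line='__bold__ not bold __bold__'

# Greedy .* runs to the last __ on the line, so the whole line is one match.
greedy=$(printf '%s\n' "$line" | grep -o '__.*__' | wc -l)

# Non-greedy .*? stops at the nearest closing __, so each span matches separately.
lazy=$(printf '%s\n' "$line" | grep -Po '__.*?__' | wc -l)

echo "greedy matches: $greedy"      # prints "greedy matches: 1"
echo "non-greedy matches: $lazy"    # prints "non-greedy matches: 2"
```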

The same works for italics:

grep -Po '\*.*?\*' infile.md | grep -o '[^[:space:]]\+' | wc -l

To combine these (for words that are both bold and italic), the commands have to be chained. For italics within bold:

grep -Po '__.*?__' infile.md | grep -Po '\*.*?\*' | grep -o '[^[:space:]]\+' | wc -l

And for bold within italics:

grep -Po '\*.*?\*' infile.md | grep -Po '__.*?__' | grep -o '[^[:space:]]\+' | wc -l
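The four pipelines above differ only in which patterns are chained, so they could be wrapped in one helper. The following is a sketch, not part of the original answer: count_words is a hypothetical function name, and the sample line is the question's input rewritten to follow the assumptions (__ for bold, * for italics).

```bash
#!/bin/bash
# Sketch only: count_words is a hypothetical helper.
# It reads text on stdin, narrows it with each PCRE pattern in turn,
# then counts the whitespace-separated words that remain.
count_words() {
    local input pattern
    input=$(cat)
    for pattern in "$@"; do
        input=$(printf '%s\n' "$input" | grep -Po -- "$pattern")
    done
    printf '%s\n' "$input" | grep -o '[^[:space:]]\+' | wc -l
}

# Assumed sample: the question's input with __ for bold and * for italics.
sample='__This is bold and *italic*__ text.'

bold=$(printf '%s\n' "$sample" | count_words '__.*?__')
italic=$(printf '%s\n' "$sample" | count_words '\*.*?\*')
both=$((
    $(printf '%s\n' "$sample" | count_words '__.*?__' '\*.*?\*')
    +
    $(printf '%s\n' "$sample" | count_words '\*.*?\*' '__.*?__')
))

echo "Bold words: $bold"
echo "Italic words: $italic"
echo "Bold and italic words: $both"
```

For this sample the three counts come out as 5, 1 and 1, which matches the expected output in the question.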

Cleanup for a more realistic file

Now, a real Markdown file can hold a few extra surprises (see "Some assumptions"):

* List item with **bold word**

Line with **bold words and \* an escaped asterisk**

Here is an *italicized* word

And *italics with a **bold** word inside*

And **bold words with *italics* inside**

    Code can have tons of *, ** and _ and we want to ignore them all

Also `inline code can have * and ** and _ to be ignored`, right?

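would be rendered as

List item with bold word

Line with bold words and * an escaped asterisk

Here is an italicized word

And italics with a bold word inside

And bold words with italics inside

Code can have tons of *, ** and _ and we want to ignore them all

Also inline code can have * and ** and _ to be ignored, right?

One way to clean up something like this is a sed script: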

/^$/d                           # Delete empty lines
/^    /d                        # Delete code lines (start with four spaces)
s/`[^`]*`//g                    # Remove inline code
/^\* /s/^\* (.*)/\1/            # Remove asterisk from list items
s/\\\*//g                       # Remove escaped asterisks
s/\\_//g                        # Remove escaped underscores
s/`[^`]*`//g                    # Remove inline code
s/\*\*/__/g                     # Make sure bold uses underscores
s/(^|[^_])_([^_]|$)/\1\*\2/g    # Make sure italics use asterisks

with the following result:

$ sed -rf md.sed infile.md
List item with __bold word__
Line with __bold words and  an escaped asterisk__
Here is an *italicized* word
And *italics with a __bold__ word inside*
And __bold words with *italics* inside__
Also , right?

ready to be consumed by the commands from the first part.
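To see what the last two substitutions do in isolation, they can be applied to a single line. This is a minimal sketch assuming GNU sed (for -r); the sample line is illustrative and deliberately uses ** for bold and _ for italics, the opposite of what the counting commands expect.

```bash
#!/bin/bash
# Illustrative input using the "other" markers: ** for bold, _ for italics.
line='And **bold words with _italics_ inside**'

# First turn every ** into __, then turn each lone _ (not part of __) into *.
normalized=$(printf '%s\n' "$line" |
    sed -r -e 's/\*\*/__/g' -e 's/(^|[^_])_([^_]|$)/\1\*\2/g')

echo "$normalized"    # prints: And __bold words with *italics* inside__
```

The order matters: converting ** to __ first means any single asterisk written afterwards is unambiguously an italics marker.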

Putting it all together

Everything in one script that takes the input file name as its argument:

#!/bin/bash

fname="$1"
tempfile="$(mktemp)"

sed -r '
    /^$/d
    /^    /d
    s/`[^`]*`//g
    /^\* /s/^\* (.*)/\1/
    s/\\\*//g
    s/\\_//g
    s/`[^`]*`//g
    s/\*\*/__/g
    s/(^|[^_])_([^_]|$)/\1\*\2/g
' "$fname" > "$tempfile"

bold=$(grep -Po '__.*?__' "$tempfile" | grep -o '[^[:space:]]\+' | wc -l)
italic=$(grep -Po '\*.*?\*' "$tempfile" | grep -o '[^[:space:]]\+' | wc -l)
both=$((
    $(grep -Po '__.*?__' "$tempfile" |
        grep -Po '\*.*?\*' | grep -o '[^[:space:]]\+' | wc -l)
    +
    $(grep -Po '\*.*?\*' "$tempfile" |
        grep -Po '__.*?__' | grep -o '[^[:space:]]\+' | wc -l)
))

rm -f "$tempfile"

echo "Bold words: $bold"
echo "Italic words: $italic"
echo "Bold and italic words: $both"

It can be used like this:

$ ./wordcount infile.md
Bold words: 14
Italic words: 8
Bold and italic words: 2

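Drawbacks

This can be tripped up by words containing underscores. Some Markdown flavours ignore those and treat them as part of the word.
I'm sure I've missed a few edge cases in the cleanup.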
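The underscore problem is easy to reproduce. In this sketch (GNU sed and grep assumed; the identifier my_helper_function is made up for illustration), a line with no emphasis at all still yields one "italic" word:

```bash
#!/bin/bash
# Illustrative line: no emphasis, just an identifier with underscores.
line='Call my_helper_function here'

# The italics normalization rewrites both underscores to asterisks,
# so the counting pipeline then sees *helper* as an italic span.
false_hits=$(printf '%s\n' "$line" |
    sed -r 's/(^|[^_])_([^_]|$)/\1\*\2/g' |
    grep -Po '\*.*?\*' | grep -o '[^[:space:]]\+' | wc -l)

echo "Italic words reported: $false_hits"    # prints "Italic words reported: 1"
```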

Answer 2

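My solution is to replace ** with something else to make the problem easier. I chose ~, but you can use any other character that does not occur in the file.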

$ cat test
**bold**
*italic*
**bold**

$ sed 's/\*\*/~/g' test
~bold~
*italic*
~bold~

Now, for bold, count the number of ~ and divide it by 2 at the end. First count the ~:

$ cat test | tr -d -c '~'
~~~~
$ cat test | tr -d -c '~' | wc -c
4

Now divide by 2. First save the output in a variable.

$ bold=`cat test | tr -d -c '~' | wc -c`
$ expr $bold / 2
2

Do something similar for italics.
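One possible way to spell out the italics step: once every ** has been replaced with ~, any * left over can only be an italics marker, so the same count-and-halve trick applies. This is a sketch under the assumptions that the file contains no literal ~ and that every marker is properly paired; count_spans is a hypothetical function name. Note that it counts emphasized spans, not the individual words inside them.

```bash
#!/bin/bash
# Sketch only: count_spans is a hypothetical helper name.
# Assumes the file contains no literal ~ and that every marker is paired.
count_spans() {
    local file=$1 bold_marks italic_marks
    # Turn every ** into ~ so the remaining * can only be italics markers.
    bold_marks=$(sed 's/\*\*/~/g' "$file" | tr -d -c '~' | wc -c)
    italic_marks=$(sed 's/\*\*/~/g' "$file" | tr -d -c '*' | wc -c)
    echo "Bold spans: $((bold_marks / 2))"
    echo "Italic spans: $((italic_marks / 2))"
}

# Recreate the three-line test file from above in a temporary file.
demo=$(mktemp)
printf '%s\n' '**bold**' '*italic*' '**bold**' > "$demo"
count_spans "$demo"
rm -f "$demo"
```

For the three-line test file this reports 2 bold spans and 1 italic span.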
