How to wrap words in a sentence using a list of word positions

Time: 2017-05-18 13:48:17

Tags: c# asp.net

I have a sentence like this:

ジェーンは先週日本に来て、毎日4時間日本語のクラスで勉強しています

The token data looks like this:

  

[{"token":"ジェーン","type":"word","start_offset":0,"end_offset":4,"position":0},
 {"token":"は","type":"word","start_offset":4,"end_offset":5,"position":1},
 {"token":"先週","type":"word","start_offset":5,"end_offset":7,"position":2},
 {"token":"日本","type":"word","start_offset":7,"end_offset":9,"position":3},
 {"token":"に","type":"word","start_offset":9,"end_offset":10,"position":4},
 {"token":"来","type":"word","start_offset":10,"end_offset":11,"position":5},
 {"token":"て","type":"word","start_offset":11,"end_offset":12,"position":6},
 {"token":"毎日","type":"word","start_offset":13,"end_offset":15,"position":7},
 {"token":"4","type":"word","start_offset":15,"end_offset":16,"position":8},
 {"token":"時間","type":"word","start_offset":16,"end_offset":18,"position":9},
 {"token":"日本語","type":"word","start_offset":18,"end_offset":21,"position":10},
 {"token":"の","type":"word","start_offset":21,"end_offset":22,"position":11},
 {"token":"クラス","type":"word","start_offset":22,"end_offset":25,"position":12},
 {"token":"で","type":"word","start_offset":25,"end_offset":26,"position":13},
 {"token":"勉強","type":"word","start_offset":26,"end_offset":28,"position":14},
 {"token":"し","type":"word","start_offset":28,"end_offset":29,"position":15},
 {"token":"て","type":"word","start_offset":29,"end_offset":30,"position":16},
 {"token":"い","type":"word","start_offset":30,"end_offset":31,"position":17}]

How can I use start_offset and end_offset to wrap the text in the sentence, like this:

<span>ジェーン</span><span>は</span><span>先週</span>... 

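I tried replacing at those positions with a StringBuilder, but the indices of the words shift after each change, so everything from token 2 onward comes out wrong.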

2 answers:

Answer 0: (score: 2)

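Inserting text shifts every position after the insertion point. So start at the end of the string and work backwards. That way you never have to recalculate positions, because the only positions that shift belong to tokens you have already processed.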

string result = sentence;

foreach (var token in dataTokens.OrderByDescending(x => x.position))
{
    result = result.Insert(token.end_offset, "</span>");
    result = result.Insert(token.start_offset, "<span>");
}

return result;

Testing it produces the following string:

 <span>ジェーン</span><span>は</span><span>先週</span><span>日本</span><span>に</span><span>来</span><span>て</span>、<span>毎日</span><span>4</span><span>時間</span><span>日本語</span><span>の</span><span>クラス</span><span>で</span><span>勉強</span><span>し</span><span>て</span><span>い</span>ます
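
For anyone who wants to try the backward-insertion idea in isolation, here is a minimal self-contained sketch. The `Token` class (public fields named after the JSON keys) and the `SpanWrapper.Wrap` helper are assumptions made for illustration; they are not part of the question's code. It sorts by `start_offset` rather than `position`, which should give the same order as long as tokens do not overlap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical token shape; field names mirror the JSON keys in the question.
public class Token
{
    public string token;
    public int start_offset;
    public int end_offset;
    public int position;
}

public static class SpanWrapper
{
    // Wraps each token in <span>...</span>, processing the last token first
    // so that the offsets of earlier tokens remain valid after each insertion.
    // Assumes tokens do not overlap.
    public static string Wrap(string sentence, IEnumerable<Token> tokens)
    {
        string result = sentence;
        foreach (var t in tokens.OrderByDescending(x => x.start_offset))
        {
            // Insert the closing tag first: inserting at end_offset
            // does not move the text at start_offset.
            result = result.Insert(t.end_offset, "</span>");
            result = result.Insert(t.start_offset, "<span>");
        }
        return result;
    }
}
```

Calling `SpanWrapper.Wrap("ジェーンは先週", tokens)` with the first three tokens from the question should give the string shown in the question's desired output, without the trailing "...".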

Answer 1: (score: 0)

I suggest building a new string and doing something like this (not exact code):

string s = null;
foreach(string token in dataTokens)
{
   s+="<span>" + token + "</span>";
}

UPD: After the comments, I tried to simulate your scenario with tokens like this:

 class Token
{
    private int start_offset;
    private int end_offset;
    private int position;
    string type;
    string token;

    public Token(int so, int se, int pos, string type, string token)
    {
        start_offset = so;
        end_offset = se;
        position = pos;
        this.type = type;
        this.token = token;
    }

    public string TokenProp
    {
       get { return token; }
    }
}

I then built the spans like this:

List<Token> tokens = new List<Token>();

tokens.Add(new Token(0, 4, 0, "word", "abcd"));
tokens.Add(new Token(4, 5, 1, "word", "e"));
tokens.Add(new Token(6, 9, 2, "word", "fgh"));
tokens.Add(new Token(9, 11, 3, "word", "ijk"));

StringBuilder sb = new StringBuilder();
foreach (Token t in tokens)
{
    sb.Append("<span>");
    sb.Append(t.TokenProp);
    sb.Append("</span>");
}
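
Appending only the token text drops any characters that no token covers, such as the 、 and the trailing ます in the question's sentence. A forward pass can keep them by using the offsets to copy the gaps as well. The sketch below is one possible way to do that, not exact code for your project: the simplified `Token` class with public offset fields and the `ForwardSpanWrapper.Wrap` helper are hypothetical names, and it assumes the tokens do not overlap.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical token shape with public fields named after the JSON keys.
public class Token
{
    public int start_offset;
    public int end_offset;
}

public static class ForwardSpanWrapper
{
    // Builds the output left to right, copying uncovered text unchanged
    // and wrapping each token's slice of the sentence in <span>...</span>.
    public static string Wrap(string sentence, IEnumerable<Token> tokens)
    {
        var sb = new StringBuilder();
        int cursor = 0; // index of the first character not yet copied

        foreach (var t in tokens.OrderBy(x => x.start_offset))
        {
            // Copy any text between the previous token and this one.
            sb.Append(sentence, cursor, t.start_offset - cursor);
            sb.Append("<span>");
            sb.Append(sentence, t.start_offset, t.end_offset - t.start_offset);
            sb.Append("</span>");
            cursor = t.end_offset;
        }

        // Copy whatever follows the last token.
        sb.Append(sentence, cursor, sentence.Length - cursor);
        return sb.ToString();
    }
}
```

Because the output is assembled in a new buffer, the offsets always refer to the untouched original sentence, so nothing needs recalculating here either.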

UPD2: The answer above is better :)