R - gsub: removing punctuation and digits from a string

Date: 2019-02-16 02:59:33

Tags: r string

I am trying to remove the punctuation and digits from <U+200B>Chandler so that it becomes Chandler. This is what I am currently trying:

Chandler

However, this does nothing to change the cells in the "city" column of "df". When I check typeof(df), I get "list". Could that have something to do with it?

Any help would be greatly appreciated.

1 Answer:

Answer 0 (score: 0)

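On the second question: typeof() will always return "list" for a data frame, because data frames are really just lists of equal-length vectors. That is expected and is not the reason the replacement has no effect.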
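
To make that concrete, here is a minimal sketch. The data frame below is a hypothetical stand-in for the asker's df; only the column name "city" comes from the question.

```r
# Hypothetical data frame standing in for the asker's `df`
df <- data.frame(city = c("Chandler", "Mesa"), stringsAsFactors = FALSE)

typeof(df)         # "list": the underlying storage type
class(df)          # "data.frame": what most functions dispatch on
is.data.frame(df)  # TRUE
is.list(df)        # TRUE: each column is one element of the list
typeof(df$city)    # "character": the column itself is an ordinary vector
```

So class() or is.data.frame() is usually the more informative check; typeof() only reports how the object is stored.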

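On the first question: it looks like your data contains some non-ASCII Unicode characters (here U+200B, a zero-width space). A good way to deal with these is to convert the encoding, for example: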

df$city <- iconv(df$city, 'utf-8', 'ascii', sub = '')
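
As a rough check of what that call does, the sketch below runs it on a small made-up vector whose first element begins with a zero-width space. The sample values are assumptions for illustration, not the asker's data, and the exact behaviour of iconv() can vary with the platform's iconv implementation.

```r
# Hypothetical sample: the first value starts with a zero-width space (U+200B)
city <- c("\u200BChandler", "Mesa")

# The invisible character still counts towards the string length
before <- nchar(city)

# Convert to ASCII; anything that cannot be represented is replaced by ""
clean <- iconv(city, from = "UTF-8", to = "ASCII", sub = "")
after <- nchar(clean)

before  # one more character in the first element than is visible
after
clean
```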

You can also gsub the character out by its hex code point, like so:

df$city <- gsub('\u200B', '', df$city)

Or even a whole range of code points:

df$city <- gsub('[\u2000-\u20ff]', '', df$city)
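
The two gsub forms can be compared side by side on a small made-up vector (the values are illustrative assumptions). Note that the range U+2000 to U+20FF is broad: besides the zero-width space it also covers typographic quotes, dashes and currency symbols such as the euro sign.

```r
# Hypothetical sample: the first value starts with a zero-width space (U+200B)
city <- c("\u200BChandler", "Mesa")

# Remove one specific code point
one <- gsub("\u200B", "", city)

# Remove every character in the range U+2000 to U+20FF
rng <- gsub("[\u2000-\u20ff]", "", city)

# The range also strips characters you may want to keep, e.g. the euro sign
euro <- gsub("[\u2000-\u20ff]", "", "\u20ac5")

one
rng
euro
```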

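But really, I think the iconv approach is the way to go. Used this way, it simply drops each character that has no ASCII equivalent instead of rendering it, but that seems to be what you want.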
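
One consequence of "drops the character" is worth seeing once: with sub = "", iconv also removes legitimate non-ASCII letters, whereas a targeted gsub leaves them alone. The sketch below uses made-up values (the second one is not from the question) and assumes an iconv implementation that does not transliterate by default.

```r
# Hypothetical sample: a zero-width space, and a name with an accented letter
x <- c("\u200BChandler", "Qu\u00e9bec")

# iconv drops everything that is not ASCII, including the accented e
ascii_only <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

# gsub removes only the zero-width space and keeps the accented e
targeted <- gsub("\u200B", "", x)

ascii_only
targeted
```

If the column may hold names with accents, the targeted gsub is the safer of the two; if plain ASCII is all that is expected, iconv is the simpler option.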