R gsub: increment a value for each new line

Date: 2014-11-26 22:53:26

Tags: regex, r, gsub

I have a text file with, say, 10 lines. I need to append "n:" to each line and increment its value using gsub. I am using R.

Here is what I tried:

tst <- readLines("test.json")
# R warns here: "argument 'replacement' has length > 1 and
# only the first element will be used"
fix <- gsub("*}$", paste0(",\"n\":\"", 1:10, "\"}"), tst)

I know what the problem is. It searches for the pattern and appends n:1 at each match, but each line matches the pattern only once, so the value never increments. For example, it produces the following output. Note the value of n.

Note that the text is in the JSON format shown in the EDIT below.

Now, suppose a is the JSON text on the first line, b is the JSON on the second line, and so on. The gsub above gives me the following output:

a, n:1
b, n:1
c, n:1 .... 
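
The underlying cause is that the replacement argument of gsub() is not vectorized: when you pass a vector, R uses only its first element and emits a warning. A minimal demonstration of that behavior:

gsub("a$", c("X", "Y"), c("banana", "papaya"))
# Warning message:
# argument 'replacement' has length > 1 and only the first element will be used
# [1] "bananX" "papayX"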

What I need is:

a, n:1
b, n:2
c, n:3
... and so on.

How can I do this?

EDIT:

tst <- "{\"text\":\"Call the first precinct at 212-334-0611 and demand they hurry up and clear out the riff-raff #OWS\",\"location\":{\"lng\":-77.047203,\"lat\":39.4170472},\"geoflag\":false,\"screen_name_lower\":3696254146976399714,\"entities\":{\"hashtags\":[{\"text\":\"#ows\"}],\"user_mentions\":[{\"screen_name\":\"5703714229808319021\"}]},\"timestamp\":1321340409000,\"id\":136337675450978305,\"source\":\"<a href=http://www.hootsuite.com rel=nofollow>HootSuite<\\/a>\",\"user\":{\"location\":\"WY and Washington DC\",\"screen_name\":3696254146976399714}}"

EDIT 2:

These are my actual tweets. I have only pasted 6 of them.

{"text":"RT @2141912560879618632: Impeach Scalia and Thomas.  Clarence Thomas also a tax fraud.  #p2 http://t.co/SuzLE7kZ","location":{"lng":0,"lat":0},"geoflag":false,"screen_name_lower":8344340767467600327,"entities":{"urls":[{"expanded_url":"http://www.latimes.com/news/politics/la-pn-scalia-thomas-20111114","url":"http://www.latimes.com/news/politics/la-pn-scalia-thomas-20111114"}],"hashtags":[{"text":"#p2"}],"user_mentions":[{"screen_name":"2141912560879618632"}]},"timestamp":1321340401000,"id":136337643519737856,"source":"<a href=http://ubersocial.com rel=nofollow>UberSocial for BlackBerry<\/a>","user":{"location":"wrong place at the wrong time","screen_name":8344340767467600327}}
{"text":"RT @6822250609460363149: Main Media Helicopters blocked from roads transit and airspace in NY. #ows","location":{"lng":-80.0110840074,"lat":40.4230666861},"geoflag":false,"screen_name_lower":3864938046499739811,"entities":{"hashtags":[{"text":"#ows"}],"user_mentions":[{"screen_name":5703714229808319021}]},"timestamp":1321340402000,"id":136337644983566337,"source":"<a href=http://www.tweetdeck.com rel=nofollow>TweetDeck<\/a>","user":{"location":"Southern California","screen_name":3864938046499739811}}
{"text":"RT @4872494631597194689: A handful of protesters seem to be holding their ground in the middle of of the square, where the food tent is. #OWS","location":{"lng":-75.6681744492,"lat":42.9684876327},"geoflag":false,"screen_name_lower":3155607190500421639,"entities":{"hashtags":[{"text":"#ows"}],"user_mentions":[{"screen_name":"5703714229808319021"}]},"timestamp":1321340402000,"id":136337647592415232,"source":"<a href=http://twitter.com/#!/download/iphone rel=nofollow>Twitter for iPhone<\/a>","user":{"location":"New York","screen_name":3155607190500421639}}
{"text":"RT @5710636393838980539: Photo of Long Range Acoustic Device (LRAD) being staged near Zucotti Park http://t.co/ecKuyTno #OccupyWallSt #OccupyB ...","location":{"lng":0,"lat":0},"geoflag":false,"screen_name_lower":81153019260783000,"entities":{"urls":[{"expanded_url":"http://twitpic.com/7eebr3","url":"http://twitpic.com/7eebr3"}],"hashtags":[{"text":"#occupywallst"},{"text":"#occupyboston"}],"user_mentions":[{"screen_name":"5703714229808319021"}]},"timestamp":1321340403000,"id":136337651665076225,"source":"<a href=http://www.tweetdeck.com rel=nofollow>TweetDeck<\/a>","user":{"location":"null","screen_name":81153019260783000}}
{"text":"RT @8527126922837269423: Here's the link that works to watch the livestream of the NYC police raid of #ows: http://t.co/S71XWGNL","location":{"lng":0,"lat":0},"geoflag":false,"screen_name_lower":171283756943800599,"entities":{"urls":[{"expanded_url":"http://bit.ly/v1TyPW","url":"http://bit.ly/v1TyPW"}],"hashtags":[{"text":"#ows"}],"user_mentions":[{"screen_name":"5703714229808319021"}]},"timestamp":1321340404000,"id":136337656387866624,"source":"web","user":{"location":"flipadelphia","screen_name":171283756943800599}}
{"text":"RT @7526888406962725238: The police have blocked all entrances. They are not allowing press in. #ows","location":{"lng":9.6542972,"lat":45.3547433},"geoflag":false,"screen_name_lower":8941040531398533941,"entities":{"hashtags":[{"text":"#ows"}],"user_mentions":[{"screen_name":"5703714229808319021"}]},"timestamp":1321340405000,"id":136337660120805376,"source":"web","user":{"location":"NYC","screen_name":8941040531398533941}}

If you save these JSON lines in a file, read them in with readLines in R, and then apply the function above, it will make sense: it appends (as GSee noted) n:1 at the end of each JSON object, but does not increment the value of n for each new line. The answer I posted below is exactly what I needed.

EDIT 3: Yes, I don't know why I didn't post the actual tweets in the question. I guess I was just being lazy. Sorry for the confusion, folks, and thanks everyone for the help. Again, if anyone has a better solution, I would really appreciate it.

3 Answers:

Answer 0 (score: 2):

As confusing as your question is, the following will work for you:

paste0(sub('}$', ',\"n\":\"', tst), 1:length(tst), '\"}')
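
For illustration, here is what the call produces on a hypothetical two-element tst:

tst <- c("{\"a\":1}", "{\"b\":2}")
paste0(sub('}$', ',\"n\":\"', tst), 1:length(tst), '\"}')
# [1] "{\"a\":1,\"n\":\"1\"}" "{\"b\":2,\"n\":\"2\"}"

seq_along(tst) is a slightly safer substitute for 1:length(tst), since it yields an empty sequence rather than c(1, 0) when tst is empty.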

Answer 1 (score: 0):

Just apply over the integers you want to append, running gsub on one line at a time:

# index tst[x] so each line gets its own replacement value
lapply(seq_along(tst),
       function(x) gsub("\\}$", paste0(",\"n\":\"", x, "\"}"), tst[x]))
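
Note that lapply() returns a list of one-element character vectors here; wrap the call in unlist() if you want a plain character vector like the input.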

Answer 2 (score: 0):

Thanks everyone for the suggestions. I just realized that I can use a for loop over each tweet.

This is what I did:

tweet.tst <- readLines("ows.json")
tweet.test <- character(length(tweet.tst))  # preallocate the result vector

for (i in seq_along(tweet.tst)) {
  # the replacement is a single string on each iteration, so gsub behaves as expected
  tweet.test[i] <- gsub("\\}$", paste0(",\"n\":\"", i, "\"}"), tweet.tst[i])
}

If anyone has a better solution, it is most welcome.
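
For reference, a vectorized equivalent of this loop, along the lines of Answer 0 (a sketch assuming tweet.tst was read with readLines as above):

tweet.test <- paste0(sub("\\}$", ",\"n\":\"", tweet.tst),
                     seq_along(tweet.tst), "\"}")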