Using the unique function on a specific column

Time: 2019-01-27 11:22:38

Tags: r duplicates unique

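I have a data frame of Twitter data, with the tweet messages in the first column (text) and the number of retweets in the second column (retweetCount). I want to remove the rows that contain duplicated tweet messages.

In the past I have used the unique function to remove duplicate observations from a data frame, like this: df_no_duplicates <- unique(df). With my Twitter data, however, this only removes rows in which both text and retweetCount are exactly the same. Can I tell unique to work on the text column only? If possible, I would also like to apply the following logic: if a text is duplicated in the data frame, keep only the observation with the highest retweetCount.

Here is a reproducible sample of my data (although I am not sure whether the first 50 rows contain duplicated messages):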

dput(head(df, 50))

structure(list(text = c("as always making sense of it all for us ive never felt less welcome in this country brexit  ", 
"never underestimate power of stupid people in a democracy brexit", 
"a quick guide to brexit and beyond after britain votes to quit eu  ", 
"this selfinflicted wound will be his legacy cameron falls on sword after brexit euref  ", 
"so the uk is out cameron resigned scotland wants to leave great britain sinn fein plans to unify ireland and its o", 
"this is a very good summary no biasspinagenda of the legal ramifications of the leave result brexit ", 
"you cant make this up cornwall votes out immediately pleads to keep eu cash this was never a rehearsal ", 
"no matter the outcome brexit polls demonstrate how quickly half of any population can be convinced to vote against itself q", 
"i wouldnt mind so much but the result is based on a pack of lies and unaccountable promises democracy didnt win brexit pro", 
"so the uk is out cameron resigned scotland wants to leave great britain sinn fein plans to unify ireland and its o", 
"absolutely brilliant poll on brexit by ", "think the brexit campaign relies on the same sort of logic that drpepper does whats the worst that can happen thingsthatarewellbrexit", 
"am baffled by nigel farages claim that brexit is a victory for real people as if the 47 voting remain are fucking smu", 
"not one of the uks problems has been solved by brexit vote migration inequality the uks centurylong decline as", 
"scotland should never leave eu  calls for new independence vote grow  brexit", 
"the most articulate take on brexit is actually this ft reader comment today ", 
"david cameron has said he is set to resign as british prime minister after uk votes to leave eu brexit ", 
"im laughing at people who voted for brexit but are complaining about the exchange rate affecting their holiday\r\nremain", 
"life is too short to wear boring shoes  brexit", "pm at buckingham palace for audience with the queen  brexit", 
"i hate people too but i dont think id vote for armageddon over it brexit", 
"text = when you send a message\r\n\r\nsext = when you send a sexy message\r\n\r\nbrexit = when you send an entire global economy to he", 
"i actually was pretty confident that the brits wouldnt vote for a brexit  didnt see this coming", 
"pm at buckingham palace for audience with the queen  brexit", 
"now just the time can say if it is the right decision brexit", 
"no matter the outcome brexit polls demonstrate how quickly half of any population can be convinced to vote against itself q", 
"that was whatever your view on brexit a superb speech hope next pm will be as good a statesman as david cameron ", 
"david cameron to step down as over 52pc of britains vote to leave the european union brexit", 
"between brexit and euro2016 england have got a few johnsons to worry about so heres a quick guideeurefresults ", 
"scotland voted overwhelmingly to remain in the eu  ", "brexit is great enough on the merits but watching the tears and tantrums is the icing on the cake ", 
"the nightmare has begun it will be a long one todays column on brexit ", 
"brexit why premier league clubs may be unable to sign foreign players under age of 18\r\n ", 
"brexit why premier league clubs may be unable to sign foreign players under age of 18\r\n ", 
"cant think about brexit without thinking about this ", "brexit likely to help rajoy win sundays election but could be nightmare for him if he gets to govern given economic fragil", 
"trump praises uk public for taking back control of country   brexit", 
"expert many feel globalisation isnt working for them yes mate thats the 999 of punters who it is not working for abc730 brexit", 
"cornwall votes against europe then expects to keep eu funding good luck with that ", 
"weve done it without a bullet being fired  nigel farage forgetting that a member of parliament was assassinated over b", 
"londoners call for capital to gain independence after brexit vote  ", 
"12 trump and brexit are direct results of pressure on working class when big companies bow down to", 
"just a reminder that the brexit newspapers were easily worth more than a 2 swing  none of the men who own them pay the", 
"i always loved gb  thought about moving there some day but the decision they made yesterday is really shocking  disa", 
"winter is coming gameofthrones brexit ", "the most articulate take on brexit is actually this ft reader comment today ", 
"aw\r\n\r\ni worry that the brexit thing will justaid tyrannys spread", 
"breaking brexit spain proposes shared sovereignty over gibraltar", 
"the entirety of scotland voted to remain you imbecile brexit ", 
"diane calling it right again \r\nthe dispossessed voted for brexit jeremy corbyn offers real change\r\nhttp"
), retweetCount = c(0, 251, 39, 0, 6462, 0, 1391, 31595, 15, 
6462, 20521, 0, 871, 10, 184, 1239, 143, 0, 0, 218, 0, 3482, 
0, 218, 0, 31595, 0, 25, 777, 14, 404, 6, 1, 0, 10756, 4, 198, 
0, 666, 12387, 609, 0, 237, 1, 0, 1239, 0, 2431, 6, 84)), .Names = c("text", 
"retweetCount"), row.names = c(NA, 50L), class = "data.frame")

1 answer:

Answer 0 (score: 0)

The reprex data needs a bit of work, but I think in general this can be done with dplyr from the tidyverse:

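library(tidyverse)

df2 <- df %>%
  group_by(text) %>%                               # one group per distinct tweet text
  summarise(retweetCount = max(retweetCount)) %>%  # keep the highest retweet count per text
  distinct()                                       # drop any remaining duplicate rows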

I could not test this on your data, so the final distinct() call may not be needed.
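
For comparison, here is a minimal base R sketch of the same "keep the row with the highest retweetCount per text" logic, using duplicated() on the text column instead of unique() on the whole data frame. The toy data frame below is made up for illustration and is not the asker's data; only the column names text and retweetCount are taken from the question.

```r
# Toy data (hypothetical values, only the column names come from the question)
df <- data.frame(
  text = c("a", "b", "a", "c", "b"),
  retweetCount = c(1, 5, 3, 0, 2),
  stringsAsFactors = FALSE
)

# Sort rows so that the highest retweetCount comes first
df_sorted <- df[order(df$retweetCount, decreasing = TRUE), ]

# duplicated() flags every repeat of a text after its first occurrence;
# because of the sort, the first occurrence is the one with the highest count
df_no_duplicates <- df_sorted[!duplicated(df_sorted$text), ]

# Reset the row names left over from the original data frame
rownames(df_no_duplicates) <- NULL

print(df_no_duplicates)
```

Unlike the summarise() approach, this keeps any other columns of the retained row, which may matter if the real data frame has more than these two columns.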