Remove repeated/duplicated values within the same data frame column in R

Time: 2017-02-11 18:58:30

Tags: r

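I have an odd data frame in which the Player column holds the players' names. The problem is that each first name appears twice, so Roy Sievers shows up as RoyRoy Sievers. I obviously want the name to be Roy Sievers.

Does anyone know how to do this?

Here is the full data frame; it is not very long: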

    Year                   Player                  Team       Position
1   1949           RoyRoy Sievers       St. Louis Browns       OF
2   1950           WaltWalt Dropo         Boston Red Sox       1B
3   1951         GilGil McDougald       New York Yankees       3B
4   1952          HarryHarry Byrd Philadelphia Athletics        P
5   1953       HarveyHarvey Kuenn         Detroit Tigers       SS
6   1954              BobBob Grim       New York Yankees        P
7   1955           HerbHerb Score      Cleveland Indians        P
8   1956        LuisLuis Aparicio      Chicago White Sox       SS
9   1957           TonyTony Kubek       New York Yankees       SS
10  1958       AlbieAlbie Pearson    Washington Senators       OF
11  1959           BobBob Allison    Washington Senators       OF
12  1960            RonRon Hansen      Baltimore Orioles       SS
13  1961           DonDon Schwall         Boston Red Sox        P
14  1962             TomTom Tresh       New York Yankees       SS
15  1963          GaryGary Peters      Chicago White Sox        P
16  1964           TonyTony Oliva        Minnesota Twins       OF
17  1965         CurtCurt Blefary      Baltimore Orioles       OF
18  1966        TommieTommie Agee      Chicago White Sox       OF
19  1967             RodRod Carew        Minnesota Twins       2B
20  1968         StanStan Bahnsen       New York Yankees        P
21  1969          LouLou Piniella     Kansas City Royals       OF
22  1970    ThurmanThurman Munson       New York Yankees        C
23  1971     ChrisChris Chambliss      Cleveland Indians       1B
24  1972      CarltonCarlton Fisk         Boston Red Sox        C
25  1973              AlAl Bumbry      Baltimore Orioles       OF
26  1974        MikeMike Hargrove          Texas Rangers       1B
27  1975            FredFred Lynn         Boston Red Sox       OF
28  1976         MarkMark Fidrych         Detroit Tigers        P
29  1977        EddieEddie Murray      Baltimore Orioles       DH
30  1978          LouLou Whitaker         Detroit Tigers       2B
31 1979*         JohnJohn Castino        Minnesota Twins       3B
32 1979*   AlfredoAlfredo Griffin      Toronto Blue Jays       SS
33  1980        JoeJoe Charboneau      Cleveland Indians       OF
34  1981        DaveDave Righetti       New York Yankees        P
35  1982            CalCal Ripken      Baltimore Orioles       SS
36  1983            RonRon Kittle      Chicago White Sox       OF
37  1984         AlvinAlvin Davis       Seattle Mariners       1B
38  1985       OzzieOzzie Guillén      Chicago White Sox       SS
39  1986         JoseJose Canseco      Oakland Athletics       OF
40  1987         MarkMark McGwire      Oakland Athletics       1B
41  1988           WaltWalt Weiss      Oakland Athletics       SS
42  1989         GreggGregg Olson      Baltimore Orioles        P
43  1990          Sandy Alomar Jr      Cleveland Indians        C
44  1991     ChuckChuck Knoblauch        Minnesota Twins       2B
45  1992           PatPat Listach      Milwaukee Brewers       SS
46  1993            TimTim Salmon      California Angels       OF
47  1994           BobBob Hamelin     Kansas City Royals       DH
48  1995       MartyMarty Cordova        Minnesota Twins       OF
49  1996         DerekDerek Jeter       New York Yankees       SS
50  1997   NomarNomar Garciaparra         Boston Red Sox       SS
51  1998            BenBen Grieve      Oakland Athletics       OF
52  1999     CarlosCarlos Beltrán     Kansas City Royals       OF
53  2000  KazuhiroKazuhiro Sasaki       Seattle Mariners        P
54  2001      IchiroIchiro Suzuki       Seattle Mariners       OF
55  2002          EricEric Hinske      Toronto Blue Jays       3B
56  2003        ÁngelÁngel Berroa     Kansas City Royals       SS
57  2004        BobbyBobby Crosby      Oakland Athletics       SS
58  2005      HustonHuston Street      Oakland Athletics        P
59  2006   JustinJustin Verlander         Detroit Tigers        P
60  2007     DustinDustin Pedroia         Boston Red Sox       2B
61  2008        EvanEvan Longoria         Tampa Bay Rays       3B
62  2009            Andrew Bailey      Oakland Athletics        P
63  2010     NeftalíNeftalí Feliz          Texas Rangers        P
64  2011  JeremyJeremy Hellickson         Tampa Bay Rays        P
65  2012           MikeMike Trout     Los Angeles Angels       OF
66  2013             WilWil Myers         Tampa Bay Rays       OF
67  2014           JoséJosé Abreu      Chicago White Sox       1B
68  2015      CarlosCarlos Correa         Houston Astros       SS
69  2016    MichaelMichael Fulmer         Detroit Tigers        P

3 Answers:

Answer 0: (score: 3)

You can fix this by looking for a repeated pattern of at least three word characters and replacing it with a single copy:

gsub("(\\w{3,})\\1", "\\1", Players$Player)

If you want to overwrite the old values, simply assign the result back to the column:

Players$Player = gsub("(\\w{3,})\\1", "\\1", Players$Player)
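
To see what this pattern does and does not catch, here is a small sketch on a hypothetical vector `x` (a few names copied from the question, not the full `Players` data frame). It suggests that doubled names of three or more letters are collapsed, names that were never doubled are left alone, and a two-letter name such as "Al" slips through because of the `{3,}` minimum.

```r
# Hypothetical sample taken from the question's Player column
x <- c("RoyRoy Sievers", "AlAl Bumbry", "Andrew Bailey")

# (\\w{3,}) captures three or more word characters; \\1 requires the same
# text to follow immediately, and the replacement keeps a single copy
cleaned <- gsub("(\\w{3,})\\1", "\\1", x)

print(cleaned)
# "Roy Sievers" "AlAl Bumbry" "Andrew Bailey"
```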

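Answer 1: (score: 2)

G5W's answer gets you most of the way there, but it misses two-letter names such as "Al". This version relies on capitalization rather than on the number of characters: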

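# Assumed pattern (the answer's own snippet is not available here):
# one uppercase letter followed by lowercase letters, immediately repeated.
# \\p{Lu} / \\p{Ll} with perl = TRUE also cover accented letters such as "É".
Players$Player = gsub("(\\p{Lu}\\p{Ll}+)\\1", "\\1", Players$Player, perl = TRUE)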
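
As a quick check of the capitalization idea on the awkward cases, here is a minimal sketch in R. The pattern is an illustrative assumption rather than the answerer's exact expression, and `x` is a hypothetical sample covering a two-letter name, an accented name (written with a `\u` escape so the string is UTF-8 regardless of locale), a surname with an internal capital, and a name that was never doubled.

```r
# Hypothetical sample: two-letter, accented, internal-capital, and clean names
x <- c("AlAl Bumbry", "Jos\u00e9Jos\u00e9 Abreu",
       "GilGil McDougald", "Sandy Alomar Jr")

# One uppercase letter plus one or more lowercase letters, repeated back to back.
# perl = TRUE enables the Unicode classes \p{Lu} (uppercase) and \p{Ll} (lowercase).
cleaned <- gsub("(\\p{Lu}\\p{Ll}+)\\1", "\\1", x, perl = TRUE)

print(cleaned)
```

Because the repeat has to start with a capital letter, there is no need for a minimum length, which is why "AlAl" is handled while ordinary words with repeated lowercase letters are left untouched.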

Answer 2: (score: 1)

For those who are less comfortable with regular expressions:

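library(stringr)

# Collapse the doubled first word of ONE name, e.g. "RoyRoy Sievers" -> "Roy Sievers"
fun1 <- function(string) {
  parts <- str_split(string, " ")[[1]]   # split the name into words
  first <- parts[1]
  half  <- str_length(first) / 2
  # Only halve the first word when it really is two identical halves,
  # so names such as "Andrew Bailey" are left unchanged
  if (str_length(first) %% 2 == 0 &&
      str_sub(first, 1, half) == str_sub(first, half + 1)) {
    parts[1] <- str_sub(first, 1, half)
  }
  paste(parts, collapse = " ")           # keeps suffixes such as "Jr"
}

# fun1() handles one string at a time, so apply it to every row
df$Player <- vapply(as.character(df$Player), fun1, character(1), USE.NAMES = FALSE)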
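
The same halving idea can also be written without stringr and applied to the whole column at once. The following is a minimal sketch in base R; `dedupe_first_word` is a hypothetical helper name, and it assumes the doubled part is always the first space-separated word.

```r
# Hypothetical helper: collapse a doubled first word, vectorized over names
dedupe_first_word <- function(names) {
  names <- as.character(names)
  pos   <- regexpr(" ", names, fixed = TRUE)           # position of first space
  first <- ifelse(pos > 0, substr(names, 1, pos - 1), names)
  rest  <- substring(names, nchar(first) + 1)          # remainder, with the space
  half  <- nchar(first) %/% 2
  # TRUE where the first word consists of two identical halves
  doubled <- nchar(first) %% 2 == 0 &
    substr(first, 1, half) == substring(first, half + 1)
  first[doubled] <- substr(first[doubled], 1, half[doubled])
  paste0(first, rest)
}

result <- dedupe_first_word(c("RoyRoy Sievers", "Andrew Bailey", "Sandy Alomar Jr"))
print(result)
# "Roy Sievers" "Andrew Bailey" "Sandy Alomar Jr"
```

Checking that the two halves match before trimming matters for this data set, since a few rows (for example Andrew Bailey) were never doubled in the first place.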