Duplicated rows when merging two data frames

Time: 2016-06-08 02:23:27

Tags: r merge duplicates

This is the fips data set:

   State Fips State.Abbreviation ANSI.Code        GU.Name
1      1   67                 AL   2403054      Abbeville
2      1   73                 AL   2403063     Adamsville
3      1  117                 AL   2403069      Alabaster
4      1   95                 AL   2403074    Albertville
5      1  123                 AL   2403077 Alexander City
6      1  107                 AL   2403080     Aliceville
7      1   39                 AL   2403097      Andalusia
8      1   15                 AL   2403101       Anniston
:
:
:
41774    51  720                 VA   1498434           Norton
41775    51  730                 VA   1498435       Petersburg
41776    51  735                 VA   1498436         Poquoson
41777    51  740                 VA   1498556       Portsmouth
41778    51  750                 VA   1498438          Radford
41779    51  760                 VA   1789073         Richmond
41780    51  770                 VA   1498439          Roanoke
41781    51  775                 VA   1789074            Salem
41782    51  790                 VA   1789075         Staunton
41783    51  800                 VA   1498560          Suffolk
41784    51  810                 VA   1498559   Virginia Beach
41785    51  820                 VA   1498443       Waynesboro
41786    51  830                 VA   1789076     Williamsburg
41787    51  840                 VA   1789077       Winchester

dim(fips)
[1] 2937    5

This is the head.cancer data set:

   PUBCSNUM  REG MAR_STAT RACE1V NHIADE SEX FIPS Fips State State.Abbreviation
1  93261752 1544        2     15      0   1    3    3    34                 NY
2  93264865 1544        2      1      0   1   15   15    34                 NY
3  93268186 1544        2      1      0   1    5    5    34                 NY
4  93272027 1544        2      1      0   2   17   17    34                 NY
5  93274555 1544        1      1      0   1   13   13    34                 NY
6  93275343 1544        5      1      0   2   25   25    34                 NY
7  93279759 1544        5      1      0   2    9    9    34                 NY
8  93280754 1544        2      1      0   2   35   35    34                 NY
9  93281166 1544        2      1      0   2   31   31    34                 NY
10 93282602 1544        5      1      0   1   33   33    34                 NY
11 93287646 1544        1      1      0   1   11   11    34                 NY
12 93288255 1544        4      1      4   1   39   39    34                 NY
13 93290660 1544        9      1      0   2   25   25    34                 NY
14 93291461 1544        1      1      6   1   39   39    34                 NY
15 93291778 1544        2      1      0   1    3    3    34                 NY

dim(head.cancer)
[1] 75313    10

When I merge the two, I expect to get the same number of rows as head.cancer (75313), but I get 951423 rows instead.

Here is my code and the output:

n = merge(head.cancer, fips, by = c('State', 'Fips', 'State.Abbreviation'), all.x = TRUE)

   State Fips State.Abbreviation PUBCSNUM  REG MAR_STAT RACE1V NHIADE SEX FIPS ANSI.Code            GU.Name
1      6    5                 CA 70128269 1541        4      1      0   2    5   2409693        Amador City
2      6    5                 CA 70128269 1541        4      1      0   2    5   2411446           Plymouth
3      6    5                 CA 70128269 1541        4      1      0   2    5    226085            Jackson
4      6    5                 CA 70128269 1541        4      1      0   2    5   1675841             Amador
5      6    5                 CA 70128269 1541        4      1      0   2    5   2418631 Ione Band of Miwok
6      6    5                 CA 70128269 1541        4      1      0   2    5   2412019       Sutter Creek
7      6    5                 CA 70128269 1541        4      1      0   2    5   2410110               Ione
8      6    5                 CA 70128269 1541        4      1      0   2    5   2410128            Jackson
9      6    5                 CA 67476209 1541        2      1      1   2    5   2409693        Amador City
10     6    5                 CA 67476209 1541        2      1      1   2    5   2411446           Plymouth
11     6    5                 CA 67476209 1541        2      1      1   2    5    226085            Jackson
12     6    5                 CA 67476209 1541        2      1      1   2    5   1675841             Amador
13     6    5                 CA 67476209 1541        2      1      1   2    5   2418631 Ione Band of Miwok
14     6    5                 CA 67476209 1541        2      1      1   2    5   2412019       Sutter Creek
15     6    5                 CA 67476209 1541        2      1      1   2    5   2410110               Ione
16     6    5                 CA 67476209 1541        2      1      1   2    5   2410128            Jackson
17     6    5                 CA 56544761 1541        4      1      0   2    5   2409693        Amador City
18     6    5                 CA 56544761 1541        4      1      0   2    5   2411446           Plymouth
19     6    5                 CA 56544761 1541        4      1      0   2    5    226085            Jackson
20     6    5                 CA 56544761 1541        4      1      0   2    5   1675841             Amador

dim(n)
[1] 951423     12

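In rows 1 to 8 the same PUBCSNUM is repeated 8 times. PUBCSNUM is an ID, so it should be unique, and each PUBCSNUM should have only one ANSI.Code value, but now it has many. I don't understand why the rows are being duplicated like this.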
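A possible explanation, sketched on made-up toy data rather than the real files: in the merged output above, the key (State, Fips, State.Abbreviation) appears with several different GU.Name values, which suggests that key is not unique in fips. If so, merge() returns one row for every matching pair, so each head.cancer row is repeated once per matching fips row. The names fips_toy, cancer_toy, and the values in them are hypothetical and only illustrate the mechanism.

```r
# Toy data, invented for illustration (not the real fips / head.cancer files).
fips_toy <- data.frame(
  State = c(6, 6, 6, 6),
  Fips = c(5, 5, 5, 7),
  State.Abbreviation = c("CA", "CA", "CA", "CA"),
  GU.Name = c("Amador City", "Plymouth", "Jackson", "Chico"),
  stringsAsFactors = FALSE
)
cancer_toy <- data.frame(
  PUBCSNUM = c(101, 102, 103),
  State = c(6, 6, 6),
  Fips = c(5, 5, 7),
  State.Abbreviation = c("CA", "CA", "CA"),
  stringsAsFactors = FALSE
)

keys <- c("State", "Fips", "State.Abbreviation")

# Is the merge key unique in the lookup table?
# FALSE means merge() can return more rows than the left table has.
key_is_unique <- !any(duplicated(fips_toy[keys]))

# How many lookup rows exist for each key combination.
matches_per_key <- aggregate(
  n_matches ~ State + Fips + State.Abbreviation,
  data = transform(fips_toy, n_matches = 1),
  FUN = sum
)

# Each patient row is repeated once per matching lookup row:
# 101 and 102 match three rows each, 103 matches one, so 3 + 3 + 1 = 7 rows.
merged_toy <- merge(cancer_toy, fips_toy, by = keys, all.x = TRUE)
nrow(merged_toy)
```

Running the same `duplicated()` check on the real fips with the same three key columns would show whether this is what is happening here.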

Please help. I have been stuck on this for hours and cannot figure it out. Thanks.
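One way this could be approached, as a minimal sketch on invented toy data: if the lookup table has several rows per (State, Fips, State.Abbreviation), reduce it to one row per key before merging, so the result keeps the row count of the left table. Keeping the first row per key is an assumption made only for illustration. Which place name or ANSI.Code should represent a county depends on what the analysis needs. The names fips_toy, cancer_toy, and fips_one are hypothetical.

```r
# Toy data, invented for illustration (not the real fips / head.cancer files).
fips_toy <- data.frame(
  State = c(6, 6, 6, 6),
  Fips = c(5, 5, 5, 7),
  State.Abbreviation = c("CA", "CA", "CA", "CA"),
  GU.Name = c("Amador City", "Plymouth", "Jackson", "Chico"),
  stringsAsFactors = FALSE
)
cancer_toy <- data.frame(
  PUBCSNUM = c(101, 102, 103),
  State = c(6, 6, 6),
  Fips = c(5, 5, 7),
  State.Abbreviation = c("CA", "CA", "CA"),
  stringsAsFactors = FALSE
)

keys <- c("State", "Fips", "State.Abbreviation")

# Keep only the first lookup row for each key combination.
# Assumption: any single row per county is acceptable for the analysis.
fips_one <- fips_toy[!duplicated(fips_toy[keys]), ]

# With a unique key on the right side, a left join cannot add rows.
merged_one <- merge(cancer_toy, fips_one, by = keys, all.x = TRUE)

# Guard: the merged table has exactly one row per patient ID.
stopifnot(nrow(merged_one) == nrow(cancer_toy))
stopifnot(!any(duplicated(merged_one$PUBCSNUM)))
```

If only county-level information is needed, dropping the place-level columns (GU.Name, ANSI.Code) from the lookup and calling `unique()` on what remains would be another option.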

0 answers:

No answers yet