How to assign a conditional sequence number by ID

Time: 2019-03-05 07:20:31

Tags: r data.table conditional-statements seq id

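Within each ID, I am trying to give rows the same seq number until the next row with type AA appears.

I tried

dt_1[seq:=seq(.N),by=c("ID","type")] 

but it does not work. Is there a way to produce a seq like this?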

dt_1<-fread("ID    type
         1        AA
         1        B
         1        C
         1        D
         1        AA
         1        B
         1        D
         1        AA
         1        C
         2        AA
         2        C
         2        F
         2        D
         3        AA
         3        E
         3        C")


dt_2<-fread("ID    type   seq
         1        AA     1
          1        B     1
          1        C     1
          1        D     1
          1        AA    2
          1        B     2
          1        D     2
          1        AA    3
          1        C     3
          2        AA    1
          2        C     1
          2        F     1
          2        D     1
          3        AA    1
          3        E     1
          3        C     1")

2 answers:

Answer 0: (score: 3)

A data.table approach using rowidv():

dt_1[, seq := rowidv(dt_1, cols = c("ID", "type"))][]
#     ID type seq
#  1:  1   AA   1
#  2:  1    B   1
#  3:  1    C   1
#  4:  1    D   1
#  5:  1   AA   2
#  6:  1    B   2
#  7:  1    D   2
#  8:  1   AA   3
#  9:  1    C   2
# 10:  2   AA   1
# 11:  2    C   1
# 12:  2    F   1
# 13:  2    D   1
# 14:  3   AA   1
# 15:  3    E   1
# 16:  3    C   1

From the help file: rowidv(DT, cols=c("x", "y")) is equivalent to column N in code that numbers the rows within each group of x and y, such as DT[, N := seq_len(.N), by=c("x", "y")].
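Note that in the rowidv() output, row 9 gets seq 2, whereas dt_2 in the question expects 3: rowidv() counts repeats of each (ID, type) pair rather than the number of AA rows seen so far. A minimal sketch of an alternative, assuming the goal is to start a new number at every AA row within an ID, is a grouped cumulative sum in data.table. The data below is retyped from the question's dt_1.

```r
library(data.table)

# Sample data retyped from the question
dt_1 <- data.table(
  ID   = c(rep(1L, 9), rep(2L, 4), rep(3L, 3)),
  type = c("AA", "B", "C", "D", "AA", "B", "D", "AA", "C",
           "AA", "C", "F", "D",
           "AA", "E", "C")
)

# Within each ID, bump the counter every time type == "AA" appears;
# all following rows keep that number until the next "AA"
dt_1[, seq := cumsum(type == "AA"), by = ID]

print(dt_1)
```

With this approach, any rows that come before the first AA of an ID would get 0, which does not occur in the sample data.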

Answer 1: (score: 1)

A dplyr way:

> dt_1 %>%
+   group_by(ID) %>%
+   mutate(seq = cumsum(type == "AA"))
# A tibble: 16 x 3
# Groups:   ID [3]
      ID type    seq
   <int> <chr> <dbl>
 1     1 AA        1
 2     1 B         1
 3     1 C         1
 4     1 D         1
 5     1 AA        2
 6     1 B         2
 7     1 D         2
 8     1 AA        3
 9     1 C         3
10     2 AA        1
11     2 C         1
12     2 F         1
13     2 D         1
14     3 AA        1
15     3 E         1
16     3 C         1
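The reason cumsum(type == "AA") works is that the comparison is TRUE on every AA row and FALSE elsewhere, so the running sum counts how many AA rows have been seen so far in the group. As a rough illustration without any packages, the same idea can be sketched in base R with ave(); the data frame name df is only an assumption for this example, with values retyped from the question.

```r
# Sample data retyped from the question (df is a hypothetical name)
df <- data.frame(
  ID   = c(rep(1L, 9), rep(2L, 4), rep(3L, 3)),
  type = c("AA", "B", "C", "D", "AA", "B", "D", "AA", "C",
           "AA", "C", "F", "D",
           "AA", "E", "C"),
  stringsAsFactors = FALSE
)

# 1 on each "AA" row, 0 elsewhere
is_start <- as.integer(df$type == "AA")

# Running total of the markers, restarted for each ID
df$seq <- ave(is_start, df$ID, FUN = cumsum)

print(df)
```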