How to get all the names from the following paragraph in R

Time: 2018-03-18 16:07:34

Tags: r

Can anyone help me get all the names (the person's name and the father's/husband's/mother's name) from the following paragraph? Sample file: data extracted from this scan document


The output should look like this (we need to extract the names that follow Name / Father's / Mother's / Husband's to build the Name column):

**Name**
    Nagesh V
    Savitha
    A T Vitalrao
    Le Venkatappa

1 answer:

Answer 0 (score: 0)

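I am assuming the image format is consistent with how it is represented in the data sample. The question is worded as if you only want the names and not the relationships, but I kept the relationships in case you need them.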
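To make that assumption concrete, here is a minimal sketch in base R on the first pair of lines from the data sample. It treats the labels (`Name ;`, `Father's`, and so on) as delimiters and, unlike the full pipeline, keeps the spaces inside each name. The helper `split_on_labels` is a hypothetical name introduced only for this illustration, and the sketch assumes every name is preceded by one of these labels.

```r
# First name line and its matching relative line, copied from the sample text
name_line     <- "Name ; Nagesh  V Name ; Savitha Name ; A T Vitalrao"
relative_line <- "Father's Le Venkatappa Father's Srinivas Father's Thirumagandam"

# Hypothetical helper: split a line on its labels and tidy the whitespace
split_on_labels <- function(line, label_pattern) {
  parts <- strsplit(line, label_pattern)[[1]]
  parts <- gsub("\\s+", " ", trimws(parts))  # trim ends, collapse runs of spaces
  parts[nzchar(parts)]                       # drop the empty piece before the first label
}

names_self     <- split_on_labels(name_line, "Name\\s*[;:]")
names_relative <- split_on_labels(relative_line, "(Father|Mother|Husband)'s")

names_self      # "Nagesh V" "Savitha" "A T Vitalrao"
names_relative  # "Le Venkatappa" "Srinivas" "Thirumagandam"
```

Because the two vectors come out in the same left-to-right order, position 1 of one lines up with position 1 of the other, which is what the pairing of consecutive lines relies on.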

        txt<-"Name ; Nagesh  V Name ; Savitha Name ; A T Vitalrao\n\nFather's Le Venkatappa Father's Srinivas Father's Thirumagandam\nName: Name: Name:\n\nHouse No.:9/1 House No.:9/C House No.:9/C\n\nAge: 60 Sex: Male Age: 28 Sex: Female Age: 85 Sex: Male\nBCW1799964 BCW1797224 SOH0004515\nName : V Kedarnath Name : K Nalini Name : Sayiraj\n\nFather's Vital rao Husband's V Kedarnath Father's Rudrappa\n\nName: Name: Name:\n\nHouse No.:11 House No.:11 House No.:71\n\nAge: 55 Sex: Male Age: 47 Sex: Female Age: 36 Sex: Male\nSOH4703575 SOH4715249 SOH4703534\nName ; G.Dayala Murthy Name ; G.Anjali Name ; Tamil Selvi\n\nFather's K.Govinda Swamy Husband's K.Govinda Swamy Father's Govinda Swamy\nName: Name: Name:\n\nHouse No.:3 House No.:3 House No.:3\n\nAge: 28 Sex: Male Age: 48 Sex: Female Age: 21 Sex: Female\nSOH4703583 SOH4475547 SOH4475521\nName ; K.Govinda Swamy Name ; Rony Mazumder Name ; Bina Mazumder\nFather's Kuppuswamy Father's SAMIR MAZUMDER Husband's SAMIR MAZUMDER\nName: Name: Name:\n\nHouse No.:3 House No.:3/1 House No.:3/1\n\nAge: 60 Sex: Male Age: 29 Sex: Male Age: 52 Sex: Female\nSOH4476115 SOH4476164 SOH4476198\nName ; Priyanka Mmazumder Name ; Puja Mazumder Name ; Samir Mazumder\nFather's SAMIR MAZUMDER Mother's SAMIR MAZUMDER Father's MANINDRA LAL\nName: Name: Name: MAZUMDER\n\nHouse No.:3/1 House No.:3/1 House No.:3/1\n\nAge:"
    data_input<-readLines(textConnection(txt))

    library(dplyr)
    # The bulk of this is just data cleaning and manipulation.
    # Liberal use of the magrittr/dplyr pipe, because all the gsubs are messy.

    Output<-grep("^Name|^Father|^Mother|^Husband",data_input,value=TRUE) %>% 
      grep("Name: Name: Name:",.,invert=TRUE,value=TRUE) %>% 
      gsub("Name","Self",.) %>% 
      gsub("'s","'s ;",.) %>% 
      gsub(" Self"," ; Self",.) %>% 
      gsub(" Husband's"," ;  Husband's", .) %>% 
      gsub(" Mother's"," ; Mother's",.) %>% 
      gsub(" Father's"," ; Father's",.) %>% 
      gsub("'s","",.) %>% 
      gsub(":",";",.) %>% 
      gsub(" ","",.) %>% 
      tibble(strings=.) %>%
      # From the provided image, every 2 consecutive lines belong together
      # (a line of names, then a line of relatives), so the index pairs them up.
      mutate(index = rep(seq_len(n() / 2), each = 2)) %>%
    {split(.$strings,.$index)} %>%
      lapply(., function(u) {
        # u[1] is the line of names, u[2] the matching line of relatives
        name_parts     <- unlist(strsplit(u[1], split = ";"))
        relative_parts <- unlist(strsplit(u[2], split = ";"))
        tibble(
          Name     = grep("Self", name_parts, invert = TRUE, value = TRUE),
          Relation = grep("Mother|Father|Husband", relative_parts, value = TRUE),
          Relative = grep("Mother|Father|Husband", relative_parts, invert = TRUE, value = TRUE)
        )
      }) %>%
      bind_rows()
# Output

    # A tibble: 15 x 3
                    Name Relation       Relative
                   <chr>    <chr>          <chr>
     1           NageshV   Father   LeVenkatappa
     2           Savitha   Father       Srinivas
     3        ATVitalrao   Father  Thirumagandam
     4        VKedarnath   Father       Vitalrao
     5           KNalini  Husband     VKedarnath
     6           Sayiraj   Father       Rudrappa
     7    G.DayalaMurthy   Father K.GovindaSwamy
     8          G.Anjali  Husband K.GovindaSwamy
     9        TamilSelvi   Father   GovindaSwamy
    10    K.GovindaSwamy   Father     Kuppuswamy
    11      RonyMazumder   Father  SAMIRMAZUMDER
    12      BinaMazumder  Husband  SAMIRMAZUMDER
    13 PriyankaMmazumder   Father  SAMIRMAZUMDER
    14      PujaMazumder   Mother  SAMIRMAZUMDER
    15     SamirMazumder   Father    MANINDRALAL
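
If only the single Name column from the question is wanted, the Name and Relative columns of the result can be interleaved. This is a minimal sketch in base R, assuming a result shaped like the tibble above. `result` and `all_names` are illustrative names, and only the first three rows are copied here to keep the example self-contained.

```r
# Three rows copied from the result above (an assumption for illustration;
# the full Output tibble would be handled the same way)
result <- data.frame(
  Name     = c("NageshV", "Savitha", "ATVitalrao"),
  Relation = c("Father", "Father", "Father"),
  Relative = c("LeVenkatappa", "Srinivas", "Thirumagandam"),
  stringsAsFactors = FALSE
)

# rbind() stacks names over relatives; c() then reads the matrix column by
# column, giving: person 1, relative 1, person 2, relative 2, ...
all_names <- data.frame(
  Name = c(rbind(result$Name, result$Relative)),
  stringsAsFactors = FALSE
)

all_names
```

If each name should appear only once, `unique(all_names$Name)` would drop repeats such as a father who is also listed as a person in his own right.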