Can anyone help me extract all the names (the person's own name and the father's/husband's/mother's name) from the passage below? Sample file: data extracted from this scan document
The output should look like this (we need to extract the names from the Name / Father's / Mother's / Husband's fields to build a Name column):
**Name**
Nagesh V
Savitha
A T Vitalrao
Le Venkatappa
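Answer 0 (score: 0)
I am assuming that the layout in the image is represented consistently in the data sample. The question is written as if you only want the names and not the relationships, but I kept the relationships in case you need them.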
txt<-"Name ; Nagesh V Name ; Savitha Name ; A T Vitalrao\n\nFather's Le Venkatappa Father's Srinivas Father's Thirumagandam\nName: Name: Name:\n\nHouse No.:9/1 House No.:9/C House No.:9/C\n\nAge: 60 Sex: Male Age: 28 Sex: Female Age: 85 Sex: Male\nBCW1799964 BCW1797224 SOH0004515\nName : V Kedarnath Name : K Nalini Name : Sayiraj\n\nFather's Vital rao Husband's V Kedarnath Father's Rudrappa\n\nName: Name: Name:\n\nHouse No.:11 House No.:11 House No.:71\n\nAge: 55 Sex: Male Age: 47 Sex: Female Age: 36 Sex: Male\nSOH4703575 SOH4715249 SOH4703534\nName ; G.Dayala Murthy Name ; G.Anjali Name ; Tamil Selvi\n\nFather's K.Govinda Swamy Husband's K.Govinda Swamy Father's Govinda Swamy\nName: Name: Name:\n\nHouse No.:3 House No.:3 House No.:3\n\nAge: 28 Sex: Male Age: 48 Sex: Female Age: 21 Sex: Female\nSOH4703583 SOH4475547 SOH4475521\nName ; K.Govinda Swamy Name ; Rony Mazumder Name ; Bina Mazumder\nFather's Kuppuswamy Father's SAMIR MAZUMDER Husband's SAMIR MAZUMDER\nName: Name: Name:\n\nHouse No.:3 House No.:3/1 House No.:3/1\n\nAge: 60 Sex: Male Age: 29 Sex: Male Age: 52 Sex: Female\nSOH4476115 SOH4476164 SOH4476198\nName ; Priyanka Mmazumder Name ; Puja Mazumder Name ; Samir Mazumder\nFather's SAMIR MAZUMDER Mother's SAMIR MAZUMDER Father's MANINDRA LAL\nName: Name: Name: MAZUMDER\n\nHouse No.:3/1 House No.:3/1 House No.:3/1\n\nAge:"
data_input <- readLines(textConnection(txt))
library(dplyr)
# The bulk of this is just data cleaning and manipulation.
# Liberal use of the magrittr/dplyr pipe because all the gsubs are messy.
Output <- grep("^Name|^Father|^Mother|^Husband", data_input, value = TRUE) %>%
  # drop the "Name: Name: Name:" filler lines
  grep("Name: Name: Name:", ., invert = TRUE, value = TRUE) %>%
  # relabel each person's own "Name" field as "Self"
  gsub("Name", "Self", .) %>%
  # put a ";" after every label and before every new field
  gsub("'s", "'s ;", .) %>%
  gsub(" Self", " ; Self", .) %>%
  gsub(" Husband's", " ; Husband's", .) %>%
  gsub(" Mother's", " ; Mother's", .) %>%
  gsub(" Father's", " ; Father's", .) %>%
  gsub("'s", "", .) %>%
  gsub(":", ";", .) %>%
  # strip all spaces (this also removes the spaces inside names)
  gsub(" ", "", .) %>%
  # tibble() replaces the deprecated data_frame()
  tibble(strings = .) %>%
  # from the provided image it appeared that every 2 consecutive lines
  # are connected, so the index is just a way to split them
  mutate(., index = rep(seq_len(NROW(.) / 2), each = 2)) %>%
  {split(.$strings, .$index)} %>%
  lapply(., function(u) {
    # u[1] is the "Self" line, u[2] is the matching relation line
    self_fields <- unlist(strsplit(u[1], split = ";"))
    relation_fields <- unlist(strsplit(u[2], split = ";"))
    tibble(
      Name = grep("Self", self_fields, invert = TRUE, value = TRUE),
      Relation = grep("Mother|Father|Husband", relation_fields, value = TRUE),
      Relative = grep("Mother|Father|Husband", relation_fields, invert = TRUE, value = TRUE)
    )
  }) %>%
  bind_rows()
# Output
# A tibble: 15 x 3
   Name              Relation Relative
   <chr>             <chr>    <chr>
 1 NageshV           Father   LeVenkatappa
 2 Savitha           Father   Srinivas
 3 ATVitalrao        Father   Thirumagandam
 4 VKedarnath        Father   Vitalrao
 5 KNalini           Husband  VKedarnath
 6 Sayiraj           Father   Rudrappa
 7 G.DayalaMurthy    Father   K.GovindaSwamy
 8 G.Anjali          Husband  K.GovindaSwamy
 9 TamilSelvi        Father   GovindaSwamy
10 K.GovindaSwamy    Father   Kuppuswamy
11 RonyMazumder      Father   SAMIRMAZUMDER
12 BinaMazumder      Husband  SAMIRMAZUMDER
13 PriyankaMmazumder Father   SAMIRMAZUMDER
14 PujaMazumder      Mother   SAMIRMAZUMDER
15 SamirMazumder     Father   MANINDRALAL
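
The table above has no spaces inside the names and keeps own names and relatives in separate columns, whereas the question asks for a single Name column with names such as "Nagesh V". Here is a minimal base R sketch of one way to get there. It assumes each line can be split directly on its field labels. The helper `split_fields` and both label patterns are hypothetical, based only on the two sample lines shown, and a name that itself contains a label such as "Name" would be split incorrectly.

```r
# One "Name" line and its relation line, copied from the sample text
name_line     <- "Name ; Nagesh V Name ; Savitha Name ; A T Vitalrao"
relative_line <- "Father's Le Venkatappa Father's Srinivas Father's Thirumagandam"

# Hypothetical helper: split a line on its field labels, trim the
# surrounding whitespace, and drop the empty piece before the first label
split_fields <- function(line, label_regex) {
  parts <- trimws(strsplit(line, label_regex)[[1]])
  parts[nzchar(parts)]
}

# Assumed label patterns: "Name" followed by ";" or ":", and the
# possessive relation labels used in the sample
own_names <- split_fields(name_line, "Name\\s*[;:]")
relatives <- split_fields(relative_line, "(Father|Mother|Husband)'s")

# Stack both sets of names into the single Name column from the question
all_names <- data.frame(Name = c(own_names, relatives), stringsAsFactors = FALSE)
print(all_names)
```

Because the split happens on the labels rather than on spaces, multi-word names keep their internal spacing. To process the whole file, the same helper could be applied to each pair of lines in place of the `gsub` chain.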