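I used extract_tables from the tabulizer package to extract tables from 165 pages. Each page is stored as its own data frame in one large list. The tables in the PDF have 5 columns, but some pages were parsed incorrectly and have only 4 columns.
I want to combine all of the data frames into a single data frame, but I can't because the number of columns differs.
The fifth column is unnecessary, so I have been trying to adapt the map_if function: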
map_if(df, ~.[,5], ~ select(-c(,5)))
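But that doesn't work.
Edit: To simplify the problem, I am pasting a shortened version of the output data.
According to typeof(), my data is a list, and length() of the shortened dataset is 7. str() returns the following: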
List of 7
$ : chr [1:34, 1:4] "Species" "Abelmoschus\t\r esculentus(\t\r L.)\t\r Moench" "Abelmoschus\t\r esculentus(\t\r L.)\t\r Moench" "Abelmoschus\t\r ficulneus(\t\r \t\r L.)\t\r Wight\t\r &\t\r Arn." ...
$ : chr [1:34, 1:4] "Species" "Abrus\t\r precatorius\t\r L." "Abrus\t\r precatorius\t\r L." "Abrus\t\r precatorius\t\r L." ...
$ : chr [1:34, 1:4] "Species" "Acanthocalyx\t\r alba(\t\r Hand.-Ââ\200\220Mazz.)\t\r M.J.Cannon" "Acanthus\t\r ilicifolius\t\r L." "Achillea\t\r millefolium\t\r L." ...
$ : chr [1:34, 1:4] "Species" "Achyranthes\t\r bidentata\t\r Blume" "Achyranthes\t\r bidentata\t\r Blume" "Achyranthes\t\r bidentata\t\r Blume" ...
$ : chr [1:34, 1:4] "Species" "Adhatoda\t\r vasica\t\r Nees" "Adhatoda\t\r vasica\t\r Nees" "Adhatoda\t\r vasica\t\r Nees" ...
$ : chr [1:34, 1:4] "Species" "Aganosma\t\r marginata(\t\r Roxb.)\t\r G.Don" "Aganosma\t\r marginata(\t\r Roxb.)\t\r G.Don" "Aganosma\t\r sp." ...
$ : chr [1:34, 1:5] "Species" "Ailanthus\t\r triphysa(\t\r Dennst.)\t\r Alston" "Ainsliaea\t\r \t\r spicata\t\r Vaniot" "Akebia\t\r quinata(\t\r Houtt.)\t\r Decne." ...
Output of dput(pdf.dat[1:2]):
list(structure(c("Species", "Abelmoschus\t\r esculentus(\t\r L.)\t\r Moench",
"Abelmoschus\t\r esculentus(\t\r L.)\t\r Moench", "Abelmoschus\t\r ficulneus(\t\r \t\r L.)\t\r Wight\t\r &\t\r Arn.",
"Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.", "Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.",
"Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.", "Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.",
"Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.", "Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.",
"Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.", "Abelmoschus\t\r manihot(\t\r L.)\t\r Medik.",
"Abelmoschus\t\r moschatus\t\r Medik.", "Abelmoschus\t\r moschatus\t\r Medik.",
"Abelmoschus\t\r sagittifolius(\t\r Kurz)\t\r Merr.", "Abelmoschus\t\r sagittifolius(\t\r Kurz)\t\r Merr.",
"Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.", "Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.",
"Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.", "Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.",
"Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.", "Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.",
"Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.", "Abroma\t\r augusta(\t\r L.)\t\r L.\t\r f.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Family", "Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae",
"Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae",
"Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae",
"Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae",
"Malvaceae", "Malvaceae", "Malvaceae", "Malvaceae", "Fabaceae",
"Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae",
"Fabaceae", "Fabaceae", "Fabaceae", "Use", "Hysteritis", "Blenorrhagia",
"Contraceptive", "Parturition", "Menorrhagia", "Parturition(\t\r difficult)",
"Female\t\r fertility", "Parturition(\t\r induces\t\r labour)",
"Lactagogue", "Blenorrhagia", "Postpartum\t\r recovery", "Gynaecological\t\r diseases",
"Lactagogue", "Blenorrhagia", "Leucorrhea", "Dysmenorrhea", "uterine\t\r diseases",
"Leucorrhea", "Menstrual\t\r disorders", "Amenorrhea", "Dysmenorrhea",
"Emmenagogue", "Dysmenorrhea", "Antifertility/prevent\t\r conception",
"Abortifacient", "Contraception", "Amenorrhegia", "Neonatal\t\r bath",
"Contraceptive", "Abortifacient", "Abortifacient", "Abortifacient",
"Abortifacient", "Use(\t\r standardized)\t\r Study", "Inflammation Kishore\t\r et\t\r al.(\t\r 1989)",
"Leucorrhea Pételot(\t\r 1952)", "Contraceptive Bhogaonkar\t\r and\t\r Kadam(\t\r 2011)",
"Other/NOS Bourdy\t\r and\t\r Walter(\t\r 1992)", "Uterine\t\r hemorrhage Bourdy\t\r and\t\r Walter(\t\r 1992)",
"Parturition\t\r Girard\t\r and\t\r Barrau(\t\r 1957)",
"Fertility Holdsworth(\t\r 1975)", "Uterine\t\r contractions(\t\r induce) Holdsworth(\t\r 1980)",
"Lactation(\t\r stimulate) Ishidoya(\t\r 1933-Ââ\200\2201937)",
"Leucorrhea Roi(\t\r 1955)", "Postpartum\t\r recovery Roosita\t\r et\t\r al.(\t\r 2008)",
"Gynecological\t\r disorders\t\r NOS Van\t\r Duong(\t\r 1993)",
"Lactation(\t\r stimulate) Zhang\t\r et\t\r al.(\t\r 2009)",
"Leucorrhea Pételot(\t\r 1952)", "Leucorrhea Pételot(\t\r 1952)",
"Menstrual\t\r pain Guerrero(\t\r 1922)", "Gynecological\t\r disorders\t\r NOS Hossan\t\r et\t\r al.(\t\r 2010)",
"Leucorrhea Hossan\t\r et\t\r al.(\t\r 2010)", "Menstrual\t\r disorders\t\r NOS Hossan\t\r et\t\r al.(\t\r 2010)",
"Menstrual\t\r flow(\t\r absent) Pardo\t\r de\t\r Tavera\t\r and\t\r Thomas(\t\r 1901)",
"Menstrual\t\r pain Pardo\t\r de\t\r Tavera\t\r and\t\r Thomas(\t\r 1901)",
"Menstrual\t\r flow(\t\r stimulate) Pételot(\t\r 1952)",
"Menstrual\t\r pain Quisumbing(\t\r 1951)", "Contraceptive Behera(\t\r 2006)",
"Abortion(\t\r induce) Bhattarai(\t\r 1994)", "Contraceptive Bhattarai(\t\r 1994)",
"Menstrual\t\r flow(\t\r absent) Bhogaonkar\t\r and\t\r Kadam(\t\r 2011)",
"Other/NOS Fox(\t\r 1953)", "Contraceptive Goswami\t\r et\t\r al.(\t\r 2011)",
"Abortion(\t\r induce) Guha\t\r et\t\r al.(\t\r 2003)", "Abortion(\t\r induce) Jain\t\r et\t\r al.(\t\r 2004)",
"Abortion(\t\r induce) Kalita\t\r et\t\r al.(\t\r 2011)",
"Abortion(\t\r induce) Kishore\t\r et\t\r al.(\t\r 1989)"
), .Dim = c(34L, 4L)), structure(c("Species", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abrus\t\r precatorius\t\r L.",
"Abrus\t\r precatorius\t\r L.", "Abutilon\t\r indicum(\t\r \t\r L.)\t\r Sweet",
"Abutilon\t\r indicum(\t\r \t\r L.)\t\r Sweet", "Abutilon\t\r indicum(\t\r L.)\t\r Sweet",
"Abutilon\t\r indicum(\t\r L.)\t\r Sweet", "Acacia\t\r catechu(\t\r L.\t\r f.)\t\r Willd.",
"Acacia\t\r catechu(\t\r L.f.)\t\r Willd.", "Acacia\t\r concinna(\t\r Willd.)\t\r DC.",
"Acacia\t\r concinna(\t\r Willd.)\t\r DC.", "Acacia\t\r farnesiana(\t\r \t\r L.)\t\r Willd.",
"Acacia\t\r farnesiana(\t\r \t\r L.)\t\r Willd.", "Acacia\t\r farnesiana(\t\r \t\r L.)\t\r Willd.",
"Acacia\t\r farnesiana(\t\r L.)\t\r Willd.", "Acacia\t\r farnesiana(\t\r L.)\t\r Willd.",
"Acacia\t\r farnesiana(\t\r L.)\t\r Willd.", "Acacia\t\r leucophloeia(\t\r Roxb.)\t\r Willd.",
"Acacia\t\r leucophloeia(\t\r Roxb.)\t\r Willd.", "Acacia\t\r nilotica(\t\r L.)\t\r Delile",
"Acacia\t\r nilotica(\t\r L.)\t\r Delile", "Acacia\t\r nilotica(\t\r L.)\t\r Delile",
"Acalypha\t\r grandis\t\r Benth.", "Acalypha\t\r spiciflora\t\r Burm.f.",
"Acalypha\t\r spiciflora\t\r Burm.f.", "Acanthocalyx\t\r alba(\t\r Hand.-Ââ\200\220Mazz.)\t\r M.J.Cannon",
"Family", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae",
"Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Malvaceae",
"Malvaceae", "Malvaceae", "Malvaceae", "Fabaceae", "Fabaceae",
"Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae",
"Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae", "Fabaceae",
"Fabaceae", "Euphorbiaceae", "Euphorbiaceae", "Euphorbiaceae",
"Caprifoliaceae", "Use", "Contraceptive", "Female\t\r fertility",
"Leucorrhea", "Abortifacient", "Contraceptive", "Antifertility",
"Postpartum\t\r recovery", "Contraceptive", "Abortifacient",
"menstrual\t\r disorders", "menstrual\t\r disorders", "Leucorrhea",
"Urinary\t\r tract\t\r infections", "Uterus\t\r displacement",
"Abortifacient", "Abortifacient", "Postpartum", "Postpartum",
"Leucorrhea", "Leucorrhea", "Menorrhagia", "Postpartum\t\r protective",
"Leucorrhea", "Gynaecological\t\r diseases", "Contraceptive",
"Amenorrhea", "Contraction\t\r of\t\r uterus\t\r in\t\r post-Ââ\200\220natal\t\r days",
"Menstrual\t\r pain\t\r relief", "Leucorrhea", "Contraceptive",
"postpartum\t\r anemia", "expel\t\r lochia", "Gynaecological\t\r diseases",
"Use(\t\r standardized)\t\r Study", "Contraceptive Pal\t\r and\t\r Jain(\t\r 1998),\t\r Lodha",
"Fertility Pal\t\r and\t\r Jain(\t\r 1998),\t\r Lodha", "Leucorrhea Pal\t\r and\t\r Jain(\t\r 1998),\t\r Lodha",
"Abortion(\t\r induce) Panduranga\t\r et\t\r al.(\t\r 2011)",
"Contraceptive Panduranga\t\r et\t\r al.(\t\r 2011)", "Contraceptive Priya\t\r et\t\r al.(\t\r 2002)",
"Postpartum\t\r recovery Roosita\t\r et\t\r al.(\t\r 2008)",
"Contraceptive Tripathi\t\r et\t\r al.(\t\r 2010)", "Abortion(\t\r induce) Van\t\r Duong(\t\r 1993)",
"Menstrual\t\r disorders\t\r NOS Vidyasagar\t\r and\t\r Prashantkumar(\t\r 2007)",
"Menstrual\t\r disorders\t\r NOS Panduranga\t\r et\t\r al.(\t\r 2011)",
"Leucorrhea Yadav\t\r et\t\r al.(\t\r 2006)", "Urinary\t\r tract\t\r infections Lecomte\t\r et\t\r al.(\t\r 1907)",
"Uterine\t\r prolapse Mohapatra\t\r and\t\r Sahoo(\t\r 2008)",
"Abortion(\t\r induce) Jain\t\r et\t\r al.(\t\r 2004)", "Abortion(\t\r induce) Bhattarai(\t\r 1994)",
"Other/NOS Anderson(\t\r 1993),\t\r Hmong", "Other/NOS Anderson(\t\r 1993),\t\r Karen",
"Leucorrhea Pételot(\t\r 1952)", "Leucorrhea Tripathi\t\r et\t\r al.(\t\r 2010)",
"Uterine\t\r hemorrhage Tripathi\t\r et\t\r al.(\t\r 2010)",
"Other/NOS Gimlette(\t\r 1930)", "Leucorrhea Pardo\t\r de\t\r Tavera\t\r and\t\r Thomas(\t\r 1901)",
"Gynecological\t\r disorders\t\r NOS Van\t\r Duong(\t\r 1993)",
"Contraceptive Jain\t\r et\t\r al.(\t\r 2004)", "Menstrual\t\r flow(\t\r absent) Jain\t\r et\t\r al.(\t\r 2004)",
"Postpartum\t\r uterus\t\r reduction Bhattarai(\t\r 1994)",
"Menstrual\t\r pain Pal\t\r and\t\r Jain(\t\r 1998),\t\r Lodha",
"Leucorrhea Yadav\t\r et\t\r al.(\t\r 2006)", "Contraceptive Bourdy\t\r and\t\r Walter(\t\r 1992)",
"Anemia Panyaphu\t\r et\t\r al.(\t\r 2011)", "Uterine\t\r contractions(\t\r induce) Panyaphu\t\r et\t\r al.(\t\r 2011)",
"Gynecological\t\r disorders\t\r NOS Liu\t\r et\t\r al.(\t\r 2009)"
), .Dim = c(34L, 4L)))
Answer 0 (score: 1)
If your list is called pdf.dat, you can select the first 4 columns of each element:
library(dplyr)
# Convert each page to a data frame, keep columns 1-4, and row-bind the results
all_data <- purrr::map_df(pdf.dat, ~ as.data.frame(.x) %>% select(1:4))
Or in base R:
all_data <- do.call(rbind, lapply(pdf.dat, function(x) data.frame(x)[1:4]))
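To make the approach above easier to check, here is a minimal base R sketch on toy data. The `pages` list and its values are hypothetical stand-ins for pdf.dat, not the real PDF output. It assumes, as the str() output in the question suggests, that each element is a character matrix whose first row repeats the header and whose first header cell is "Species". The final step, which promotes that row to column names and drops the repeats, is an optional extra that the original answer does not cover.

```r
# Toy stand-in for pdf.dat: two 4-column pages and one 5-column page
# (hypothetical values; matrix() fills column by column).
pages <- list(
  matrix(c("Species", "A", "Family", "F1", "Use", "U1", "Study", "S1"), nrow = 2),
  matrix(c("Species", "B", "Family", "F2", "Use", "U2", "Study", "S2"), nrow = 2),
  matrix(c("Species", "C", "Family", "F3", "Use", "U3", "Study", "S3", "", ""), nrow = 2)
)

# Column count per page, to see which pages differ.
n_cols <- vapply(pages, ncol, integer(1))

# Keep the first four columns of every page; drop = FALSE keeps the
# result a matrix even when a page has a single row.
trimmed <- lapply(pages, function(m) m[, 1:4, drop = FALSE])

# Stack all pages into one data frame.
all_data <- as.data.frame(do.call(rbind, trimmed), stringsAsFactors = FALSE)

# Optional: use the header row of the first page as column names, then
# drop every repeated header row (assumes the first header cell is "Species").
names(all_data) <- trimmed[[1]][1, ]
all_data <- all_data[all_data[[1]] != "Species", ]
```

Inspecting `n_cols` before trimming is a quick way to confirm which pages have the extra column, and therefore that dropping column 5 discards nothing you need.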