New to Scrapy: extracting table data

Time: 2015-04-30 21:44:04

Tags: python web-scraping scrapy

This is my first day struggling with Scrapy, and I want to get the table data from this page:
http://www.cottagehealthsystem.org/tabid/149/Default.aspx

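So I want:

Administration 569-7290 Anesthesia 569-7206 Birth Center 569-7232 Cancer Data Center 569-8280 Cardiac Care Unit 569-7222 Cardiac Electrophysiology 569-8234 Cardiac Rehabilitation 569-7201 Cardiology 569-8284 and so on.

I did this: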

scrapy shell "http://www.cottagehealthsystem.org/tabid/149/Default.aspx"
response.selector.xpath('//table//td//text()').extract()

It gave me a lot of useless data like this:

[u'\n\t\t\t',
 u'\n                    Home\n                    ',
 u'\n                    \xa0|\xa0\n                    ',
 u'\n                    Patient & Visitor Info\n                    ',
 u'\n                    \xa0|\xa0\n                    ',
 u'\n                        Maps & Directions\n                    ',
 u'\n                    \xa0|\xa0\n                    ',
 u'\n                    Careers\n                    ',
 u'\n                    \xa0|\xa0\n                 ',
 u'\n                        Contact Us\n                    ',
 u'\n                    \xa0|\xa0\n                    ',
 u'                         Find a Physician  ',
 u'\n                ',
 u'\n                    \n                    ',
 u'\n                    ',
 u'\n                    ',
 u'Search',
 u'\n                    ',
 u'\r\n',
 u'\r\n.xsImgSearch { MARGIN-RIGHT: 2px }\r\n.xsCmdSearch { text-transform: uppercase; COLOR: #ffffff; BORDER: 0px; PADDING-RIGHT: 3px; PADDING-LEFT: 3px; FONT-SIZE: 8pt; MARGIN-RIGHT: 2px; BACKGROUND-COLOR: #006b6e }\r\n.xsTxtSearch { BORDER: #7f9db9 1px solid; FONT-SIZE: 9pt; WIDTH: 90px; MARGIN-RIGHT: 2px; BACKGROUND-COLOR: #ffffff }\r\n',
 u'\r\n',
 u'\r\n\t',
 u'\r\n\t\t',
 u'\r\n\t\t\t',
 u'\r\n\t\t\t',
 u'\r\n\t\t',
 u'\r\n\t',
 u'\r\n',
 u'\r\n',
 u'\n                    ',
 u'\n                    ',
 u'\n                ',
 u'\n                ',
 u'\n            ',
 u'\r\n\r\n',
 u'\r\n.isReferencedModuleBorder\r\n{\r\n    border: solid orange 1px;\r\n    border-bottom: solid #ff0000 0px;\r\n}\r\n.isReferencedModuleText\r\n{\r\n    color:orange;\r\n    border: solid orange 1px;\r\n    border-top: solid orange 0px;\r\n    text-align:center;\r\n}\r\n\r\n\r\n',
 u'\r\n\r\n',
 u'\r\n    \r\n    \r\n    ',
 u'\r\n\t',
 u'\r\n\t',
 u'\r\n',
 u'\r\n\r\n\r\n\r\n',
 u'\r\n',
 u'\r\n    \r\n    \r\n',
 u'\n                ',
 u'\n            ',
 u'\r\n\r\n',
 u'\r\n.isReferencedModuleBorder\r\n{\r\n    border: solid orange 1px;\r\n    border-bottom: solid #ff0000 0px;\r\n}\r\n.isReferencedModuleText\r\n{\r\n    color:orange;\r\n    border: solid orange 1px;\r\n    border-top: solid orange 0px;\r\n    text-align:center;\r\n}\r\n\r\n\r\n',
 u'\r\n\r\n',
 u'\r\n    \r\n    \r\n    ',
 u'\r\n\t',
 u'\r\n\t',
 u'\r\n',
 u'\r\n\r\n\r\n\r\n',
 u'\r\n',
 u'\r\n    \r\n    \r\n',
 u'\n                ',
 u'\n                    ',
 u'\n                    ',
 u'\n                        ',
 u'\n                            ',
 u'SBCH HEALTH SERVICES',
 u'\xa0',
 u'PATIENT & VISITOR INFO',
 u'\xa0',
 u'EVENTS & CLASSES',
 u'\xa0',
 u'DIRECTIONS & PARKING',
 u'\xa0',
 u'SBCH MAPS',
 u'\xa0',
 u'GIFT SHOP',
 u'\xa0',
 u'VISITOR DINING',
 u'\xa0',
 u'SANTA BARBARA COTTAGE HOSPITAL FOUNDATION',
 u'\xa0',
 u'CONTACT INFORMATION',
 u' \xa0\n                        ',
 u'\n                    ',
 u'\n                    ',
 u'\n                    ',
 u'\n                    ',
 u'\n                        ',
 u'\n                            ',
 u'\n                                ',
 u'\n                                    ',
 u'\n                                        ',
 u'\n                                            ',
 u'\n                                                Santa Barbara\n                                                ',
 u'\n                                                ',
 u'Cottage Hospital',
 u'\n                                            ',
 u'\n                                            ',
 u'\n                                            400 West Pueblo Street',
 u'\n                                            Santa Barbara, CA 93105',
 u'\n                                            805.682.7111\n                                        ',
 u'\n                                    ',
 u'\n                                ',
 u'\n                            ',
 u'\n                        ',
 u'\n                    ',
 u'\n                    ',
 u'\n                ',
 u'                \n            ',
 u'\n                ',
 u'\n                    ',
 u'\n                        ',
 u'\r\n\n                        ',
 u'\r\n\n                    ',
 u'\n\t\t    ',
 u'\n                        \tif(document.getElementById(\'dnn_topLeftContentPane\').innerHTML==""&&document.getElementById(\'dnn_topRightContentPane\').innerHTML=="")\n                        \t{\n                         \t   document.getElementById(\'tr_topLeftRight\').style.display="none";\n                       \t\t}\n                    ',
 u'\n                    ',
 u'\n                        ',
 u'\n                        ',
 u'\n                        ',
 u'\r\n\r\n',
 u'\r\n.isReferencedModuleBorder\r\n{\r\n    border: solid orange 1px;\r\n    border-bottom: solid #ff0000 0px;\r\n}\r\n.isReferencedModuleText\r\n{\r\n    color:orange;\r\n    border: solid orange 1px;\r\n    border-top: solid orange 0px;\r\n    text-align:center;\r\n}\r\n\r\n\r\n',
 u'\r\n\r\n',
 u'\r\n    \r\n    \r\n    ',
 u'\r\n\t',
 u'\r\n\t',
 u'\xa0',
 u'\r\n',
 u'\r\n',
 u'\r\n    ',
 u'\r\n        ',
 u'\r\n            ',
 u'\r\n            ',
 u'Postal Address ',
 u'\r\n            ',
 u'Cottage Health System',
 u'\r\n            ',
 u'Santa Barbara Cottage Hospital',
 u'\r\n            ',
 u'PO Box 689',
 u'\r\n            ',
 u'400 W. Pueblo Street',
 u'\r\n            ',
 u'Santa Barbara, CA 93102-0689',
 u'\r\n            ',
 u'\r\n            ',
 u'\xa0',
 u'\r\n            ',
 u'\r\n            ',
 u'\r\n                ',
 u'\r\n                    ',
 u'\r\n                        ',
 u'\u2022',
 u'\r\n                        ',
 u'Main Number',
 u':\xa0 (805) 682-7111',
 u'\r\n                    ',
 u'\r\n                    ',
 u'\r\n                        ',
 u'\u2022',
 u'\r\n                        ',
 u'General Information / Media Inquiries',
 u':\xa0 (805) 879-8980',
 u'\r\n                    ',
 u'\r\n                    ',
 u'\r\n                        ',
 u'\u2022',
 u'\r\n                        ',
 u'Community Concerns Line',
 u':\xa0 (805) 569-8917',
 u'\r\n                    ',
 u'\r\n                    ',
 u'\r\n                        ',
 u'\u2022',
 u'\r\n                        ',
 u'Public Affairs',
 u'\r\n                    ',
 u'\r\n                    ',
 u'\r\n                        ',
 u'\u2022',
 u'\r\n                        ',
 u'Patient Grievances, Complaints, Compliments: chsquality@sbch.org',
 u'\r\n                    ',
 u'\r\n                ',
 u'\r\n            ',
 u'\r\n            ',
 u'\r\n        ',
 u'\r\n    ',
 u'\r\n',
 u'\r\n\r\n',
 u'\xa0',
 u'\r\n',
 u'\r\n',
 u'\xa0',
 u'\r\n',
 u'Key Phone Numbers',
 u' (all 805 area code)',
 u'\r\n',
 u'\r\n',
 u'\r\n    ',
 u'\r\n        ',
 u'\r\n            ',
 u'Administration',
 u'\r\n            ',
 u'569-7290',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Anesthesia',
 u'\r\n            ',
 u'569-7206',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Birth Center',
 u'\r\n            ',
 u'569-7232',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Cancer Data Center',
 u'\r\n            ',
 u'569-8280',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Cardiac Care Unit',
 u'\r\n            ',
 u'569-7222',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Cardiac Electrophysiology',
 u'\r\n            ',
 u'569-8234',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Cardiac Rehabilitation',
 u'\r\n            ',
 u'569-7201',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Cardiology',
 u'\r\n            ',
 u'569-8284',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Careers / Recruitment',
 u'\r\n            ',
 u'879-8799',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Chemical Dependency',
 u'\r\n            ',
 u'569-7253',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u"Children's Hospital",
 u'\r\n            ',
 u'569-7273',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Community Concerns Line',
 u'\r\n            ',
 u'569-8917',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Community Wellness',
 u'\r\n            ',
 u'569-8240',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Construction Project Management Hotline',
 u'\r\n            ',
 u'569-8915',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Diabetes Services',
 u'\r\n            ',
 u'569-8240',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Development',
 u'\r\n            ',
 u'879-8980',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Education',
 u'\r\n            ',
 u'569-7325',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Emergency Department',
 u'\r\n            ',
 u'569-7210',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Emergency Psychiatric Services',
 u'\r\n            ',
 u'569-8339',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Endoscopy Services',
 u'\r\n            ',
 u'569-7285',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Enterostomal Therapy',
 u'\r\n            ',
 u'569-7385',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Environmental Services',
 u'\r\n            ',
 u'569-7205',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Eye & Laser Center',
 u'\r\n            ',
 u'569-8260',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Facilities Management',
 u'\r\n            ',
 u'569-7318',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Finance',
 u'\r\n            ',
 u'879-8935',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Foundation',
 u'\r\n            ',
 u'879-8980',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Gift Shop',
 u'\r\n            ',
 u'682-7454',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Infection Control',
 u'\r\n            ',
 u'569-7204',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Information Systems',
 u'\r\n            ',
 u'569-7270',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Interpreter (Translation) Services',
 u'\r\n            ',
 u'569-8200',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Laboratory',
 u'\r\n            ',
 u'569-7360',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Lifeline',
 u'\r\n            ',
 u'569-7572',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Library (Reeves Medical Library)',
 u'\r\n            ',
 u'569-7240',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Materials Management',
 u'\r\n            ',
 u'569-7209',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Medical Education',
 u'\r\n            ',
 u'569-7315',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Medical Records',
 u'\r\n            ',
 u'569-7306',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Medical Social Services',
 u'\r\n            ',
 u'569-8273',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Neonatology',
 u'\r\n            ',
 u'569-7522',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Nutrition',
 u'\r\n            ',
 u'569-7216',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Occupational Therapy',
 u'\r\n            ',
 u'569-7201',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Outpatient Surgery',
 u'\r\n            ',
 u'569-7287',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Pathology',
 u'\r\n            ',
 u'569-7367',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Patient Business Services',
 u'\r\n            ',
 u'879-8900',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Patient Grievances',
 u'\r\n            ',
 u'879-8769',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Pediatric Hematology/Oncology',
 u'\r\n            ',
 u'569-8394',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Pediatric Intensive Care Unit',
 u'\r\n            ',
 u'569-8297',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Pediatric Services',
 u'\r\n            ',
 u'569-7273',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Perinatal Center',
 u'\r\n            ',
 u'569-8209',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Pharmacy',
 u'\r\n            ',
 u'569-8381',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Psychiatric Services',
 u'\r\n            ',
 u'569-7253',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Public Affairs',
 u'\r\n            ',
 u'879-8980',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Quality Improvement',
 u'\r\n            ',
 u'569-7244',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Radiology',
 u'\r\n            ',
 u'569-7279',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Research',
 u'\r\n            ',
 u'569-8976',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Respiratory Care',
 u'\r\n            ',
 u'569-7207',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Reeves Medical Library',
 u'\r\n            ',
 u'569-7240',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Speech Therapy',
 u'\r\n            ',
 u'569-7201',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Spiritual Care',
 u'\r\n            ',
 u'569-8386',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Surgical Education',
 u'\r\n            ',
 u'569-7316',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Surgical Services',
 u'\r\n            ',
 u'569-7255',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Therapy Services',
 u'\r\n            ',
 u'569-7201',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Utilization Management',
 u'\r\n            ',
 u'569-7292',
 u'\r\n        ',
 u'\r\n        ',
 u'\r\n            ',
 u'Volunteer Services',
 u'\r\n            ',
 u'569-7357',
 u'\r\n        ',
 u'\r\n    ',
 u'\r\n',
 u'\r\n\r\n',
 u'\xa0110413',
 u'\r\n',
 u'\r\n\r\n\r\n\r\n',
 u'\r\n',
 u'\r\n    \r\n    \r\n',
 u'\r\n\n                        ',
 u'\n                        ',
 u'\n                    ',
 u'\n\t\t\t',
 u'\n\t\t\t\tif(document.getElementById(\'dnn_ContentPane\').innerHTML=="")\n\t\t\t\t{\n\t\t\t\t\tdocument.getElementById(\'tr_contentPane\').style.display="none";\n\t\t\t\t}\n\t\t\t',
 u'\n                    ',
 u'\n                        ',
 u'\n                        ',
 u'\n                        ',
 u'\r\n\n                        ',
 u'\n                        ',
 u'\n                    ',
 u'\n\t\t\t',
 u'\n\t\t\t\tif(document.getElementById(\'dnn_bottomLineContentPane\').innerHTML=="")\n\t\t\t\t{\n\t\t\t\t\tdocument.getElementById(\'tr_bottomLine\').style.display="none";\n\t\t\t\t}\n\t\t\t',
 u'\n                    ',
 u'\n                        ',
 u'\n                        ',
 u'\n                        ',
 u'\r\n\n                        ',
 u'\n                        ',
 u'\n                    ',
 u'\n\t\t\t',
 u'\n\t\t\t\tif(document.getElementById(\'dnn_bottomOneColumnContentPane\').innerHTML=="")\n\t\t\t\t{\n\t\t\t\t\tdocument.getElementById(\'tr_bottomOneCol\').style.display="none";\n\t\t\t\t}\n\t\t\t',
 u'\n                    ',
 u'\n                        ',
 u'\n                        ',
 u'\n                        ',
 u'\r\n\n                        ',
 u'\r\n\n                        ',
 u'\n                        ',
 u'\n                    ',
 u'\n\t\t\t',
 u'\n                        \tif(document.getElementById(\'dnn_bottomRightNarrowContentPane\').innerHTML==""&&document.getElementById(\'dnn_bottomLeftWideContentPane\').innerHTML=="")\n                        \t{\n                         \t   document.getElementById(\'topBottomThing\').style.display="none";\n                       \t\t}\n                    \t',
 u'\n                    ',
 u'\n                        ',
 u'\n                            ',
 u'\n                                ',
 u'\n                                    ',
 u'\n                                    ',
 u'\n                                    ',
 u'\r\n\n                                    ',
 u'\r\n\n                                    ',
 u'\n                                    ',
 u'\n                                ',
 u'\n                            ',
 u'\n                        ',
 u'\n                    ',
 u'\n\t\t\t\t\t',
 u'\n                        \t\t\tif(document.getElementById(\'dnn_bottomRightWideContentPane\').innerHTML==""&&document.getElementById(\'dnn_bottomLeftNarrowContentPane\').innerHTML=="")\n                        \t\t\t{\n                         \t   \t\t\tdocument.getElementById(\'bottomBottomThing\').style.display="none";\n                       \t\t\t\t}\n                    \t\t\t',
 u'\n\t\t\t',
 u'\n\t\t    ',
 u'\n                        \tif(document.getElementById(\'dnn_bottomRightNarrowContentPane\').innerHTML!=""||document.getElementById(\'dnn_bottomLeftNarrowContentPane\').innerHTML!="")\n                        \t{\n\t\t\t\t   document.getElementById(\'tr_pushup\').style.display="none";\n                       \t\t}\n                    ',
 u'\n                ',
 u'\n            ']

Please help.

OK, in the shell I also tried this:

response.selector.xpath('//table//td//td')

This gives better-formatted results, but there is still a lot of noise:

 <Selector xpath='//table//td//td' data=u'<td>Public Affairs</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>879-8980</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Quality Improvement</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7244</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Radiology</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7279</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Research</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-8976</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Respiratory Care</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7207</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Reeves Medical Library</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7240</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Speech Therapy</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7201</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Spiritual Care</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-8386</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Surgical Education</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7316</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Surgical Services</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7255</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Therapy Services</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7201</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Utilization Management</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7292</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>Volunteer Services</td>'>,
 <Selector xpath='//table//td//td' data=u'<td>569-7357</td>'>,
 <Selector xpath='//table//td//td' data=u'<td style="height:0px;" class="rightColS'>,
 <Selector xpath='//table//td//td' data=u'<td style="height:0px;" class="leftColSp'>,
 <Selector xpath='//table//td//td' data=u'<td id="dnn_bottomLineContentPane" valig'>,
 <Selector xpath='//table//td//td' data=u'<td style="height:0px;" class="rightColS'>,
 <Selector xpath='//table//td//td' data=u'<td style="height:0px;" class="leftColSp'>,
 <Selector xpath='//table//td//td' data=u'<td id="dnn_bottomOneColumnContentPane" '>,
 <Selector xpath='//table//td//td' data=u'<td style="height:0px;" class="rightColS'>,
 <Selector xpath='//table//td//td' data=u'<td class="leftColSpacer">\n             '>,
 <Selector xpath='//table//td//td' data=u'<td id="dnn_bottomLeftWideContentPane" c'>,
 <Selector xpath='//table//td//td' data=u'<td id="dnn_bottomRightNarrowContentPane'>,
 <Selector xpath='//table//td//td' data=u'<td class="rightColSpacer">\n            '>,
 <Selector xpath='//table//td//td' data=u'<td colspan="6">\n                       '>,
 <Selector xpath='//table//td//td' data=u'<td class="leftColSpacer" valign="top">\n'>,
 <Selector xpath='//table//td//td' data=u'<td id="dnn_bottomLeftNarrowContentPane"'>,
 <Selector xpath='//table//td//td' data=u'<td id="dnn_bottomRightWideContentPane" '>,
 <Selector xpath='//table//td//td' data=u'<td class="rightColSpacer" v

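1 answer:

Answer 0 (score: 1)

You have to modify your XPath so that you first create a selector object for every tr in that table, then loop over that list of selectors and access each td relative to its tr. The following code will help you do this.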

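In [1]: trs = response.xpath('//*[@id="dnn_ctr724_HtmlModule_HtmlModule_lblContent"]/table[2]/tbody/tr')

In [2]: for tr in trs:
   ...:     # Query each cell relative to the current row (note the leading "./").
   ...:     name = tr.xpath('./td[1]//text()').extract()
   ...:     value = tr.xpath('./td[2]//text()').extract()
   ...:     # Use the first text node of each cell, or fall back to 'n/a'.
   ...:     name = name[0].strip() if name else 'n/a'
   ...:     value = value[0].strip() if value else 'n/a'
   ...:     print('%s  >>  %s ' % (name, value))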

The output will be:

Administration  >>  569-7290 
Anesthesia  >>  569-7206 
Birth Center  >>  569-7232 
Cancer Data Center  >>  569-8280 
Cardiac Care Unit  >>  569-7222 
Cardiac Electrophysiology  >>  569-8234 
Cardiac Rehabilitation  >>  569-7201 
Cardiology  >>  569-8284 
Careers / Recruitment  >>  879-8799 
Chemical Dependency  >>  569-7253 
Children's Hospital  >>  569-7273 
Community Concerns Line  >>  569-8917 
Community Wellness  >>  569-8240 
Construction Project Management Hotline  >>  569-8915 
Diabetes Services  >>  569-8240 
Development  >>  879-8980 
Education  >>  569-7325 
Emergency Department  >>  569-7210 
Emergency Psychiatric Services  >>  569-8339 
Endoscopy Services  >>  569-7285 
Enterostomal Therapy  >>  569-7385 
Environmental Services  >>  569-7205 
Eye & Laser Center  >>  569-8260 
Facilities Management  >>  569-7318 
Finance  >>  879-8935 
Foundation  >>  879-8980 
Gift Shop  >>  682-7454 
Infection Control  >>  569-7204 
Information Systems  >>  569-7270 
Interpreter (Translation) Services  >>  569-8200 
Laboratory  >>  569-7360 
Lifeline  >>  569-7572 
Library (Reeves Medical Library)  >>  569-7240 
Materials Management  >>  569-7209 
Medical Education  >>  569-7315 
Medical Records  >>  569-7306 
Medical Social Services  >>  569-8273 
Neonatology  >>  569-7522 
Nutrition  >>  569-7216 
Occupational Therapy  >>  569-7201 
Outpatient Surgery  >>  569-7287 
Pathology  >>  569-7367 
Patient Business Services  >>  879-8900 
Patient Grievances  >>  879-8769 
Pediatric Hematology/Oncology  >>  569-8394 
Pediatric Intensive Care Unit  >>  569-8297 
Pediatric Services  >>  569-7273 
Perinatal Center  >>  569-8209 
Pharmacy  >>  569-8381 
Psychiatric Services  >>  569-7253 
Public Affairs  >>  879-8980 
Quality Improvement  >>  569-7244 
Radiology  >>  569-7279 
Research  >>  569-8976 
Respiratory Care  >>  569-7207 
Reeves Medical Library  >>  569-7240 
Speech Therapy  >>  569-7201 
Spiritual Care  >>  569-8386 
Surgical Education  >>  569-7316 
Surgical Services  >>  569-7255 
Therapy Services  >>  569-7201 
Utilization Management  >>  569-7292 
Volunteer Services  >>  569-7357
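
The whitespace-only strings in the question's output are the text nodes that sit between tags, so they can also be cleaned up after extraction. Below is a minimal sketch in plain Python of one way to do that. The helper names `clean_cell` and `pair_rows` are hypothetical, and the sample list is copied from the output shown in the question. `pair_rows` assumes the cleaned text strictly alternates name, number, which would only hold if the XPath is already scoped to the phone table.

```python
def clean_cell(fragments, default='n/a'):
    """Join the text fragments of one cell and collapse runs of whitespace."""
    # split() with no arguments splits on any run of whitespace, so joining
    # the pieces back with single spaces also trims both ends.
    text = ' '.join(' '.join(fragments).split())
    return text if text else default


def pair_rows(flat_text_nodes):
    """Turn a flat list of text nodes into (name, number) pairs.

    Assumption: after dropping whitespace-only entries, the list strictly
    alternates name, number, name, number.
    """
    cleaned = [clean_cell([node], default='') for node in flat_text_nodes]
    cleaned = [text for text in cleaned if text]
    return list(zip(cleaned[0::2], cleaned[1::2]))


# Sample values copied from the output shown in the question.
sample = [
    '\r\n            ', 'Administration', '\r\n            ', '569-7290',
    '\r\n        ', '\r\n        ',
    '\r\n            ', 'Anesthesia', '\r\n            ', '569-7206',
]

for name, number in pair_rows(sample):
    print('%s  >>  %s' % (name, number))
```

Inside the loop above, `clean_cell(tr.xpath('./td[1]//text()').extract())` could stand in for `name[0].strip() if name else 'n/a'`. The difference is that it joins every text fragment in the cell instead of using only the first one, which may matter if a cell contains nested tags.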