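I have a long text string that I want to turn into a DataFrame for analysis. A sample of the data is below. I want the columns to be "Facility", "Street", "City", "Phone", and "Hours".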
```python
string = "AlaskaUSCG Base Ketchikan 1300 Stedman Street Ketchikan, AK (907) 228-0250 Mon-Fri 7:30am-5pm | Sat 10am-4pm | Closed Sunday USCG Base Kodiak Albatros Avenue, Building 26 (2nd Floor) Kodiak, AK (907) 487-5773 USCG Base Kodiak Albatros Avenue, Building 26 (1st Floor) Kodiak, AK (907) 487-5773 Mon-Fri: 7am-9pm | Sat: 9am-9pm |"
```
I already converted it to a DataFrame using StringIO, but the result has 0 rows and 1000 columns. I want the columns listed above instead, with one row per store.
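A likely explanation for the 0-row result, assuming the string was passed to `pd.read_csv` through `StringIO` (the question does not show the exact call): the text contains no newlines, so `read_csv` treats the single line as the header row. Every comma-separated piece becomes a column name and no data rows remain. A minimal reproduction with one record from the sample:

```python
from io import StringIO

import pandas as pd

# One line, no newline: read_csv uses it as the header and finds no data rows
one_line = "USCG Base Ketchikan 1300 Stedman Street Ketchikan, AK (907) 228-0250"
df = pd.read_csv(StringIO(one_line))
print(df.shape)  # (0, 2): the single comma yields two column names, zero rows
```

Splitting on commas cannot recover the records anyway: commas occur inside fields ("Ketchikan, AK", "Albatros Avenue, Building 26") and the records are not separated by line breaks.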
I want it to look like this, with the data filled in as rows:
| Facility | Street | City | Phone |
|---|---|---|---|
| Alaska USCG Base Ketchikan | 1300 Stedman Street | Ketchikan, AK | (907) 228-0250 |
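If the string is already in hand (no scraping), one option is a regular expression anchored on the most regular part of each record, the phone number. This is a heuristic fitted only to the sample above, not a general address parser. It assumes each record starts with "USCG Base" plus a one-word name, the city is one word followed by a two-letter state code, and phone numbers look like "(ddd) ddd-dddd". The "Alaska" prefix glued to the first record is treated as a state heading and dropped. The names `SAMPLE`, `RECORD`, `COLUMNS`, and `parse_facilities` are illustrative.

```python
import re

import pandas as pd

# Sample string copied from the question
SAMPLE = (
    "AlaskaUSCG Base Ketchikan 1300 Stedman Street Ketchikan, AK (907) 228-0250 "
    "Mon-Fri 7:30am-5pm | Sat 10am-4pm | Closed Sunday "
    "USCG Base Kodiak Albatros Avenue, Building 26 (2nd Floor) Kodiak, AK (907) 487-5773 "
    "USCG Base Kodiak Albatros Avenue, Building 26 (1st Floor) Kodiak, AK (907) 487-5773 "
    "Mon-Fri: 7am-9pm | Sat: 9am-9pm |"
)

# Assumptions (from the sample only): "USCG Base <one word>" starts a record,
# the city is one word followed by ", XX", the phone is "(ddd) ddd-dddd".
# Hours are optional and run until the next "USCG Base" or the end of the text.
RECORD = re.compile(
    r"(?P<Facility>USCG Base \w+)\s+"
    r"(?P<Street>.+?)\s+"                 # shortest text before "City, ST"
    r"(?P<City>\w+, [A-Z]{2})\s+"
    r"(?P<Phone>\(\d{3}\) \d{3}-\d{4})"
    r"\s*(?P<Hours>.*?)\s*(?=USCG Base|$)"
)

COLUMNS = ["Facility", "Street", "City", "Phone", "Hours"]


def parse_facilities(text: str) -> pd.DataFrame:
    """Split one long string into one DataFrame row per facility."""
    rows = []
    for m in RECORD.finditer(text):
        row = m.groupdict()
        row["Hours"] = row["Hours"].rstrip(" |")  # drop a dangling " |" separator
        rows.append(row)
    return pd.DataFrame(rows, columns=COLUMNS)
```

On the sample, `parse_facilities(SAMPLE)` yields three rows; the 2nd-floor Kodiak entry has no hours in the text, so its `Hours` cell is an empty string. Multi-word city names or other facility prefixes would need the pattern adjusted.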
Answer (score: 1)
You can use simple web-scraping tools such as `bs4` (BeautifulSoup) and `requests`.
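```python
import bs4
import requests

URL = "..."  # address of the page to scrape (not given in the original answer)
r = requests.get(URL)
b = bs4.BeautifulSoup(r.text, "html.parser")  # explicit parser avoids a warning
addresses = []
for val in b.find_all(name='p'):
    s = list(val.stripped_strings)  # text fragments of this <p>, whitespace-stripped
    # skip empty and "HOURS..." paragraphs; join all fragments except the last
    if s and not s[0].startswith('HOURS'):
        addresses.append(' '.join(s[:-1]))
```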
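The loop above stops at a list of address strings. To get the requested columns, one option is pandas' `Series.str.extract`, which turns named regex groups into DataFrame columns. This sketch assumes each scraped string looks like a record from the question's sample (facility, street, "City, ST", phone); the `addresses` list here is a stand-in built from that sample, not output from the real page, and `PATTERN` is an illustrative name.

```python
import pandas as pd

# Stand-in for the scraped list; strings taken from the question's sample
addresses = [
    "USCG Base Ketchikan 1300 Stedman Street Ketchikan, AK (907) 228-0250",
    "USCG Base Kodiak Albatros Avenue, Building 26 (2nd Floor) Kodiak, AK (907) 487-5773",
]

# Each named group becomes a column of the DataFrame returned by str.extract
PATTERN = (
    r"^(?P<Facility>USCG Base \w+)\s+"
    r"(?P<Street>.+?)\s+"
    r"(?P<City>\w+, [A-Z]{2})\s+"
    r"(?P<Phone>\(\d{3}\) \d{3}-\d{4})$"
)

df = pd.Series(addresses).str.extract(PATTERN)
print(df)
```

Strings the pattern cannot parse come back as rows of `NaN`, so `df[df.isna().any(axis=1)]` lists the entries that need a different rule.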