How do I convert an unstructured string into a data frame?

Time: 2019-10-14 15:44:59

Tags: python string pandas dataframe data-manipulation

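I have a long text string that I want to convert into a data frame for analysis. See the sample data below. I want the columns to be "Facility", "Street", "City", "Phone", and "Hours".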

string = AlaskaUSCG Base Ketchikan 1300 Stedman Street  Ketchikan, AK  (907) 228-0250 Mon-Fri 7:30am-5pm | Sat 10am-4pm | Closed Sunday USCG Base Kodiak Albatros Avenue, Building 26 (2nd Floor)  Kodiak, AK  (907) 487-5773 USCG Base Kodiak Albatros Avenue, Building 26 (1st Floor)  Kodiak, AK  (907) 487-5773 Mon-Fri: 7am-9pm | Sat: 9am-9pm |

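I already tried converting it with StringIO, but that produced a data frame with 0 rows and 1000 columns. Instead, I want the columns listed above and one row per store.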

I would like it to look like this, with the data filled in as rows:

Facility                    Street               City           Phone   
Alaska USCG Base Ketchikan  1300 Stedman Street  Ketchikan, AK  (907) 228 0250

1 answer:

Answer 0 (score: 1)

You can use a simple web-scraping approach on the source page instead, for example with bs4 and requests:

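import bs4
import requests

# URL is the address of the page the text was copied from; it is not given in the question.
r = requests.get(URL)
b = bs4.BeautifulSoup(r.text, 'html.parser')

addresses = []

# Collect the text of every <p> element, skipping paragraphs that start with 'HOURS'.
for val in b.find_all(name='p'):
    s = list(val.stripped_strings)
    if s and not s[0].startswith('HOURS'):
        # Drop the last text fragment and join the rest into one string.
        addresses.append(' '.join(s[:-1]))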
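
If the page is not available and only the flattened string is at hand, a regular expression can split it into the requested columns. The following is a minimal sketch, not a general parser. It assumes that every facility name has the form "USCG Base" plus one word, that street and city are separated by two or more spaces, and that phone numbers look like "(907) 228-0250". All three hold for the sample in the question but may not hold for the full data. The names text, record and rows are illustrative.

```python
import re

import pandas as pd

# Sample string copied from the question.
text = (
    "AlaskaUSCG Base Ketchikan 1300 Stedman Street  Ketchikan, AK  "
    "(907) 228-0250 Mon-Fri 7:30am-5pm | Sat 10am-4pm | Closed Sunday "
    "USCG Base Kodiak Albatros Avenue, Building 26 (2nd Floor)  Kodiak, AK  "
    "(907) 487-5773 "
    "USCG Base Kodiak Albatros Avenue, Building 26 (1st Floor)  Kodiak, AK  "
    "(907) 487-5773 Mon-Fri: 7am-9pm | Sat: 9am-9pm |"
)

# Assumed record layout (inferred from the sample only):
# facility, street, two or more spaces, "City, ST", phone, optional hours.
record = re.compile(
    r"(?P<Facility>USCG Base \w+) "
    r"(?P<Street>.+?)\s{2,}"
    r"(?P<City>[^,]+, [A-Z]{2})\s+"
    r"(?P<Phone>\(\d{3}\) \d{3}-\d{4})\s*"
    r"(?P<Hours>.*?)\s*(?=USCG Base |$)"  # hours run until the next facility or the end
)

# One dict per matched record, then one data frame row per dict.
rows = [m.groupdict() for m in record.finditer(text)]
df = pd.DataFrame(rows, columns=["Facility", "Street", "City", "Phone", "Hours"])

print(df)
```

Records without opening hours get an empty string in the Hours column, and the leading "Alaska" state header is ignored by this pattern.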