I have a DataFrame containing the query part of multiple URLs.
For example:
in=2015-09-19&stars_4=yes&min=4&a=3&city=New+York,+NY,+United+States&out=2015-09-20&search=1
in=2015-09-14&stars_3=yes&min=4&a=3&city=London,+United+Kingdom&out=2015-09-15&search=1
in=2015-09-26&Filter=175&min=5&a=2&city=New+York,+NY,+United+States&out=2015-09-27&search=2
My desired dataframe should be:
in          Filter  stars    min  a  max  city    country  out         search
-------------------------------------------------------------------------------
2015-09-19  NaN     stars_4  4    3  NaN  NY      US       2015-09-20  1
2015-09-14  NaN     stars_3  4    3  NaN  LONDON  UK       2015-09-15  1
2015-09-26  175     NaN      5    2  NaN  NY      US       2015-09-27  2
Is there an easy way to do this using regex?
Any help will be much appreciated! Thanks in advance!
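Since the question asks specifically about regex, here is a minimal sketch of that approach, assuming each line is a plain query string of `key=value` pairs separated by `&`. The pattern name `PAIR_RE` is hypothetical. Note that this leaves `+` undecoded in values such as `city`.

```python
import re

import pandas as pd

# Sample query strings taken from the question
lines = [
    "in=2015-09-19&stars_4=yes&min=4&a=3&city=New+York,+NY,+United+States&out=2015-09-20&search=1",
    "in=2015-09-14&stars_3=yes&min=4&a=3&city=London,+United+Kingdom&out=2015-09-15&search=1",
    "in=2015-09-26&Filter=175&min=5&a=2&city=New+York,+NY,+United+States&out=2015-09-27&search=2",
]

# Each match is one key=value pair: the key stops at '=' or '&', the value stops at '&'
PAIR_RE = re.compile(r"([^&=]+)=([^&]*)")

# findall returns (key, value) tuples, which dict() turns into one record per line
records = [dict(PAIR_RE.findall(line)) for line in lines]

# Keys missing from a given line become NaN in the resulting DataFrame
df = pd.DataFrame.from_records(records)
print(df)
```

Each distinct key becomes its own column, so `stars_4` and `stars_3` end up as separate columns rather than the single `stars` column shown in the desired output.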
Answer 0 (score: 1)
A quick and dirty solution is to use a list comprehension:
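import pandas as pd
with open('data_file.txt') as f:
    # Split each non-empty line into key=value pairs; strip() drops the trailing newline
    json_data = [dict(pair.split('=', 1) for pair in line.strip().split('&'))
                 for line in f if line.strip()]
df = pd.DataFrame.from_records(json_data)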
This won't solve your location categorization problem, but it will give you a better DataFrame to work with.
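As a rough sketch of one way to get closer to the desired table, the version below swaps in the standard library's `urllib.parse.parse_qsl`, which also decodes `+` into spaces. It assumes that the last comma-separated part of `city` is always the country and that `stars_N=yes` flags should collapse into one `stars` column; `parse_line` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl

import pandas as pd

# Sample query strings taken from the question
lines = [
    "in=2015-09-19&stars_4=yes&min=4&a=3&city=New+York,+NY,+United+States&out=2015-09-20&search=1",
    "in=2015-09-14&stars_3=yes&min=4&a=3&city=London,+United+Kingdom&out=2015-09-15&search=1",
    "in=2015-09-26&Filter=175&min=5&a=2&city=New+York,+NY,+United+States&out=2015-09-27&search=2",
]


def parse_line(line):
    # parse_qsl splits on '&' and '=' and decodes '+' to a space
    record = dict(parse_qsl(line.strip()))
    # Assumption: fold flags such as "stars_4=yes" into a single "stars" column
    for key in [k for k in record if k.startswith("stars_")]:
        record.pop(key)
        record["stars"] = key
    return record


df = pd.DataFrame.from_records([parse_line(line) for line in lines])

# Assumption: the first comma-separated part is the city, the last is the country
parts = df["city"].str.split(",")
df["country"] = parts.str[-1].str.strip()
df["city"] = parts.str[0].str.strip()
print(df)
```

This yields full names such as `United States` and drops the middle `NY` part; turning them into abbreviations like `US` or `UK` would need a lookup table of your own.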