I have data in the following format, for example:
DRIVER_ID;TIMESTAMP;POSITION
156;2014-02-01 00:00:00.739166+01;POINT(41.8836718276551 12.4877775603346)
I want to create a pandas DataFrame with four columns: id, time, longitude, and latitude. So far, I have:
cur_cab = pd.DataFrame.from_csv(
    path,
    sep=";",
    header=None,
    parse_dates=[1]).reset_index()
cur_cab.columns = ['cab_id', 'datetime', 'point']
Here, path specifies the .txt file containing the data.
I already wrote a function that returns the longitude and latitude values from the point-formatted string.
How do I expand the data frame with additional columns holding the split values?
Answer 0 (score: 2)
After loading, if you're using a recent version of pandas, you can use the vectorised str methods to parse the column:
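In [87]:
# Assign both new columns at once; unpacking a DataFrame into two names
# would only yield its column labels (0 and 1), not the split values.
df[['pos_x', 'pos_y']] = df['point'].str[6:-1].str.split(expand=True)
df
Out[87]:
   cab_id                   datetime  \
0     156 2014-01-31 23:00:00.739166
                                      point             pos_x             pos_y
0  POINT(41.8836718276551 12.4877775603346)  41.8836718276551  12.4877775603346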
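The split values above are still strings. If numeric columns are needed, one alternative is a regular expression with str.extract followed by a float conversion. This is only a minimal sketch, assuming every value has the exact form POINT(x y); the one-row frame is built by hand here purely for illustration.

```python
import pandas as pd

# Illustrative frame with the same 'point' format as the sample row.
df = pd.DataFrame({
    "cab_id": [156],
    "point": ["POINT(41.8836718276551 12.4877775603346)"],
})

# Capture the two numbers inside POINT(...); each group becomes one column.
coords = df["point"].str.extract(r"POINT\(([-+\d.eE]+)\s+([-+\d.eE]+)\)", expand=True)

# Assign both columns at once and convert the captured strings to floats.
df[["pos_x", "pos_y"]] = coords.astype(float)
```

Rows that do not match the pattern would come out as NaN rather than raising, which may or may not be what you want.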
Also, you should stop using from_csv: it's no longer updated. Use the top-level read_csv instead, so your loading code would be:
cur_cab = pd.read_csv(
    path,
    sep=";",
    header=None,
    parse_dates=[1],
    names=['cab_id', 'datetime', 'point'],
    skiprows=1)
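Putting the loading and parsing steps together, a rough end-to-end sketch might look like the following. It assumes the sample row from the question and reads it from an in-memory string (io.StringIO) instead of the .txt file, so it can be run as-is; with real data you would pass path instead.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of a .txt file.
raw = (
    "DRIVER_ID;TIMESTAMP;POSITION\n"
    "156;2014-02-01 00:00:00.739166+01;POINT(41.8836718276551 12.4877775603346)\n"
)

cur_cab = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,
    names=["cab_id", "datetime", "point"],
    skiprows=1,  # skip the header line already present in the file
    parse_dates=[1],
)

# Strip "POINT(" and ")" and split on whitespace into two float columns.
cur_cab[["pos_x", "pos_y"]] = (
    cur_cab["point"].str[6:-1].str.split(expand=True).astype(float)
)
```

How the timestamp column is displayed (and whether the +01 offset is kept or converted) depends on the pandas version, so check its dtype after loading.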