Question

I have the following data:

Example:

DRIVER_ID;TIMESTAMP;POSITION

156;2014-02-01 00:00:00.739166+01;POINT(41.8836718276551 12.4877775603346)

I want to create a pandas dataframe with 4 columns: the id, time, longitude and latitude. So far, I have:

cur_cab = pd.DataFrame.from_csv(
            path,
            sep=";",
            header=None,
            parse_dates=[1]).reset_index()
cur_cab.columns = ['cab_id', 'datetime', 'point']

path specifies the .txt file containing the data. I already wrote a function that returns the longitude and latitude values from the point-formatted string. How do I expand the data frame with the additional columns and the split values?

Answer 1

After loading, if you're using a recent version of pandas then you can use the vectorised str methods to parse the column:

In [87]:
df[['pos_x', 'pos_y']] = df['point'].str[6:-1].str.split(expand=True)
df

Out[87]:
   cab_id                   datetime  \
0     156 2014-01-31 23:00:00.739166   

                                      point             pos_x             pos_y  
0  POINT(41.8836718276551 12.4877775603346)  41.8836718276551  12.4877775603346

Also, you should stop using from_csv, as it's no longer updated. Use the top-level read_csv instead, so your loading code would be:

cur_cab = pd.read_csv(
            path,
            sep=";",
            header=None,
            parse_dates=[1],
            names=['cab_id', 'datetime', 'point'],
            skiprows=1)
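Putting the two pieces together, here is a minimal sketch of the whole flow. It assumes the file looks exactly like the sample in the question; the in-memory `raw` string stands in for the real file at `path`, and the regular expression is just one way to pull the numbers out of the POINT text. Date parsing is left out to keep the focus on the coordinates.

```python
import io

import pandas as pd

# Stand-in for the real file: same layout as the sample in the question.
raw = (
    "DRIVER_ID;TIMESTAMP;POSITION\n"
    "156;2014-02-01 00:00:00.739166+01;POINT(41.8836718276551 12.4877775603346)\n"
)

# Load with read_csv, skipping the header line and supplying our own names.
cur_cab = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,
    names=["cab_id", "datetime", "point"],
    skiprows=1,
)

# Extract the two numbers from "POINT(x y)" and convert them to floats,
# so the new columns are numeric rather than strings.
coords = cur_cab["point"].str.extract(r"POINT\(([-\d.]+) ([-\d.]+)\)").astype(float)
cur_cab[["pos_x", "pos_y"]] = coords

print(cur_cab[["cab_id", "pos_x", "pos_y"]])
```

Note that `str.split` on its own leaves the values as strings; the `astype(float)` step is what makes them usable for arithmetic or plotting.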
