Python: removing a substring from a column

Time: 2015-08-06 15:24:29

Tags: python csv numpy pandas

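I have a CSV that I am reading into a DataFrame. I then use a Series to modify a specific column of the CSV. This column contains a date and a time, and I basically want to strip the time from it. The column looks like this: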

0      7/28/2015 14:31
1       7/28/2015 8:13
2      7/28/2015 16:16
3      7/28/2015 16:18
4       7/27/2015 9:54
5       7/27/2015 9:52

I split the column:

s = df['Work Info Date'].str.split(' ')
0      [7/28/2015, 14:31]
1       [7/28/2015, 8:13]
2      [7/28/2015, 16:16]
3      [7/28/2015, 16:18]
4       [7/27/2015, 9:54]
5       [7/27/2015, 9:52]

When I try to use del to delete the time element, it just deletes the row at that index:

del s[1]
0      [7/28/2015, 14:31]
2      [7/28/2015, 16:16]
3      [7/28/2015, 16:18]
4       [7/27/2015, 9:54]
5       [7/27/2015, 9:52]

My end goal is to remove the time from this column and join it back into the spreadsheet:

0      7/28/2015
1       7/28/2015
2      7/28/2015
3      7/28/2015
4       7/27/2015
5       7/27/2015 

The spreadsheet:

Incident ID,Submitter,Time Spent,Work Info Date
INC000004294045,Bob,,7/28/2015 14:31
INC000004301664,Janice,,7/28/2015 8:13
INC000004301813,Robert,,7/28/2015 16:16
INC000004301813,Alex,,7/28/2015 16:18

Code:

import pandas as pd
import numpy as np




df = pd.read_csv('output2.csv', encoding = 'utf-8')
s = df['Work Info Date'].str.split(' ')

s.name = 'Work Info Date'
del s[1]
s



#del df['Work Info Date']
#df.join(s)
#time_report = pd.pivot_table(df, index=["Submitter", "Work Info Date"], values=["Time Spent"], aggfunc = [np.sum], fill_value=0)

2 Answers:

Answer 0 (score: 1)

You can use .str again to get vectorized access and select the first element of each list:

>>> df["Work Info Date"].str.split()
0    [7/28/2015, 14:31]
1     [7/28/2015, 8:13]
2    [7/28/2015, 16:16]
3    [7/28/2015, 16:18]
dtype: object
>>> df["Work Info Date"].str.split().str[0]
0    7/28/2015
1    7/28/2015
2    7/28/2015
3    7/28/2015
dtype: object
>>> df["Just_the_Date"] = df["Work Info Date"].str.split().str[0]
>>> df
       Incident ID Submitter  Time Spent   Work Info Date Just_the_Date
0  INC000004294045       Bob         NaN  7/28/2015 14:31     7/28/2015
1  INC000004301664    Janice         NaN   7/28/2015 8:13     7/28/2015
2  INC000004301813    Robert         NaN  7/28/2015 16:16     7/28/2015
3  INC000004301813      Alex         NaN  7/28/2015 16:18     7/28/2015

You might want to convert the dates into a real date column rather than just strings, but that's up to you.
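
As a rough sketch of that conversion, assuming the column always follows the month/day/year hour:minute layout shown in the question, pd.to_datetime can parse it and the .dt accessor can drop the time. The inline CSV text and the column names Date_midnight and Date_only are made up for illustration:

```python
import io

import pandas as pd

# Inline copy of the sample rows from the question (stands in for the real file).
csv_text = """Incident ID,Submitter,Time Spent,Work Info Date
INC000004294045,Bob,,7/28/2015 14:31
INC000004301664,Janice,,7/28/2015 8:13
INC000004301813,Robert,,7/28/2015 16:16
INC000004301813,Alex,,7/28/2015 16:18
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings into real timestamps; the format string is an assumption
# based on the sample data.
parsed = pd.to_datetime(df["Work Info Date"], format="%m/%d/%Y %H:%M")

# Hypothetical column names: one keeps the datetime64 dtype with the time
# reset to midnight, the other holds plain datetime.date objects.
df["Date_midnight"] = parsed.dt.normalize()
df["Date_only"] = parsed.dt.date

print(df[["Submitter", "Date_midnight", "Date_only"]])
```

Date_midnight stays a datetime64 column, so it still supports .dt and date arithmetic; Date_only is an object column of datetime.date values, which is closer to "just the date" but loses the vectorized datetime methods.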

Answer 1 (score: 0)

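You can use Series.apply together with datetime.strptime() and datetime.strftime() to first parse each value into a datetime object and then convert it to a string in the desired format. Code -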

df['Work Info Date'] = df['Work Info Date'].apply(lambda x: datetime.datetime.strptime(x,'%m/%d/%Y %H:%M').strftime('%m/%d/%Y'))

The advantage of this is that you can convert the date into whatever format you want.

Example/demo -

In [3]: df = pd.read_csv('a.csv', encoding = 'utf-8')

In [4]: df
Out[4]:
       Incident ID Submitter  Time Spent   Work Info Date
0  INC000004294045       Bob         NaN  7/28/2015 14:31
1  INC000004301664    Janice         NaN   7/28/2015 8:13
2  INC000004301813    Robert         NaN  7/28/2015 16:16
3  INC000004301813      Alex         NaN  7/28/2015 16:18

In [6]: import datetime

In [7]: df['Work Info Date'] = df['Work Info Date'].apply(lambda x: datetime.datetime.strptime(x,'%m/%d/%Y %H:%M').strftime('%m/%d/%Y'))

In [8]: df
Out[8]:
       Incident ID Submitter  Time Spent Work Info Date
0  INC000004294045       Bob         NaN     07/28/2015
1  INC000004301664    Janice         NaN     07/28/2015
2  INC000004301813    Robert         NaN     07/28/2015
3  INC000004301813      Alex         NaN     07/28/2015
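
To illustrate the point about choosing any output format, here is a small sketch in which only the strftime pattern changes. The helper name reformat and the two-row sample Series are hypothetical, and the input pattern is assumed to match the data shown in the question:

```python
import datetime

import pandas as pd


def reformat(value, out_format, in_format="%m/%d/%Y %H:%M"):
    """Parse a 'month/day/year hour:minute' string and re-emit it in out_format."""
    return datetime.datetime.strptime(value, in_format).strftime(out_format)


# Two sample values taken from the question's column.
dates = pd.Series(["7/28/2015 14:31", "7/28/2015 8:13"], name="Work Info Date")

# Same parsing step, two different output layouts.
us_style = dates.apply(lambda x: reformat(x, "%m/%d/%Y"))
iso_style = dates.apply(lambda x: reformat(x, "%Y-%m-%d"))

print(us_style.tolist())
print(iso_style.tolist())
```

Note that strftime zero-pads the month, which is why the demo above shows 07/28/2015 rather than 7/28/2015.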