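How to extract the first 8 characters from a pandas string column

Time: 2018-07-31 07:09:13

Tags: python-3.x pandas

I have a column in a DataFrame and I am trying to extract the first 8 digits from each string. How can I do that?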

Input

Shipment ID
20180504-S-20000
20180514-S-20537
20180514-S-20541
20180514-S-20644
20180514-S-20644
20180516-S-20009
20180516-S-20009
20180516-S-20009
20180516-S-20009

Expected output

Order_Date
20180504
20180514
20180514
20180514
20180514
20180516
20180516
20180516
20180516

I tried the code below, but it did not work.

data['Order_Date'] = data['Shipment ID'][:8]

3 answers:

Answer 0: (score: 0)

You are close. You need to index through the str accessor, which applies the slice to each value of the Series:

data['Order_Date'] = data['Shipment ID'].str[:8]

If the column has no NaN values, a list comprehension usually performs better:

data['Order_Date'] = [x[:8] for x in data['Shipment ID']]

print (data)
        Shipment ID Order_Date
0  20180504-S-20000   20180504
1  20180514-S-20537   20180514
2  20180514-S-20541   20180514
3  20180514-S-20644   20180514
4  20180514-S-20644   20180514
5  20180516-S-20009   20180516
6  20180516-S-20009   20180516
7  20180516-S-20009   20180516
8  20180516-S-20009   20180516
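
The NaN caveat above can be illustrated with a small sketch. The sample Series here is made up for illustration (it is not the asker's data), and the isinstance guard is just one possible way to make the list comprehension tolerate missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value (assumption, not the asker's data)
s = pd.Series(['20180504-S-20000', np.nan, '20180516-S-20009'], name='Shipment ID')

# .str[:8] slices each string and passes missing values through unchanged
sliced = s.str[:8]

# The plain list comprehension fails because a missing value is not a string
raised = False
try:
    [x[:8] for x in s]
except TypeError:
    raised = True

# A guarded comprehension slices only real strings and keeps the rest as-is
guarded = [x[:8] if isinstance(x, str) else x for x in s]

print(sliced)
print(guarded)
```

So the list comprehension is only a safe shortcut when the column is known to contain strings everywhere.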

If you omit str, the code slices the column by position and returns the first N rows instead:

print (data['Shipment ID'][:2])
0    20180504-S-20000
1    20180514-S-20537
Name: Shipment ID, dtype: object
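
This is also why the assignment in the question appeared to do nothing useful. A minimal sketch with a shortened, hypothetical three-row frame: the row slice keeps the full strings, and assigning it back aligns on the index, so rows outside the slice become NaN.

```python
import pandas as pd

# Shortened, hypothetical version of the question's data
data = pd.DataFrame({'Shipment ID': [
    '20180504-S-20000', '20180514-S-20537', '20180514-S-20541',
]})

# Without .str, [:2] selects the first two rows, not the first two characters
rows = data['Shipment ID'][:2]

# Assignment aligns on the index: rows 0 and 1 get the full strings, row 2 gets NaN
data['Order_Date'] = data['Shipment ID'][:2]

print(data)
```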

Answer 1: (score: 0)

You can also use str.extract.

For example:

import pandas as pd

df = pd.DataFrame({'Shipment ID': ['20180504-S-20000', '20180514-S-20537', '20180514-S-20541', '20180514-S-20644', '20180514-S-20644', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009', '20180516-S-20009']})
df["Order_Date"] = df["Shipment ID"].str.extract(r"(\d{8})")
print(df)

Output:

       Shipment ID Order_Date
0  20180504-S-20000   20180504
1  20180514-S-20537   20180514
2  20180514-S-20541   20180514
3  20180514-S-20644   20180514
4  20180514-S-20644   20180514
5  20180516-S-20009   20180516
6  20180516-S-20009   20180516
7  20180516-S-20009   20180516
8  20180516-S-20009   20180516
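
One thing to be aware of: the pattern above matches the first run of 8 digits anywhere in the string, not necessarily at the start. A possible variant, sketched here with made-up rows, anchors the pattern with ^ and passes expand=False so that a Series comes back:

```python
import pandas as pd

# Hypothetical rows: one well-formed ID, one with the date at the end, one missing
df = pd.DataFrame({'Shipment ID': ['20180504-S-20000', 'S-20537-20180514', None]})

# Unanchored: picks up the 8-digit run wherever it appears
unanchored = df['Shipment ID'].str.extract(r'(\d{8})', expand=False)

# Anchored with ^: only matches when the string starts with 8 digits
df['Order_Date'] = df['Shipment ID'].str.extract(r'^(\d{8})', expand=False)

print(unanchored)
print(df)
```

Rows that do not match, and missing values, come out as NaN rather than raising an error.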

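Answer 2: (score: 0)

You can also delete everything from -S to the end of the string. (Another option is to capture the first 8 digits, match everything after them, and replace the whole string with a backreference to the captured group.) The code below takes the first approach: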

df["Order_Date"]=df['Shipment ID'].replace(regex=r"\-.*",value="")
df
        Shipment ID Order_Date
0  20180504-S-20000   20180504
1  20180514-S-20537   20180514
2  20180514-S-20541   20180514
3  20180514-S-20644   20180514
4  20180514-S-20644   20180514
5  20180516-S-20009   20180516
6  20180516-S-20009   20180516
7  20180516-S-20009   20180516
8  20180516-S-20009   20180516
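
The backreference approach mentioned in this answer might look like the sketch below. It uses str.replace with regex=True rather than the Series.replace call shown above, which is a substitution made here for illustration, and the two-row frame is made up:

```python
import pandas as pd

# Hypothetical two-row sample
df = pd.DataFrame({'Shipment ID': ['20180504-S-20000', '20180514-S-20537']})

# Capture the first 8 digits, match the rest of the string,
# and replace the whole match with group 1
df['Order_Date'] = df['Shipment ID'].str.replace(r'^(\d{8}).*$', r'\1', regex=True)

print(df)
```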