Splitting a string by delimiters

Time: 2018-02-20 03:10:31

Tags: python pandas dataframe

For example, we have a CSV file containing:

name          age  address
john          25   koramangala banglore #@ sales maneger %$ india
harshuth rao  36   belandur banglore #@ maneger %$ india
vijay kumar   45   ulsoor banglore #@ sales maneger %$ india
suhas         25   koramangala banglore #@analist %$ india
mithun        22   venkatapura banglore #@ execitive %$ india

How can I split the address and put the parts into separate columns, like this?

name           age  city                  country     position 
john           25   koramangala banglore  india       sales maneger
harshuth rao   36   belandur banglore     india       maneger
vijay kumar    45   ulsoor banglore       india       sales maneger
suhas          25   koramangala banglore  india       analist
mithun         22   venkatapura banglore  india       execitive

The code I am using is:

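import csv
import re

# Python 3: open the file in text mode; newline='' is what the csv module expects
with open("/home/vipul/Desktop/example.csv", newline='') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[0]  # first field of each row (the name column)
        txt = re.findall(r'(\w+[\s\w]*)\b', text)
        print(txt)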

This is how the file looks in a text editor:

name ,age ,address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA-  maneger +ACUAJA- india 
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india
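
A note on the odd-looking tokens: `+ACMAQA-` and `+ACUAJA-` are how `#@` and `%$` are written in UTF-7, so the file appears to have been saved in that encoding. A minimal sketch, using one row copied from the listing above, that decodes it with Python's built-in `utf-7` codec:

```python
# One raw line from the file, as bytes (copied from the listing above)
raw_line = b"john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india"

# Decoding as UTF-7 turns the escape sequences back into the original delimiters
decoded_line = raw_line.decode("utf-7")

print(decoded_line)
# john,25,koramangala banglore #@ sales maneger %$ india
```

If the whole file is UTF-7, opening it with `open(path, encoding="utf-7")` should yield the `#@` and `%$` delimiters directly; whether that holds for the real file is an assumption.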

2 answers:

Answer 0 (score: 1)

First, load your data with pd.read_csv:

import pandas as pd

df = pd.read_csv("/home/vipul/Desktop/example.csv", sep=',')

print(df)
           name   age                                             address
0           john    25  koramangala banglore +ACMAQA- sales maneger +A...
1  harshuth rao     36  belandur banglore +ACMAQA-  maneger +ACUAJA- i...
2    vijay kumar    45  ulsoor banglore +ACMAQA- sales maneger +ACUAJA...
3          suhas    25  koramangala banglore +ACMAQA-analist +ACUAJA- ...
4         mithun    22  venkatapura banglore +ACMAQA- execitive +ACUAJ...

Next, split the address column with str.split, then use pd.concat to join the pieces back onto the original data:

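# Split on any "+...-" escape sequence, swallowing the whitespace around it
v = df.pop('address').str.split(r'\s*\+.*?-\s*', expand=True)
v.columns = ['city', 'position', 'country']

# axis must be passed by keyword in recent pandas versions
df = pd.concat([df, v], axis=1)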

print(df)
           name   age                   city       position country
0           john    25  koramangala banglore  sales maneger   india
1  harshuth rao     36     belandur banglore        maneger  india 
2    vijay kumar    45       ulsoor banglore  sales maneger   india
3          suhas    25  koramangala banglore        analist   india
4         mithun    22  venkatapura banglore      execitive   india
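
To see what the split pattern does, it can help to try it on a single string with the standard `re` module, which should behave much like `str.split` with a regex here. This is only an illustrative sketch; the name `DELIMITER` is introduced for the example and the address is one row copied from the data above.

```python
import re

# Optional whitespace, a literal "+", the shortest run of characters up to "-",
# then optional whitespace again
DELIMITER = r"\s*\+.*?-\s*"

address = "koramangala banglore +ACMAQA-analist +ACUAJA- india"
parts = re.split(DELIMITER, address)

print(parts)
# ['koramangala banglore', 'analist', 'india']
```

Because the pattern matches any `+...-` run, an address that itself contains a `+` followed later by a `-` would also be split there. Trailing whitespace at the end of a line is not removed either, which is why row 1 shows `india ` with a trailing space.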

Finally, save the result as CSV:

df.to_csv("/home/vipul/Desktop/new.csv")
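
One thing to be aware of: by default `to_csv` also writes the row index as an unnamed first column. A small sketch of the difference, using a hypothetical two-row frame rather than the real data:

```python
import pandas as pd

# Hypothetical stand-in for the frame built above
df = pd.DataFrame({"name": ["john", "suhas"], "age": [25, 25]})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()                 # default: index written as the first column
without_index = df.to_csv(index=False)   # index left out

print(with_index.splitlines()[0])     # ,name,age
print(without_index.splitlines()[0])  # name,age
```

Passing `index=False` is probably what you want if the saved file should contain only the five data columns.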

Answer 1 (score: 1)

Pass a regular expression as sep to read_csv:

import io
import pandas as pd

t = """name ,age , address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA-  maneger +ACUAJA- india 
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india"""

# A regex separator requires the python engine; the raw string avoids invalid-escape warnings
df = pd.read_csv(io.StringIO(t),
                 sep=r'\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*', engine='python')
df = df.reset_index()
df.columns = ["name", "age", "city", "position", "country"]


    name          age   city                   position        country
0   john           25   koramangala banglore   sales maneger   india
1   harshuth rao   36   belandur banglore      maneger         india
2   vijay kumar    45   ulsoor banglore        sales maneger   india
3   suhas          25   koramangala banglore   analist         india
4   mithun         22   venkatapura banglore   execitive       india
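
To illustrate why the columns need renaming afterwards, here is a rough sketch that applies the same separator pattern to the header and to one data row with the standard `re` module (the name `SEP` is introduced only for this example):

```python
import re

# Same alternation as the sep argument: either escape sequence, or a comma,
# each with optional surrounding whitespace
SEP = r"\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*"

header = "name ,age , address"
row = "vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india"

header_fields = re.split(SEP, header)
row_fields = re.split(SEP, row)

print(header_fields)  # ['name', 'age', 'address']
print(row_fields)     # ['vijay kumar', '45', 'ulsoor banglore', 'sales maneger', 'india']
```

The header yields three fields while each data row yields five. pandas appears to treat the two extra leading fields as an index, which is presumably why the answer calls `reset_index()` and then assigns all five column names by hand.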