For example, we have a CSV file with these contents:
name age address
john 25 koramangala banglore #@ sales maneger %$ india
harshuth rao 36 belandur banglore #@ maneger %$ india
vijay kumar 45 ulsoor banglore #@ sales maneger %$ india
suhas 25 koramangala banglore #@analist %$ india
mithun 22 venkatapura banglore #@ execitive %$ india
How can I split the address into separate columns, so that the result looks like this?
name age city country position
john 25 koramangala banglore india sales maneger
harshuth rao 36 belandur banglore india maneger
vijay kumar 45 ulsoor banglore india sales maneger
suhas 25 koramangala banglore india analist
mithun 22 venkatapura banglore india execitive
The code I am using is:
import re
import csv

with open("/home/vipul/Desktop/example.csv", newline='') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[0]  # first field of each row
        txt = re.findall(r'(\w+[\s\w]*)\b', text)
        print(txt)
This is what the file looks like in a text editor:
name ,age ,address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA- maneger +ACUAJA- india
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india
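The +ACMAQA- and +ACUAJA- sequences look like UTF-7 encodings of #@ and %$, which would explain why the listing at the top shows #@ and %$ while the text editor shows the escapes. A minimal sketch, assuming the file really was saved as UTF-7, that decodes one of the lines above with Python's built-in utf-7 codec:

```python
# One raw line exactly as the text editor shows it.
raw = "john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india"

# Python ships a "utf-7" codec; decoding turns each "+...-" run back
# into the characters it encodes ("+ACMAQA-" -> "#@", "+ACUAJA-" -> "%$").
decoded = raw.encode("ascii").decode("utf-7")
print(decoded)
```

For the whole file, open(path, encoding="utf-7") should yield the decoded text directly; passing encoding="utf-7" to pd.read_csv is an untested assumption.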
Answer 0 (score: 1)
First, load your data with pd.read_csv:
import pandas as pd
df = pd.read_csv("/home/vipul/Desktop/example.csv", sep=',')
print(df)
name age address
0 john 25 koramangala banglore +ACMAQA- sales maneger +A...
1 harshuth rao 36 belandur banglore +ACMAQA- maneger +ACUAJA- i...
2 vijay kumar 45 ulsoor banglore +ACMAQA- sales maneger +ACUAJA...
3 suhas 25 koramangala banglore +ACMAQA-analist +ACUAJA- ...
4 mithun 22 venkatapura banglore +ACMAQA- execitive +ACUAJ...
Next, split the address column with str.split and join the pieces back onto the original data with pd.concat:
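# Remove the address column and split it on each "+...-" marker
# (plus any surrounding spaces) into three new columns.
v = df.pop('address').str.split(r'\s*\+.*?-\s*', expand=True)
v.columns = ['city', 'position', 'country']
df = pd.concat([df, v], axis=1)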
print(df)
name age city position country
0 john 25 koramangala banglore sales maneger india
1 harshuth rao 36 belandur banglore maneger india
2 vijay kumar 45 ulsoor banglore sales maneger india
3 suhas 25 koramangala banglore analist india
4 mithun 22 venkatapura banglore execitive india
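To see what the pattern passed to str.split does, here is the same regex applied with Python's re.split to one address value (pandas treats a multi-character pat as a regular expression, so the pieces should be the same):

```python
import re

# One "address" value from the raw file, with the escape sequences still in place.
address = "koramangala banglore +ACMAQA- sales maneger +ACUAJA- india"

# \s*     optional whitespace before the marker
# \+.*?-  a literal "+", then as few characters as possible, up to the first "-"
# \s*     optional whitespace after the marker
parts = re.split(r"\s*\+.*?-\s*", address)
print(parts)  # ['koramangala banglore', 'sales maneger', 'india']
```

Because .*? stops at the first hyphen after any "+", a field that itself contained a "+" would be split incorrectly; matching the two literal markers, as the next answer does, is stricter.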
Finally, save the result as a CSV file:
df.to_csv("/home/vipul/Desktop/new.csv")
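If pandas is not available, the same steps can be done with the standard csv and re modules. A minimal sketch, assuming the file is comma-separated as the text editor shows and that every address contains exactly the two markers; the function name split_address_rows is hypothetical, in-memory strings stand in for the real input and output files, and the columns are written in the order the question asks for:

```python
import csv
import io
import re

# Same separator idea as above: a "+...-" marker with optional surrounding spaces.
MARKER = re.compile(r"\s*\+.*?-\s*")

def split_address_rows(src, dst):
    """Read name/age/address rows from src; write name/age/city/country/position to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    next(reader)  # skip the original header "name ,age ,address"
    writer.writerow(["name", "age", "city", "country", "position"])
    for name, age, address in reader:
        # Assumes exactly two markers, i.e. exactly three pieces.
        city, position, country = MARKER.split(address)
        writer.writerow([name.strip(), age.strip(), city, country, position])

raw = """name ,age ,address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
"""
out = io.StringIO()
split_address_rows(io.StringIO(raw), out)
print(out.getvalue())
```

For real files, open both with newline="" as the csv module documentation recommends.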
Answer 1 (score: 1)
Pass a regular expression as the sep argument of read_csv:
import io
import pandas as pd

t = """name ,age , address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA- maneger +ACUAJA- india
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india"""

# Split on either marker or on a comma, swallowing surrounding spaces;
# a regex separator requires the python engine.
df = pd.read_csv(io.StringIO(t),
                 sep=r'\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*', engine='python')
# Turn the index columns back into regular columns, then rename all five.
df = df.reset_index()
df.columns = ["name", "age", "city", "position", "country"]
print(df)
name age city position country
0 john 25 koramangala banglore sales maneger india
1 harshuth rao 36 belandur banglore maneger india
2 vijay kumar 45 ulsoor banglore sales maneger india
3 suhas 25 koramangala banglore analist india
4 mithun 22 venkatapura banglore execitive india
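The reset_index() call works because the header line splits into 3 fields while each data line splits into 5; when data rows have more fields than the header, read_csv uses the extra leading columns as the index. A stdlib check of the field counts with the same separator:

```python
import re

# The separator from the answer: either marker, or a comma, each with optional spaces.
SEP = re.compile(r"\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*")

header = "name ,age , address"
row = "harshuth rao ,36,belandur banglore +ACMAQA- maneger +ACUAJA- india"

print(SEP.split(header))  # ['name', 'age', 'address']
print(SEP.split(row))     # ['harshuth rao', '36', 'belandur banglore', 'maneger', 'india']
```

The two surplus fields per row (name and age) are what end up in the index, so moving them back with reset_index() and renaming all five columns restores the intended layout.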