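I have a CSV file that I am trying to load into a SQL table with 2 columns. The fields are separated by commas, with each comma marking the start of the next field. The second column contains text, and some of that text contains commas. Because of the extra commas I cannot load the data into the SQL table, since it looks as though there are extra columns. I have millions of rows. How can I remove these extra commas?
Data: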
Number Address
"12345" , "123 abc street, Unit 345"
"67893" , "567 xyz lane"
"65432" , "789 unit, mno street"
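I want to remove the extra commas that appear in the address on some rows.
Answer 0 (score: 0)
If all of the data has the same format as Number Address "000" , "000 abc street, Unit 000", you can split the string into a list, remove the comma, and join the list back into a string. For example, using the data you provided: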
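ori_addr = "Number Address \"12345\" , \"123 abc street, Unit 345\""
# Split on whitespace; token 6 is 'street,' (the piece carrying the stray comma).
addr = ori_addr.split()
addr[6] = addr[6].replace(",", "")
# Join the tokens back into a single string.
together_addr = " ".join(addr)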
together_addr is now equal to 'Number Address "12345" , "123 abc street Unit 345"'. Note that there is no longer a comma between "street" and "Unit".
Answer 1 (score: -1)
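The hard-coded index 6 only works when the stray comma happens to fall in the seventh whitespace-separated token. A slightly more general sketch, assuming every line has exactly two fields and the first field never contains a comma, keeps the first comma as the separator and strips every comma after it. The function name `strip_address_commas` is hypothetical and used only for illustration.

```python
def strip_address_commas(line):
    """Keep the first comma as the field separator; drop all later commas."""
    # partition() splits on the first comma only.
    number, sep, address = line.partition(",")
    return number + sep + address.replace(",", "")


rows = [
    '"12345" , "123 abc street, Unit 345"',
    '"67893" , "567 xyz lane"',
    '"65432" , "789 unit, mno street"',
]
cleaned = [strip_address_commas(row) for row in rows]
```

Rows without an extra comma pass through unchanged, so the same function can be applied to every line of the file.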
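The code below does the following:
- The MySQL engine (connection) is created.
- The address data is read from the CSV file.
- Commas that are not field separators are removed, along with extra whitespace.
- The cleaned data is loaded into a DataFrame.
- The DataFrame is used to store the data in MySQL.

import csv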
import pandas as pd
from sqlalchemy import create_engine
# Set database credentials.
creds = {'usr': 'admin',
         'pwd': '1tsaSecr3t',
         'hst': '127.0.0.1',
         'prt': 3306,
         'dbn': 'playground'}
# MySQL connection string.
connstr = 'mysql+mysqlconnector://{usr}:{pwd}@{hst}:{prt}/{dbn}'
# Create sqlalchemy engine for MySQL connection.
engine = create_engine(connstr.format(**creds))
# Read addresses from the CSV file (the with block closes the file afterwards).
with open('comma_test.csv', newline='') as f:
    text = list(csv.reader(f, skipinitialspace=True))
# Replace all commas which are not used as field separators.
# Remove additional whitespace.
for idx, row in enumerate(text):
    text[idx] = [i.strip().replace(',', '') for i in row]
# Store data into a DataFrame.
df = pd.DataFrame(data=text, columns=['number', 'address'])
# Write DataFrame to MySQL using the engine (connection) created above.
df.to_sql(name='commatest', con=engine, if_exists='append', index=False)
Contents of the CSV file (comma_test.csv):
"12345" , "123 abc street, Unit 345"
"10101" , "111 abc street, Unit 111"
"20202" , "222 abc street, Unit 222"
"30303" , "333 abc street, Unit 333"
"40404" , "444 abc street, Unit 444"
"50505" , "abc DR, UNIT# 123 UNIT 123"
text after reading the CSV file:
['12345 ', '123 abc street, Unit 345']
['10101 ', '111 abc street, Unit 111']
['20202 ', '222 abc street, Unit 222']
['30303 ', '333 abc street, Unit 333']
['40404 ', '444 abc street, Unit 444']
['50505 ', 'abc DR, UNIT# 123 UNIT 123']
text after removing the commas and extra whitespace:
['12345', '123 abc street Unit 345']
['10101', '111 abc street Unit 111']
['20202', '222 abc street Unit 222']
['30303', '333 abc street Unit 333']
['40404', '444 abc street Unit 444']
['50505', 'abc DR UNIT# 123 UNIT 123']
Resulting rows (columns number and address):
number address
12345 123 abc street Unit 345
10101 111 abc street Unit 111
20202 222 abc street Unit 222
30303 333 abc street Unit 333
40404 444 abc street Unit 444
50505 abc DR UNIT# 123 UNIT 123
This is a long-winded approach. However, each step was deliberately broken out to show clearly what is involved.
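Since the question mentions millions of rows, holding the whole file in a list may be more memory than necessary. A minimal sketch of a streaming variant follows, using only the standard-library csv module to write a cleaned CSV row by row. The function name `clean_rows` and the in-memory `io.StringIO` stand-ins for the real files are assumptions made for illustration, and quoting every output field is a design choice rather than a requirement.

```python
import csv
import io


def clean_rows(src, dst):
    """Stream rows from src to dst, dropping commas inside field values."""
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        # Same cleanup as above: trim whitespace, remove embedded commas.
        writer.writerow([field.strip().replace(",", "") for field in row])


# In-memory stand-ins for the real input and output files.
src = io.StringIO(
    '"12345" , "123 abc street, Unit 345"\n'
    '"67893" , "567 xyz lane"\n'
)
dst = io.StringIO()
clean_rows(src, dst)
result = dst.getvalue()
```

With real files, both would be opened with `newline=''` as the csv module documentation recommends, and the cleaned file could then be loaded with whatever bulk-load tool the database provides.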