I have several CSV files, and each one has a different format. Here are samples from two of them. Please look at the format rather than the values.
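As you can see, csv_2 sometimes contains doubled double quotes (""), while csv_1 is a plain format. I need to load all of these CSVs, and they are very large. I tried using csv.Sniffer to detect the dialect automatically, but that is not enough: I do not get a sensible result for the file that contains the "". Can anyone guide me on how to solve this?
I am using Python 2.7. Sample rows from the two files: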
csv_2 "xxxx-0147-xxxx-194443,""Jan 1, 2017"",7:43:43 AM PST,,Google fee,,Smart Plan (Calling & Texting),com.yuilop,1,unlimited_usca_tariff_and,mimir,US,TX,76501,USD,-3.00,0.950210,EUR,-2.85"
csv_2 "1305-xxxx-0118-54476..1,""Jan 1, 2017"",7:17:31 AM PST,,Google fee,,Smart Plan (Calling & Texting),com.yuilop,1,unlimited_usca_tariff_and,htc_a13wlpp,US,TX,79079,USD,-3.00,0.950210,EUR,-2.85"
csv_1 GPA.xxxx-2612-xxxx-44448..0,2017-02-01,1485950845,Charged,m1,Freedom Plan (alling & Texting),com.yuilop,subscription,basic_usca_tariff_and,USD,2.99,0.00,2.99,,,07605,US
csv_1 GPA.xxxx-6099-9725-56125,2017-02-01,1485952917,Charged,athene_f,Buy 100 credits (Calling & Texting),com.yuilop,inapp,100_credits,INR,138.41,0.00,138.41,Kolkata,West Bengal,700007,IN
Code:
import csv
with open(file, 'rU') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(2024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    for line in reader:
        print line
Result (sniffed dialect parameters):
dialect.escapechar None
dialect.quotechar "
dialect.quoting 0
dialect.escapechar None
dialect.delimiter ,
dialect.doublequote False
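With csv_2 the output is a mess: the date is split at the comma inside the date field, and each whole row is treated as a single string. How can I change the code so that csv_2 gives the same kind of result as csv_1?
Answer 0: (score: 0)
Why not preprocess the CSV to clean up the " characters and normalize it, and then load the data like the other CSVs?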
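One way to read "normalize" here, sketched under assumptions: a csv_2 row looks like a complete CSV record that has been quoted a second time as a single field (outer quotes around the line, inner quotes doubled). If that holds for the real files, parsing such a line twice recovers the fields and keeps "Jan 1, 2017" together. The helper name parse_line and the shortened sample rows below are made up for illustration.

```python
import csv

def parse_line(line):
    """Parse one CSV line, unwrapping a whole-line quoted record if present."""
    row = next(csv.reader([line]))
    # Assumption: a csv_2-style line decodes to ONE field that is itself
    # a comma-separated record, so it is parsed a second time.
    if len(row) == 1 and ',' in row[0]:
        row = next(csv.reader([row[0]]))
    return row

# Shortened, made-up rows in the two formats from the question
wrapped = '"id-1,""Jan 1, 2017"",7:43:43 AM PST,,Google fee,USD,-3.00"'
plain = 'GPA.id-2,2017-02-01,1485950845,Charged,USD,2.99'

print(parse_line(wrapped))
print(parse_line(plain))
```

Because it works one line at a time, this can be applied while iterating over a large file instead of loading it whole. It would misfire on a genuine one-column file whose only field contains a comma, so it is worth checking against a few real rows first.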
Answer 1: (score: 0)
You are only one step away from working code. All you have to do is replace the " characters in csvfile, and then your current approach will work.
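A minimal illustration of what stripping the quotes does to one csv_2-style row (the row is a shortened, made-up sample): the row now splits into fields, but the comma inside the date also becomes a field separator.

```python
import csv

# Shortened, made-up row in the csv_2 format
line = '"id-1,""Jan 1, 2017"",7:43:43 AM PST,,Google fee"'

# Drop every double quote, then parse with the default dialect
stripped = line.replace('"', '')
row = next(csv.reader([stripped]))

print(row)
# The date ends up in two fields: 'Jan 1' and ' 2017'
```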
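Edit: However, if you want to merge the date strings that get split apart when the CSV file is read, your best option is regular-expression matching. I have added some code for this to my original answer. I copied most of the regex code (with edits) from this older answer.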
import re
import csv

with open(file, 'rU') as csvfile:
    data = csvfile.read()
# Remove the pesky double-quotes
no_quotes_data = data.replace('"', '')
# Sniff the dialect from a sample of the cleaned text
dialect = csv.Sniffer().sniff(no_quotes_data[:2024])
# Materialize the rows as lists so that fields can be merged in place
csv_data = list(csv.reader(no_quotes_data.splitlines(), dialect))

pattern = r'(%s) +(%s)'
thirties = pattern % (
    "Sep|Apr|Jun|Nov",
    r'[1-9]|[12]\d|30')
thirtyones = pattern % (
    "Jan|Mar|May|Jul|Aug|Oct|Dec",
    r'[1-9]|[12]\d|3[01]')
feb = r'(Feb) +(?:%s)' % (
    r'(?:([1-9]|1\d|2[0-9]))')  # 1-29 any year (including potential leap years)
result = '|'.join('(?:%s)' % x for x in (thirties, thirtyones, feb))
# Case-insensitive; the trailing $ requires the whole field to be "Mon D"
r = re.compile(r'(?:%s)$' % result, re.IGNORECASE)

for row in csv_data:
    for ind, phrase in enumerate(row):
        if ind + 1 < len(row) and r.match(phrase.strip()):
            # If you've found a date string, a year string will follow
            row[ind] = ", ".join(s.strip() for s in row[ind:ind + 2])
            del row[ind + 1]
            break  # at most one split date per row

for line in csv_data: print line