How do I read a complex txt file in chunks and save it as a CSV file in Python?

Time: 2017-09-26 23:33:47

Tags: python csv scrape

I have a text file organized like the sample below, and I want to put its data into the columns of a CSV file.

How can I do this in Python? The problem is that some of the lists have missing data, so I know some rows in the CSV file will end up misaligned, but I don't mind doing some manual work to adjust the database afterwards. Another issue is that the country names differ, so I need to use the `++++++++++++++` line to tell where each country starts.

The file looks like this:

++++++++++++++
Country 1

**this sentence is not important.
**date 25.09.2017, also not important
*******
Address
**Office

        Address A, 100 City. Country X
**work time 09h00-16h00<br>9h00-14h00
**www.example.com
**emal@example.com;
**012/345 67 89
**téléfax 123/456 67 89
*******
Address
**Home Office

        Address A, 200 City. Country X
**email2@example.com;
**001/000 00 00
**téléfax 111/111 11 11
*******
Address
**Living address

        Address 0, 123 City
**info@example.ch
**000/000 00 00
**téléfax 222/222 22 22
++++++++++++++
Country 2

**this sentence is not important.
**date 25.09.2017, also not important
*******
Address
**Office

        AAA 11, 30 City 

        BBB 22, 30 City
**work time 08h00-12h30  
**www.example.com
**info@example.com
**000/000 00 00
**téléfax 111/11 11 11
*******

ETC

I tried to parse it, but it did not work.

Error: `Traceback (most recent call last):   File "test.py", line 22, in <module>     csv.write(ITEMS) AttributeError: 'module' object has no attribute 'write'`

2 answers:

Answer 0: (score: 1)

In the last line you called `write` on `csv`, which is the module, instead of on your file object `CSV`. That is what causes the error.
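
A minimal sketch of the difference, assuming the file handle is named `CSV` as described above. The `io.StringIO` buffer is only a stand-in for a real file so that the snippet runs without touching disk:

```python
import csv
import io

# The csv module itself has no write() function, which is why
# csv.write(...) raises AttributeError.
module_has_write = hasattr(csv, "write")

# A file-like object does have write(); StringIO stands in for open(...).
CSV = io.StringIO()
CSV.write("Country 1\n")
written = CSV.getvalue()
```

Names in Python are case-sensitive, so `csv` and `CSV` are two unrelated objects here.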

I have added the steps for using Python's csv module to your code.

Now all you have to do is refine your parsing method.

Code:

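import csv

# Read the whole text file into memory.
with open('listofdata.txt', 'r') as FILE:
    DATA = FILE.read()

# One chunk per country.
LIST = DATA.split('++++++++++++++')

LIST4 = []  # one list of fields per record

for ITEMS in LIST:
    # One chunk per address record within the country.
    LIST2 = ITEMS.split('*******')
    for items2 in LIST2:
        # One field per '**' marker.
        LIST3 = items2.split('**')
        LIST4.append(LIST3)

# On Python 3, pass newline='' to open() here so the csv module
# controls the line endings.
with open('file.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for ITEMS in LIST4:
        spamwriter.writerow(ITEMS)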

Output:

""

"
Country 1

","this sentence is not important.
","date 25.09.2017, also not important
"

"
Address
","Office

        Address A, 100 City. Country X
","work time 09h00-16h00<br>9h00-14h00
","www.example.com
","emal@example.com;
","012/345 67 89
","téléfax 123/456 67 89
"

"
Address
","Home Office

        Address A, 200 City. Country X
","email2@example.com;
","001/000 00 00
","téléfax 111/111 11 11
"

"
Address
","Living address

        Address 0, 123 City
","info@example.ch
","000/000 00 00
","téléfax 222/222 22 22
"

"
Country 2

","this sentence is not important.
","date 25.09.2017, also not important
"

"
Address
","Office

        AAA 11, 30 City

        BBB 22, 30 City
","work time 08h00-12h30
","www.example.com
","info@example.com
","000/000 00 00
","téléfax 111/11 11 11
"

"
"

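The rows in this output still carry the newlines and indentation of the original file. One possible refinement, sketched under the assumption that whitespace inside a field is not meaningful, is to collapse it before writing. `clean_fields` is a hypothetical helper name, not part of the csv module:

```python
def clean_fields(fields):
    """Collapse runs of whitespace in each field and drop empty fields."""
    cleaned = []
    for field in fields:
        text = " ".join(field.split())
        if text:
            cleaned.append(text)
    return cleaned


# One record as produced by splitting on '**'.
sample = [
    "\nAddress\n",
    "Office\n\n        Address A, 100 City. Country X\n",
    "www.example.com\n",
]
row = clean_fields(sample)
```

In the writing loop, `spamwriter.writerow(clean_fields(ITEMS))` would then put each record on a single line of the CSV file.
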
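Answer 1: (score: 0)

When you save to the CSV file, use csv.writer. But first you have to write a parser for the structure of your input file. Only then can you save the data to a CSV file.

Alternatively, you can use csv.DictWriter, but either way you have to write the parser first.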
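
A rough sketch of what such a parser could look like with csv.DictWriter. The column names (`country`, `kind`, `address`, `details`) and the helpers `parse` and `to_csv` are assumptions made for illustration, since the desired columns are not stated. It treats the first `**` field of each record as a label followed by the address, and joins everything else into one column:

```python
import csv
import io

COUNTRY_SEP = "++++++++++++++"
RECORD_SEP = "*******"
FIELDNAMES = ["country", "kind", "address", "details"]  # illustrative names


def parse(text):
    """Yield one dict per address record found in the text."""
    for block in text.split(COUNTRY_SEP):
        parts = block.split(RECORD_SEP)
        header = parts[0].strip()
        if not header:
            continue  # nothing before the first country marker
        country = header.splitlines()[0].strip()
        for part in parts[1:]:
            raw = [field.strip() for field in part.split("**")]
            if len(raw) < 2 or not raw[1]:
                continue  # not a complete record
            # raw[0] is the "Address" heading; raw[1] holds label + address.
            lines = [ln.strip() for ln in raw[1].splitlines() if ln.strip()]
            yield {
                "country": country,
                "kind": lines[0],
                "address": "; ".join(lines[1:]),
                "details": " | ".join(f for f in raw[2:] if f),
            }


def to_csv(text):
    """Return the parsed records as CSV text, header row included."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for record in parse(text):
        writer.writerow(record)
    return out.getvalue()


sample = """++++++++++++++
Country 1

**this sentence is not important.
*******
Address
**Office

        Address A, 100 City. Country X
**www.example.com
**012/345 67 89
*******
"""
records = list(parse(sample))
csv_text = to_csv(sample)
```

Records with missing fields simply produce a shorter `details` value, so the rows stay aligned even when some data is absent. Splitting `details` into separate e-mail, phone and fax columns would need extra rules that depend on how regular the real file is.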