Python csv writer appends an escape character after every word

Time: 2016-11-25 01:01:02

Tags: python csv

I am trying to write some csv data, but I keep getting the escape character after every word in the csv file.

Setup:

import csv
from itertools import izip_longest

with open('gibber.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_NONE, escapechar=" ")
    for values in izip_longest(*csv_data, fillvalue="-,-"):
        writer.writerow([unicode(s).encode("utf-8") for s in values])

If I print what is passed to writer.writerow(...) above, the line below is a sample.

['dipey,1', 'you have,2', 'at the beginning,1', 'brilliant charles brown truly,1', 'great the first also was,1', 'identical to this one as far,1', 'be when pie mood mark lake a,1', 'shardely uptown is you free on a stone,1', 'let it rest and sun it those it super,1']

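I have tried many things, and I have searched for just about everything I can find on why the csv writer puts an escape sequence after every word.

The output I want should look like this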

--------------------------------------------------------------------------
word1 | word_count1 |    word2    | word_count2 | .. wordN | word_countN
--------------------------------------------------------------------------
 word |      3      | word word   |     7       |  .............. N

But what I get looks like this

[] = escape character
--------------------------------------------------------------------------
 word1 | word_count1 |    word2    | word_count2 | .. wordN | word_countN
--------------------------------------------------------------------------
 word[]|      3      |word[] word[]|    7       |  .............. N

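Using a space as my escapechar gives me an extra space after every word. Using a tab or a newline breaks the row/column layout. Using any single letter, digit, or even \ puts that escapechar at the rightmost position of every row item, but the doubled spaces go away.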

The sample list I posted above is an example of the list I pass to writer.writerow(...).

Test data
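A possible explanation, offered as a sketch rather than something the original post confirms: with quoting=csv.QUOTE_NONE the writer cannot quote a field that contains the delimiter, so it prefixes each embedded delimiter with the escapechar instead. Every item passed to writer.writerow(...) already contains a comma ("word,count"), so the escapechar would land right after the word. The snippet below is written for Python 3 (the question uses Python 2); write_row is a hypothetical helper, and io.StringIO stands in for the real file.

```python
import csv
import io


def write_row(fields, escapechar):
    """Write one row with QUOTE_NONE and return the raw text produced."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", quoting=csv.QUOTE_NONE,
                        escapechar=escapechar)
    writer.writerow(fields)
    return buf.getvalue()


# Each field already contains the delimiter, as in the sample row.
backslash_out = write_row(["dipey,1", "at the beginning,1"], "\\")
space_out = write_row(["dipey,1"], " ")

print(repr(backslash_out))  # the backslash sits in front of each embedded comma
print(repr(space_out))      # the "extra space" sits in front of the comma
```

If this is the cause, a space escapechar that also occurs inside a phrase would itself be escaped, which would account for the doubled spaces.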

data0 = unicode("Rainforests are forests characterized by high rainfall, with annual rainfall between 250 and 450 centimetres (98 and 177 in).[1] There are two types of rainforest: tropical rainforest and temperate rainforest. The monsoon trough, alternatively known as the intertropical convergence zone, plays a significant role in creating the climatic conditions necessary for the Earth's tropical rainforests. Around 40% to 75% of all biotic species are indigenous to the rainforests.[2] It has been estimated that there may be many millions of species of plants, insects and microorganisms still undiscovered in tropical rainforests. Tropical rainforests have been called the \"jewels of the Earth\" and the \"world's largest pharmacy\", because over one quarter of natural medicines have been discovered there.[3] Rainforests are also responsible for 28% of the world's oxygen turnover, sometimes misnamed oxygen production,[4] processing it through photosynthesis from carbon dioxide and consuming it through respiration. The undergrowth in some areas of a rainforest can be restricted by poor penetration of sunlight to ground level. If the leaf canopy is destroyed or thinned, the ground beneath is soon colonized by a dense, tangled growth of vines, shrubs and small trees, called a jungle. The term jungle is also sometimes applied to tropical rainforests generally.", "utf-8")

data1 = unicode("Tropical rainforests are characterized by a warm and wet climate with no substantial dry season: typically found within 10 degrees north and south of the equator. Mean monthly temperatures exceed 18 °C (64 °F) during all months of the year.[5] Average annual rainfall is no less than 168 cm (66 in) and can exceed 1,000 cm (390 in) although it typically lies between 175 cm (69 in) and 200 cm (79 in).[6] Many of the world's tropical forests are associated with the location of the monsoon trough, also known as the intertropical convergence zone.[7] The broader category of tropical moist forests are located in the equatorial zone between the Tropic of Cancer and Tropic of Capricorn. Tropical rainforests exist in Southeast Asia (from Myanmar (Burma) to the Philippines, Malaysia, Indonesia, Papua New Guinea, Sri Lanka, Sub-Saharan Africa from Cameroon to the Congo (Congo Rainforest), South America (e.g. the Amazon Rainforest), Central America (e.g. Bosawás, southern Yucatán Peninsula-El Peten-Belize-Calakmul), Many Australia, and on many of the Pacific Islands (such as Hawaiʻi). Tropical forests have been called the \"Earth's lungs\", although it is now known that rainforests contribute little net oxygen addition to the atmosphere through photosynthesis", "utf-8")

data2 = unicode("Tropical forests cover many a large part of the globe, but temperate rainforests only occur in few regions around the world. Temperate rainforests are rainforests in temperate regions. They occur in North America (in the Pacific Northwest in Alaska, British Columbia, Washington, Oregon and California), in Europe (parts of the British Isles such as the coastal areas of Ireland and Scotland, southern Norway, parts of the western Balkans along the Adriatic coast, as well as in Galicia and coastal areas of the eastern Black Sea, including Georgia and coastal Turkey), in East Asia (in southern China, Highlands of Taiwan, much of Japan and Korea, and on Sakhalin Island and the adjacent Russian Far East coast), in South America (southern Chile) and also in Australia and New Zealand.[10]", "utf-8")

Sample csv_data (see the full data here)

    import pprint
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(csv_data)

[   [   u'shrubs,1',
        u'chile,1',
        u'equatorial,1',
        u'china,1',
        u'may,1',
        u'zone7,1'],
    [   u'washington oregon,1',
        u'new zealand10,1',
        u'moist forests,1',
        u'biotic species,1',
        u'and tropic,1',
        u'term jungle,1',
        u'sometimes misnamed,1',
        u'japan and,1',
        u'the world,1',
        u'200 cm,1',
        u'between the,1',
        u'canopy is,1',
        u'as hawaii,1',
        u'and temperate,1',
        u'many australia,1',
        u'but temperate,1'],
    [   u'cancer and tropic,1',
        u'black sea including,1',
        u'asia in southern,1',
        u'some areas of,1',
        u'also known as,1',
        u'as well as,1',
        u'areas of a,1',
        u'central america eg,1',
        u'250 and 450,1'],
    [   u'rainforest the monsoon trough,1',
        u'shrubs and small trees,1',
        u'dense tangled growth of,1',
        u'of the british isles,1'],
    [   u'sometimes misnamed oxygen production4 processing,1',
        u'a significant role in creating,1',
        and,1',
        u'are also responsible for 28 of the worlds oxygen,1',
        u'the climatic conditions necessary for the earths tropical rainforests,1',
        u'growth of vines shrubs and small trees called a,1',
        u'columbia washington oregon and california in europe parts of,1']]

As you can see from the sample data above, I then izip csv_data to transpose it and write out each row.

Edit

This is how I build the row data.
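The transposition step can be illustrated in isolation. This is a minimal sketch in Python 3, where izip_longest is named zip_longest; the columns list is a shortened, hypothetical stand-in for csv_data.

```python
from itertools import zip_longest  # Python 2 calls this izip_longest

# One inner list per phrase length; the lists have different lengths.
columns = [
    ["shrubs,1", "chile,1", "china,1"],
    ["moist forests,1", "the world,1"],
    ["as well as,1"],
]

# zip_longest turns columns into rows and pads the short columns.
rows = list(zip_longest(*columns, fillvalue="-,-"))

for row in rows:
    print(row)
```

Each tuple in rows is what would be handed to writer.writerow(...), so every cell is still a single "word,count" string at that point.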

    csv_data = []
    for index, item in enumerate(package.count_set[0]):
        payload = []
        phrase = item[0]
        for pindex, pitem in enumerate(phrase): #pitem is a Counter
            # print(index, pindex, " ".join(pitem), phrase[pitem])
            _str = " ".join(pitem)
            _cnt = phrase[pitem]
            _data = _str+",%d"%(_cnt)
            payload.append(_data)
        csv_data.append(payload)

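So I create a list of items like this [ "word,count,", "word1,count1,", "word2,count2,", "wordN,countN," ]

I have also tried it without the trailing commas [ "word,count", "word1,count1", "word2,count2", "wordN,countN" ]

Is the problem the way I build this payload list and then append it to the csv_data list?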
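One alternative worth trying, sketched here under assumptions and not taken from the original post: keep the word and the count as separate fields, not as one "word,count" string, so that no field contains the delimiter and the csv module has nothing to escape. The snippet is Python 3; the csv_data literal is a hypothetical stand-in for the Counter-based data, and io.StringIO stands in for the output file.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical data: one list per phrase length, each entry a
# (phrase, count) pair instead of a "phrase,count" string.
csv_data = [
    [("shrubs", 1), ("chile", 1)],
    [("moist forests", 1)],
]

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect: quotes a field only when needed

for values in zip_longest(*csv_data, fillvalue=("-", 0)):
    # Flatten [(phrase, count), ...] into [phrase, count, phrase, count, ...]
    flat = [cell for pair in values for cell in pair]
    writer.writerow(flat)

output = buf.getvalue()
print(output)
```

With the fields split this way, quoting=csv.QUOTE_NONE and escapechar are not needed for data like the samples shown, since none of the phrases contain a comma.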

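1 Answer:

Answer 0: (score: 0)

I don't like answering my own question, but I solved the problem by building the strings myself and writing them to the file.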

_range = files_to_load + 1
with open('data.csv', 'wb') as csvfile:
    header = (["%d word phrase, phrase count"%(i) for i in range(1, _range)])

    header_line = ""
    for index, item in enumerate(header):
        word, count = item.split(",")
        if int(word[0]) <= 1:
            pass
        else:
            word = word.replace("phrase", "phrases")

        header_line += word+","+count+","
    header_line = header_line[:-1]
    header_line += "\n"
    csvfile.write(header_line)

    for values in izip_longest(*csv_data, fillvalue="-,0"):
        line_list = ([unicode(s).encode("utf-8") for s in values])
        line_str = ""
        for item in line_list:
            word, count = item.split(",")
            line_str += word+","+count+","
        line_str = line_str[:-1]+"\n"

        csvfile.write(line_str)

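The code above could probably be cleaned up a lot, but no matter what I did, I could not get the Python csv module to work properly with my data.

This is most likely user error and an oversight on my part, but there it is. The code above writes out what I need in csv format, without any strange artifacts.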
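As a rough idea of what a cleanup could look like: because every item is already a "word,count" string, splitting it and gluing it back together with commas is equivalent to joining the items directly. The sketch below is Python 3; build_header and build_line are hypothetical helper names. It differs from the code above in two small ways: it checks the whole number, not only its first digit, when choosing the plural, and it drops the space before "phrase count".

```python
def build_header(n_columns):
    """Return the header line for phrase lengths 1..n_columns."""
    cells = []
    for i in range(1, n_columns + 1):
        noun = "phrase" if i <= 1 else "phrases"
        cells.append("%d word %s" % (i, noun))
        cells.append("phrase count")
    return ",".join(cells) + "\n"


def build_line(values):
    """Join already formatted "word,count" items into one csv line."""
    return ",".join(values) + "\n"


header_line = build_header(2)
data_line = build_line(("shrubs,1", "-,0"))
print(header_line, end="")
print(data_line, end="")
```

Like the manual approach above, this does no quoting at all, so it only holds up while the phrases themselves contain no commas, quotes, or newlines.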