Trying to rearrange a 2D list into a different 2D list

Date: 2014-04-18 18:57:50

Tags: python list python-3.x matrix multidimensional-array

Given input like this:

    "Date 3" "Location A" "some data"
    "Date 3" "Location B" "some data"
    "Date 3" "Location C" "some data"
    "Date 2" "Location A" "some data"
    "Date 2" "Location B" "some data"
    "Date 1" "Location A" "some data"
    "Date 1" "Location C" "some data"

I'd like to arrange it into columns (to eventually put it into a spreadsheet), like this:

        Location A    Location B    Location C
Date 3  some data     some data     some data
Date 2  some data     some data     None
Date 1  some data     None          some data

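Using the code below, I got this working by splitting the date into "month" and "day" and treating the day as an integer. But once a month has passed, the next month reuses the same day integers, so its data overwrites the earlier month's.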

    log = [["Location A", "somedata", 3, "Month"], ["Location B", "somedata", 3, "Month"],
           ["Location C", "somedata", 3, "Month"], ["Location A", "somedata", 2, "Month"],
           ["Location B", "somedata", 2, "Month"], ["Location A", "somedata", 1, "Month"],
           ["Location C", "somedata", 1, "Month"]]

    locations = ["Location A", "Location B", "Location C"]

    days = []

    for location, time, day, month in log:

        # Make sure row (day - 1) exists; each new row is [day, None, None, None]
        for i in range(len(days), day):
            days.append([i + 1] + [None for x in locations])

        # The day number alone picks the row, so a later month with the
        # same day overwrites this row
        days[day - 1][1 + locations.index(location)] = time
        days[day - 1][0] = month + " " + str(day)  # I just hack the date together here

    # Keep only rows where at least one location has data
    days = [i for i in days if i.count(None) < len(locations)]

    # Add the header row
    locations.insert(0, "Date")
    days.insert(0, locations)

    # Transpose so that each tuple holds one column
    days = list(zip(*days))

Which gives me (correctly):

    ['Date', 'Location A', 'Location B', 'Location C']
    ['Month 1', 'somedata', None, 'somedata']
    ['Month 2', 'somedata', 'somedata', None]
    ['Month 3', 'somedata', 'somedata', 'somedata']

But instead of using the date as an integer, I'd like to keep it as a string and move on to the next row each time the string changes.

    locations = ["A", "B", "C"]

    log = [["Date 2", "A", "Time"], ["Date 2", "B", "Time"], ["Date 2", "C", "Time"],
           ["Date 1", "A", "Time"], ["Date 1", "B", "Time"], ["Date 1", "C", "Time"]]
    out = []
    j = 0

    for index, day in enumerate(log):

        date, location, time = day

        out.append([date] + [None for x in locations])

        if log[index][0] != log[index - 1][0] and index != 0:
            j += 1

        out[j][1 + locations.index(location)] = location

Using something like this, I get:

    ['Date 2', 'A', None, 'C']
    ['Date 2', 'A', 'B', 'C']
    ['Date 1', None, None, None]
    ['Date 1', None, None, None]
    ['Date 1', None, None, None]

But it fills in too many rows of None, so the data no longer lines up with the dates.
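A minimal sketch of the approach described above (not from the original post): append a new row only when the date changes, instead of once per log entry, and always write into the last row. It assumes all entries for one date are contiguous in `log`, as in the sample. It also stores the data value (`time`) in the cell rather than the location name, which seems to be the intent; the "Date 1"/"B" entry is left out of this hypothetical sample so the gap shows up as `None`.

```python
locations = ["A", "B", "C"]

# Hypothetical sample: like the question's log, but without a "Date 1"/"B" entry
log = [["Date 2", "A", "Time"], ["Date 2", "B", "Time"], ["Date 2", "C", "Time"],
       ["Date 1", "A", "Time"], ["Date 1", "C", "Time"]]

out = []
for date, location, time in log:
    # Start a new row only when the date differs from the last row's date
    # (assumes entries for the same date are next to each other)
    if not out or out[-1][0] != date:
        out.append([date] + [None for _ in locations])
    # Fill the cell for this location in the current (last) row
    out[-1][1 + locations.index(location)] = time

for row in out:
    print(row)
```

This prints `['Date 2', 'Time', 'Time', 'Time']` followed by `['Date 1', 'Time', None, 'Time']`. Using `out[-1]` removes the need for the separate `j` counter.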

Does anyone have any ideas? I'm a beginner, and I'm using Python 3.3.

Thank you very much.

1 Answer:

Answer 0 (score: 4)

[Community wiki, since this is really a suggestion for a different approach.]

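This operation is usually called "pivoting". Libraries like pandas make it very simple, which can come in handy if you are writing code to do the intermediate work before further processing in a spreadsheet.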

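Something like:

    import pandas as pd

    # source.dat holds the sample input; sep=r"\s+" splits on runs of whitespace
    # (equivalent to the older delim_whitespace=True) and keeps quoted fields whole
    df = pd.read_csv("source.dat", sep=r"\s+", header=None)
    pivoted = df.pivot(index=0, columns=1, values=2)
    pivoted = pivoted.fillna("None")
    pivoted.index.name = ""
    pivoted.to_csv("final.csv")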

produces:

    >>> !cat final.csv
    ,Location A,Location B,Location C
    Date 1,some data,None,some data
    Date 2,some data,some data,None
    Date 3,some data,some data,some data

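[I should mention that many spreadsheet programs, including the most common one in the world, can also do this natively with a pivot table.]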
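For comparison, the same pivot can be sketched in plain Python without pandas (this is not part of the original answer). The helper name `pivot_rows` is hypothetical; it assumes each input row is a `[date, location, data]` triple like the sample input, keeps dates and locations in first-seen order, and fills gaps with `None`. `OrderedDict` is used so the row order is preserved even on Python 3.3, which the question uses.

```python
from collections import OrderedDict


def pivot_rows(rows, corner="Date", missing=None):
    """Pivot (row_key, column_key, value) triples into a header row plus data rows.

    Hypothetical helper: row and column keys keep their first-seen order,
    and any cell with no value is filled with `missing`.
    """
    table = OrderedDict()  # row_key -> {column_key: value}
    columns = []           # column keys in first-seen order
    for row_key, col_key, value in rows:
        table.setdefault(row_key, {})[col_key] = value
        if col_key not in columns:
            columns.append(col_key)
    header = [corner] + columns
    body = [[row_key] + [cells.get(col, missing) for col in columns]
            for row_key, cells in table.items()]
    return [header] + body


log = [
    ["Date 3", "Location A", "some data"],
    ["Date 3", "Location B", "some data"],
    ["Date 3", "Location C", "some data"],
    ["Date 2", "Location A", "some data"],
    ["Date 2", "Location B", "some data"],
    ["Date 1", "Location A", "some data"],
    ["Date 1", "Location C", "some data"],
]

for row in pivot_rows(log):
    print(row)
```

Unlike pandas' `pivot`, this keeps the dates in input order (Date 3 first), matching the layout in the question; a repeated date/location pair simply overwrites the earlier value. The resulting list of rows can be handed to `csv.writer(...).writerows(...)` to produce a spreadsheet-ready file.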


Step by step:

First, read the file into a DataFrame (something like a spreadsheet page):

    >>> df = pd.read_csv("source.dat", sep=r"\s+", header=None)
    >>> df
            0           1          2
    0  Date 3  Location A  some data
    1  Date 3  Location B  some data
    2  Date 3  Location C  some data
    3  Date 2  Location A  some data
    4  Date 2  Location B  some data
    5  Date 1  Location A  some data
    6  Date 1  Location C  some data

    [7 rows x 3 columns]
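If the rows are already in memory as a list of lists rather than in a file, a sketch (assuming each row is `[date, location, data]` like the sample input) is to pass them straight to the `DataFrame` constructor. With no column names given, pandas labels the columns 0, 1, 2, the same as `read_csv(..., header=None)` does, so the later steps apply unchanged.

```python
import pandas as pd

# The sample input as a list of [date, location, data] rows
rows = [
    ["Date 3", "Location A", "some data"],
    ["Date 3", "Location B", "some data"],
    ["Date 3", "Location C", "some data"],
    ["Date 2", "Location A", "some data"],
    ["Date 2", "Location B", "some data"],
    ["Date 1", "Location A", "some data"],
    ["Date 1", "Location C", "some data"],
]

# No column names given, so the columns are labelled 0, 1, 2
df = pd.DataFrame(rows)
print(df)
```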

Then reshape it with the `pivot` method:

    >>> pivoted = df.pivot(index=0, columns=1, values=2)
    >>> pivoted
    1      Location A Location B Location C
    0
    Date 1  some data        NaN  some data
    Date 2  some data  some data        NaN
    Date 3  some data  some data  some data

    [3 rows x 3 columns]
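One difference from the layout in the question: `pivot` sorts the index, so the dates come out as Date 1, Date 2, Date 3 rather than in input order. A minimal sketch, assuming the input order is what you want, reindexes with the dates in order of first appearance (`Series.unique()` keeps that order). The three rows are taken from the sample input.

```python
import pandas as pd

# Three rows from the sample input; integer column labels 0, 1, 2 as with header=None
df = pd.DataFrame([
    ["Date 3", "Location A", "some data"],
    ["Date 2", "Location B", "some data"],
    ["Date 1", "Location A", "some data"],
])

pivoted = df.pivot(index=0, columns=1, values=2)  # index is sorted: Date 1, Date 2, Date 3
pivoted = pivoted.reindex(df[0].unique())         # restore input order: Date 3 first
print(pivoted)
```

Reindexing only reorders the rows; missing cells are still NaN, so the `fillna` step applies as before.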

pandas uses NaN to represent missing values, but we can fill in the string "None" instead if you prefer:

    >>> pivoted = pivoted.fillna("None")
    >>> pivoted
    1      Location A Location B Location C
    0
    Date 1  some data       None  some data
    Date 2  some data  some data       None
    Date 3  some data  some data  some data

    [3 rows x 3 columns]

You don't seem to want a named index, so let's get rid of it:

    >>> pivoted.index.name = ""
    >>> pivoted
    1      Location A Location B Location C

    Date 1  some data       None  some data
    Date 2  some data  some data       None
    Date 3  some data  some data  some data

    [3 rows x 3 columns]

Then we can write it out with `to_csv`. (If needed, we could also write it directly to an Excel workbook with `to_excel`.)
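A small variation on this last step, sketched on a few rows from the sample input: `to_csv` also accepts `na_rep`, which writes "None" for missing cells in the output without replacing the NaN values inside the DataFrame, so the `fillna` step becomes optional. Called without a path, `to_csv` returns the CSV text as a string. For an Excel file, `pivoted.to_excel("final.xlsx")` is the counterpart; it needs an Excel writer package such as openpyxl installed.

```python
import pandas as pd

# A few rows from the sample input
df = pd.DataFrame([
    ["Date 2", "Location A", "some data"],
    ["Date 2", "Location B", "some data"],
    ["Date 1", "Location A", "some data"],
    ["Date 1", "Location C", "some data"],
])
pivoted = df.pivot(index=0, columns=1, values=2)
pivoted.index.name = None    # leaves the top-left header cell blank
pivoted.columns.name = None

# No path given, so to_csv returns the text; na_rep only affects the output
csv_text = pivoted.to_csv(na_rep="None")
print(csv_text)
```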