Referencing table columns by their column headers in Python

Date: 2015-06-18 04:36:17

Tags: python-2.7

Is there a Pythonic way to reference the columns of a 2D list by name?

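I import a lot of tables from the web, so I wrote a generic function that turns various HTML tables into 2D lists. So far, so good. But the next step is usually to parse the table row by row.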

# Sample table.
# In real life I would do something like: table = HTML_table('url', 'table id')
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6]
]

# Current code (process_row stands for whatever per-row work is needed):
iA = table[0].index('Column A')
iC = table[0].index('Column C')
for row in table[1:]:
    process_row(row[iA], row[iC])

# Desired code (fails on plain lists, whose indices must be integers):
for row in table[1:]:
    process_row(row['Column A'], row['Column C'])
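With plain lists, one minimal standard-library sketch is to zip the header row with each data row, so that every row becomes a dict keyed by column name (csv.DictReader does the same thing for CSV input). The helper name rows_as_dicts is hypothetical, and the list of tuples below merely stands in for the question's process_row.

```python
# Sketch: turn each data row into a dict keyed by the header row,
# so cells can be looked up by column name.
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]


def rows_as_dicts(table):
    """Yield each data row of `table` as a dict keyed by the header row."""
    header = table[0]
    for row in table[1:]:
        yield dict(zip(header, row))


results = []
for row in rows_as_dicts(table):
    # Hypothetical stand-in for process_row(row['Column A'], row['Column C']).
    results.append((row['Column A'], row['Column C']))

print(results)  # [('One', 3), ('Four', 6)]
```

Building a dict per row costs a little memory, but it keeps the loop body identical to the "desired code" in the question.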

3 answers:

Answer 0 (score: 2)

I think you'll really like the pandas module! http://pandas.pydata.org/

Put your list into a DataFrame

This can also be done directly from HTML, CSV, etc. (e.g. with pd.read_html or pd.read_csv).

import pandas as pd

df = pd.DataFrame(table[1:], columns=table[0]).astype(str)

Access a column

df['Column A']

Access the first row by position

df.iloc[0]

Process row by row

df.apply(lambda row: '_'.join(row), axis=1)

for index, row in df.iterrows():
    process_row(row['Column A'], row['Column C'])

Process a column

df['Column C'].astype(int).sum()

Answer 1 (score: 0)

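Wouldn't an ordered mapping whose keys are the column names and whose values are the lists of row values be a better fit? I would go with something like: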

from collections import OrderedDict

table = OrderedDict([
    ('Column A', [1, 4]),
    ('Column B', [2, 5]),
    ('Column C', [3, 6]),
])

# And you would parse column by column...
for col, rows in table.items():
    pass  # do something with this column
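If the data arrives row-oriented, as in the question, a minimal sketch for building this column-oriented structure is to transpose the data rows with zip(*rows). The helper name to_columns is hypothetical, and the sketch assumes every data row has the same length as the header (zip silently truncates to the shortest row).

```python
from collections import OrderedDict

# Row-oriented table, as in the question.
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]


def to_columns(table):
    """Transpose a row-oriented table into an OrderedDict of columns.

    Keys are the header names in header order; values are lists holding
    that column's cells from every data row. If there are no data rows,
    zip(*rows) is empty and the result has no columns at all.
    """
    header, rows = table[0], table[1:]
    # zip(*rows) yields one tuple per column across all data rows.
    columns = zip(*rows)
    return OrderedDict((name, list(col)) for name, col in zip(header, columns))


columns = to_columns(table)
print(columns['Column A'])  # ['One', 'Four']
print(columns['Column C'])  # [3, 6]
```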

Answer 2 (score: 0)

My QueryList is easy to use.

ql.filter(portfolio='123')

ql.group_by(['portfolio', 'ticker'])

from collections import OrderedDict, defaultdict


class QueryList(list):
    """filter and/or group_by a list of objects."""

    def group_by(self, attrs) -> dict:
        """Like a database group_by function.

        args:
            attrs: str or list.

        Returns:
            {value_of_the_group: list_of_matching_objects, ...}
            When attrs is a list, each key is a tuple.
            Ex:
            {'AMZN': QueryList(),
             'MSFT': QueryList(),
             ...
            }
            -- or --
            {('Momentum', 'FB'): QueryList(),
             ...,
            }
        """
        result = defaultdict(QueryList)
        if isinstance(attrs, str):
            for item in self:
                result[getattr(item, attrs)].append(item)
        else:
            for item in self:
                result[tuple(getattr(item, x) for x in attrs)].append(item)

        return result

    def filter(self, **kwargs):
        """Returns the subset of this QueryList that has matching attributes.

        args:
            kwargs: Attribute name/value pairs.

        Example:
            foo.filter(portfolio='123', account='ABC')
        """
        ordered_kwargs = OrderedDict(kwargs)
        match = tuple(ordered_kwargs.values())

        def is_match(item):
            # True when every requested attribute equals its requested value.
            return tuple(getattr(item, y) for y in ordered_kwargs) == match

        return QueryList(x for x in self if is_match(x))

    def scalar(self, default=None, attr=None):
        """Returns the first item in this QueryList.

        args:
            default: The value to return if the list is empty,
                or if attr is not found.
            attr: If not None, return getattr(item, attr) instead of the item.
        """
        item, = self[0:1] or [default]

        if attr is None:
            result = item
        else:
            result = getattr(item, attr, default)
        return result

I tried pandas. I wanted to like it, and I really do. But in the end it was too complicated for my needs.

For example:

df[(df['portfolio'] == '123') & (df['ticker'] == 'MSFT')]

is not as simple as

ql.filter(portfolio='123', ticker='MSFT')

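Also, creating a QueryList is simpler than creating a df.

That's because you tend to use custom classes with a QueryList. The data-conversion code naturally ends up in the custom class, which keeps it separate from the rest of the logic. With a df, by contrast, the data-conversion code usually sits inline with the rest of the code.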
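To illustrate the custom-class point, here is a minimal sketch under stated assumptions: the rows have already been scraped into lists of strings, and the names Holding, from_row and filter_by are all hypothetical. filter_by only mimics the attribute-equality matching of QueryList.filter so the block runs on its own; it is not the answerer's code.

```python
from collections import namedtuple

# Hypothetical raw rows, e.g. scraped from an HTML table (all strings).
raw_table = [
    ['Portfolio', 'Ticker', 'Shares'],
    ['123', 'MSFT', '10'],
    ['123', 'AMZN', '5'],
    ['456', 'MSFT', '7'],
]


class Holding(namedtuple('Holding', ['portfolio', 'ticker', 'shares'])):
    """One table row; the string-to-type conversion lives in this class."""
    __slots__ = ()

    @classmethod
    def from_row(cls, row):
        portfolio, ticker, shares = row
        return cls(portfolio=portfolio, ticker=ticker, shares=int(shares))


def filter_by(items, **kwargs):
    """Keep items whose attributes equal every given keyword value."""
    return [item for item in items
            if all(getattr(item, name) == value for name, value in kwargs.items())]


holdings = [Holding.from_row(row) for row in raw_table[1:]]
matches = filter_by(holdings, portfolio='123', ticker='MSFT')
print(len(matches), matches[0].shares)  # 1 10
```

Because from_row owns the conversion (here just int(shares)), the filtering code never deals with raw strings.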