Here is some sample data:
import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`
import numpy as np
audit_trail = """\
1|2|ENQ-wbrProcess.php|bus_departures|BUS_SERVICE_NO#DEPARTURE_TM|54790#01/12/2010|BOOKING_STATUS|O|L|PHRTD|2010-12-01 12:42:32
5|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|BUS_TYPE_CD||DO|PHRTD|2010-12-01 12:43:27
9|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|EFFECTIVE_FROM||2010-12-02 00:00:00|PHRTD|2010-12-01 12:43:28
13|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|MAX_CHANCE_SEATS||0|PHRTD|2010-12-01 12:43:28
17|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|SCHEDULED_NO||15|PHRTD|2010-12-01 12:43:29
21|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|TRIP_NATURE||Basic|PHRTD|2010-12-01 12:43:29
25|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|PARCEL_SERVICE||N|PHRTD|2010-12-01 12:43:30
29|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|TRIP_NO||S11308|PHRTD|2010-12-01 12:43:30
33|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|IS_AVL_RESERVATION||N|PHRTD|2010-12-01 12:43:31
37|0|DTO-transfer.php|bus_service_seats|BUS_SERVICE_NO|159734|BUS_SERVICE_NO||159734|PHRTD|2010-12-01 12:43:32"""
col_list = ['transaction_id', 'request_id', 'table_name', 'table_unique_field', 'table_unique_value', 'field_name', 'old_value', 'new_value', 'client_id', 'client_type', 'transaction_date']
audit = pd.read_csv(StringIO(audit_trail), sep="|", names=col_list, index_col='transaction_date')
In [44]: audit.client_type.nunique()
Out[44]: 1
This returns the rows whose client type is "PHRTD":
audit[(audit.client_type == 'PHRTD')] [['old_value', 'client_type']]
I want to show both columns, or only one column (old_value) if the number of unique client_type values is 1. Something like this does not work:
audit[(if(audit.client_type.nunique() != 1), [['old_value', 'client_type'], ['old_value']])]
I am looking for a simple technique to hide any column that repeats the same value in every row.
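For reference, the conditional that the attempt above is reaching for can be written as an ordinary Python conditional expression that chooses the column list first and then indexes once. This is a minimal sketch; the two-row `audit` frame below is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the real `audit` frame
audit = pd.DataFrame({
    "old_value": ["BOOKING_STATUS", "BUS_TYPE_CD"],
    "client_type": ["PHRTD", "PHRTD"],
})

# Pick the column list with a conditional expression, then index once with .loc
cols = ["old_value", "client_type"] if audit.client_type.nunique() != 1 else ["old_value"]
subset = audit.loc[audit.client_type == "PHRTD", cols]
print(subset.columns.tolist())  # ['old_value'] because client_type has one unique value
```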
Answer (score: 1):
def trim(df):
    # drop the columns whose nunique() is exactly 1, i.e. the same value in every row
    columns = [col for col in df if df[col].nunique() != 1]
    return df[columns]
print(trim(audit.loc[audit.client_type == 'PHRTD', ['old_value', 'client_type']]))
yields
                              old_value
transaction_date
2010-12-01 12:42:32      BOOKING_STATUS
2010-12-01 12:43:27         BUS_TYPE_CD
2010-12-01 12:43:28      EFFECTIVE_FROM
2010-12-01 12:43:28    MAX_CHANCE_SEATS
2010-12-01 12:43:29        SCHEDULED_NO
2010-12-01 12:43:29         TRIP_NATURE
2010-12-01 12:43:30      PARCEL_SERVICE
2010-12-01 12:43:30             TRIP_NO
2010-12-01 12:43:31  IS_AVL_RESERVATION
2010-12-01 12:43:32      BUS_SERVICE_NO
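A vectorized variant of `trim` is also possible: `DataFrame.nunique()` returns the distinct-value count of every column as a Series, and a boolean Series built from it can be used directly as the column mask in `.loc`. The sketch below uses a hypothetical name, `trim_vectorized`, and a small made-up frame. Like the list-comprehension version, it drops only columns whose count is exactly 1, so an all-NaN column (count 0) is kept.

```python
import pandas as pd

def trim_vectorized(df):
    """Return df without the columns that hold the same value in every row."""
    # nunique() gives one count per column (NaN excluded by default);
    # the boolean Series selects the columns to keep
    return df.loc[:, df.nunique() != 1]

# Small made-up frame for illustration
df = pd.DataFrame({
    "old_value": ["BOOKING_STATUS", "BUS_TYPE_CD", "TRIP_NO"],
    "client_type": ["PHRTD", "PHRTD", "PHRTD"],
})
print(trim_vectorized(df).columns.tolist())  # ['old_value']
```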
Tip: the expression
audit[(audit.client_type == 'PHRTD')] [['old_value', 'client_type']]
uses chained indexing. That is fine for reading data, but it can fail when you assign new values to audit:
audit[(audit.client_type == 'PHRTD')] [['old_value', 'client_type']] = values # would FAIL to modify audit
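To see why, the sketch below runs a chained assignment on a small hypothetical frame and then inspects the original. `df[mask]` builds a new DataFrame, so the assignment lands on that temporary object and `df` is left unchanged. pandas typically warns here (`SettingWithCopyWarning` in older releases, `ChainedAssignmentError` with Copy-on-Write); the warning is silenced only to keep the output readable.

```python
import warnings
import pandas as pd

# Small hypothetical frame standing in for `audit`
df = pd.DataFrame({
    "client_type": ["PHRTD", "OTHER", "PHRTD"],
    "old_value": ["A", "B", "C"],
})
mask = df.client_type == "PHRTD"

with warnings.catch_warnings():
    # Hide the chained-assignment warning pandas raises for this pattern
    warnings.simplefilter("ignore")
    # df[mask] is a new object; this writes into it, not into df
    df[mask][["old_value"]] = "CHANGED"

print(df.old_value.tolist())  # ['A', 'B', 'C']: df itself was not modified
```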
So it is best to avoid chained indexing whenever possible. Here you can use audit.loc instead:
audit.loc[audit.client_type == 'PHRTD', ['old_value', 'client_type']] = values
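A minimal sketch of the `.loc` form on a small hypothetical frame, with a scalar standing in for `values`. Because rows and columns are selected in a single `.loc` call, the assignment writes into the original frame. An array-like shaped like the selection (one row per matching row, one column per listed column) can be assigned the same way.

```python
import pandas as pd

# Small hypothetical frame standing in for `audit`
df = pd.DataFrame({
    "client_type": ["PHRTD", "OTHER", "PHRTD"],
    "old_value": ["A", "B", "C"],
    "new_value": ["x", "y", "z"],
})
mask = df.client_type == "PHRTD"

# One .loc call: row mask and column list together, so df is updated in place
df.loc[mask, ["old_value", "new_value"]] = "CHANGED"

print(df.old_value.tolist())  # ['CHANGED', 'B', 'CHANGED']
print(df.new_value.tolist())  # ['CHANGED', 'y', 'CHANGED']
```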