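I have a convoluted list of column names in a DataFrame read from an Excel worksheet. The data is imported as a MultiIndex DataFrame with two levels of column labels. I want to build a list of the column names that contain a specific string so that I can drop those columns from the DataFrame.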
My idea was to use something like this:
# Create list of names for unwanted columns.
lst = [col for col in df.columns if 'ISTD' in col]
# Returns empty.
# Drop columns from dataframe.
df.drop(labels = lst, axis=1, level=0, inplace=True)
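The list comes back empty, though, so I suspect the problem is that I don't know how to select columns correctly in a MultiIndex DataFrame. I find the documentation hard to follow, so I'm hoping to find an answer here.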
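A likely reason the list is empty: iterating over a MultiIndex yields tuples, and `'ISTD' in col` on a tuple asks whether any element is exactly equal to 'ISTD', not whether 'ISTD' appears inside one of the strings. The minimal sketch below, using a two-entry sample copied from the column names listed in this question, compares the two tests.

```python
import pandas as pd

# Two-level column labels copied from the question's data (truncated sample)
cols = pd.MultiIndex.from_tuples([
    ("115 In ( ISTD ) [ He Gas ] ", "CPS"),
    ("137 Ba [ He Gas ] ", "Conc. RSD"),
])

# `in` on a tuple checks for an element equal to "ISTD", so nothing matches
tuple_matches = [col for col in cols if "ISTD" in col]

# `in` on the first-level string is a substring test, which is what is wanted
substring_matches = [col for col in cols if "ISTD" in col[0]]

print(tuple_matches)      # []
print(substring_matches)  # [('115 In ( ISTD ) [ He Gas ] ', 'CPS')]
```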
Here are my column names for reference:
df.columns
Out[44]:
MultiIndex([('115 In ( ISTD ) [ He Gas ] ', 'CPS'),
('115 In ( ISTD ) [ He Gas ] ', 'CPS RSD'),
( '137 Ba [ He Gas ] ', 'Conc. RSD'),
( '137 Ba [ He Gas ] ', 'Conc. [ ppb ]'),
( '137 Ba [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
('159 Tb ( ISTD ) [ He Gas ] ', 'CPS'),
('159 Tb ( ISTD ) [ He Gas ] ', 'CPS RSD'),
('175 Lu ( ISTD ) [ He Gas ] ', 'CPS'),
('175 Lu ( ISTD ) [ He Gas ] ', 'CPS RSD'),
( '208 Pb [ He Gas ] ', 'Conc. RSD'),
( '208 Pb [ He Gas ] ', 'Conc. [ ppb ]'),
( '208 Pb [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '23 Na [ He Gas ] ', 'Conc. RSD'),
( '23 Na [ He Gas ] ', 'Conc. [ ppb ]'),
( '23 Na [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '24 Mg [ He Gas ] ', 'Conc. RSD'),
( '24 Mg [ He Gas ] ', 'Conc. [ ppb ]'),
( '24 Mg [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '27 Al [ He Gas ] ', 'Conc. RSD'),
( '27 Al [ He Gas ] ', 'Conc. [ ppb ]'),
( '27 Al [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '39 K [ He Gas ] ', 'Conc. RSD'),
( '39 K [ He Gas ] ', 'Conc. [ ppb ]'),
( '39 K [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '44 Ca [ He Gas ] ', 'Conc. RSD'),
( '44 Ca [ He Gas ] ', 'Conc. [ ppb ]'),
( '44 Ca [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '45 Sc ( ISTD ) [ He Gas ] ', 'CPS'),
( '45 Sc ( ISTD ) [ He Gas ] ', 'CPS RSD'),
( '52 Cr [ He Gas ] ', 'Conc. RSD'),
( '52 Cr [ He Gas ] ', 'Conc. [ ppb ]'),
( '52 Cr [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '55 Mn [ He Gas ] ', 'Conc. RSD'),
( '55 Mn [ He Gas ] ', 'Conc. [ ppb ]'),
( '55 Mn [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '56 Fe [ He Gas ] ', 'Conc. RSD'),
( '56 Fe [ He Gas ] ', 'Conc. [ ppb ]'),
( '56 Fe [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '60 Ni [ He Gas ] ', 'Conc. RSD'),
( '60 Ni [ He Gas ] ', 'Conc. [ ppb ]'),
( '60 Ni [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '63 Cu [ He Gas ] ', 'Conc. RSD'),
( '63 Cu [ He Gas ] ', 'Conc. [ ppb ]'),
( '63 Cu [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '66 Zn [ He Gas ] ', 'Conc. RSD'),
( '66 Zn [ He Gas ] ', 'Conc. [ ppb ]'),
( '66 Zn [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '7 Li ( ISTD ) [ He Gas ] ', 'CPS'),
( '7 Li ( ISTD ) [ He Gas ] ', 'CPS RSD'),
( '72 Ge ( ISTD ) [ He Gas ] ', 'CPS'),
( '72 Ge ( ISTD ) [ He Gas ] ', 'CPS RSD'),
( '75 As [ He Gas ] ', 'Conc. RSD'),
( '75 As [ He Gas ] ', 'Conc. [ ppb ]'),
( '75 As [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '78 Se [ He Gas ] ', 'Conc. RSD'),
( '78 Se [ He Gas ] ', 'Conc. [ ppb ]'),
( '78 Se [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '82 Se [ He Gas ] ', 'Conc. RSD'),
( '82 Se [ He Gas ] ', 'Conc. [ ppb ]'),
( '82 Se [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( '95 Mo [ He Gas ] ', 'Conc. RSD'),
( '95 Mo [ He Gas ] ', 'Conc. [ ppb ]'),
( '95 Mo [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
( 'Sample', 'Acq. Date-Time'),
( 'Sample', 'Comment'),
( 'Sample', 'Data File'),
( 'Sample', 'Level'),
( 'Sample', 'Rjct'),
( 'Sample', 'Sample Name'),
( 'Sample', 'Total Dil.'),
( 'Sample', 'Type'),
( 'Sample', 'Unnamed: 0_level_1'),
( 'Sample', 'Vial Number')]
Thanks for reading.
Answer 0 (score: 1)
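When the columns have multiple levels, df.columns returns an object of type MultiIndex that you can treat as a list of tuples.
You can iterate over those tuples and drop the matching columns like this: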
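cols = [(first, second) for first, second in df.columns if 'ISTD' in first]
df = df.drop(columns=cols)
This looks for "ISTD" only in the first level (the first element of each tuple you get from df.columns), which is where it appears in your column names. Because the full tuples identify the columns exactly, no level argument is needed. To search the second level instead, test second rather than first.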
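The tuple-unpacking approach can be checked end to end on a small stand-in DataFrame. The four columns below mirror the question's naming pattern, but the cell values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame (made-up values)
columns = pd.MultiIndex.from_tuples([
    ("115 In ( ISTD ) [ He Gas ] ", "CPS"),
    ("115 In ( ISTD ) [ He Gas ] ", "CPS RSD"),
    ("137 Ba [ He Gas ] ", "Conc. RSD"),
    ("Sample", "Sample Name"),
])
df = pd.DataFrame([[1, 2, 3, "a"], [4, 5, 6, "b"]], columns=columns)

# Unpack each column tuple and test the first-level label for the substring
cols = [(first, second) for first, second in df.columns if "ISTD" in first]

# Full tuples identify columns exactly, so drop() needs no level argument
df = df.drop(columns=cols)
print(df.columns.tolist())
```

drop() returns a new DataFrame, so the result is assigned back to df rather than relying on inplace=True.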
Answer 1 (score: 1)
MultiIndex columns behave like a list of tuples, so you can test the first element of each tuple:
lst = [col for col in df.columns if 'ISTD' in col[0]]
df = df.drop(lst, axis=1)
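The same idea also rescues the question's original call df.drop(labels=lst, axis=1, level=0): with level=0, drop() expects level-0 labels (plain strings), not full tuples, so the list only needs the first-level names that contain the substring. The sketch below assumes a small made-up DataFrame and uses get_level_values(0).unique() so each name is listed once.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame (made-up values)
columns = pd.MultiIndex.from_tuples([
    ("115 In ( ISTD ) [ He Gas ] ", "CPS"),
    ("115 In ( ISTD ) [ He Gas ] ", "CPS RSD"),
    ("137 Ba [ He Gas ] ", "Conc. RSD"),
    ("Sample", "Sample Name"),
])
df = pd.DataFrame([[1, 2, 3, "a"]], columns=columns)

# Collect matching first-level names only (strings, not tuples)
lst = [name for name in df.columns.get_level_values(0).unique() if "ISTD" in name]

# level=0 drops every column that sits under each listed first-level name
df = df.drop(columns=lst, level=0)
print(df.columns.tolist())
```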
Answer 2 (score: 0)
You don't need to create a list: with "usecols" you can skip the unwanted columns while reading the file.
import pandas as pd
data = pd.read_excel(directory, usecols=lambda x: "unwanted_string" not in str(x))
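To see what a callable usecols does without an Excel file, the sketch below swaps in pd.read_csv on an in-memory string; read_excel also accepts a callable for usecols, calling it on each column name and keeping the column when it returns True. The CSV text is hypothetical and has a single header row; with two header rows (header=[0, 1]) it is worth checking what the callable actually receives before relying on this.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the spreadsheet (single header row)
raw = io.StringIO(
    "Sample,115 In ( ISTD ),137 Ba\n"
    "s1,100,0.5\n"
    "s2,200,0.7\n"
)

# The callable is evaluated against each column name;
# a column is parsed only when it returns True
data = pd.read_csv(raw, usecols=lambda name: "ISTD" not in str(name))
print(data.columns.tolist())  # ['Sample', '137 Ba']
```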
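If you still want to build a list, you can read the header row on its own, then filter it to remove the entries that contain the unwanted string.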
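# Read in the column names as a list:
cols = pd.read_excel(directory, header=None, nrows=1, index_col=0).values[0]
cols = cols.tolist()
# Remove the elements that contain the unwanted string.
# Build a new list instead of calling cols.remove() inside a loop over cols,
# which skips the element after each one removed:
cols = [item for item in cols if "string" not in str(item)]
# Then assign the cols list as the columns of the dataframe
# (its length must match the number of columns in data):
data.columns = cols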
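Removing items from a list while iterating over it is a common pitfall: each removal shifts the remaining items left, so the loop never examines the item that follows a removed one. The small pure-Python comparison below uses made-up names to show the skipped element and the list-comprehension alternative.

```python
names = ["a string", "b string", "keep", "c"]

# Anti-pattern: mutating the list being iterated shifts later items left,
# so "b string" moves into the slot already visited and is never checked
buggy = list(names)
for item in buggy:
    if "string" in str(item):
        buggy.remove(item)

# Safer: build a new list with a comprehension
fixed = [item for item in names if "string" not in str(item)]

print(buggy)  # ['b string', 'keep', 'c']
print(fixed)  # ['keep', 'c']
```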
Answer 3 (score: 0)
Here is another way. First, create a sample MultiIndex with 4 entries (each entry is a tuple):
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
('115 In ( ISTD ) [ He Gas ] ', 'CPS'),
('115 In ( ISTD ) [ He Gas ] ', 'CPS RSD'),
( '137 Ba [ He Gas ] ', 'Conc. RSD'),
( '137 Ba [ He Gas ] ', 'Conc. [ ppb ]'),
])
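Now create a boolean mask (looking for ISTD in the first level of the MultiIndex) and use its inverse to keep the other entries: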
mask = np.array(['ISTD' in idx for idx in midx.get_level_values(0)])
midx[ ~ mask ]
MultiIndex([('137 Ba [ He Gas ] ', 'Conc. RSD'),
('137 Ba [ He Gas ] ', 'Conc. [ ppb ]')],
)
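The same mask can select columns of a DataFrame directly. The sketch below attaches the sample MultiIndex to a DataFrame of made-up numbers and swaps the list comprehension for the vectorized Index.str.contains (regex=False so the pattern is matched literally); wrapping the result in np.asarray is just a precaution so the mask is always a plain boolean array.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
    ("115 In ( ISTD ) [ He Gas ] ", "CPS"),
    ("115 In ( ISTD ) [ He Gas ] ", "CPS RSD"),
    ("137 Ba [ He Gas ] ", "Conc. RSD"),
    ("137 Ba [ He Gas ] ", "Conc. [ ppb ]"),
])
# Made-up values: [[0, 1, 2, 3], [4, 5, 6, 7]]
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=midx)

# True where the first-level label contains "ISTD" (literal substring match)
mask = np.asarray(midx.get_level_values(0).str.contains("ISTD", regex=False))

# Keep only the columns whose mask entry is False
df = df.loc[:, ~mask]
print(df.columns.tolist())
```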