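Pandas: working with prefixes, with extra checks for whether a searched prefix exists or has no data

Time: 2019-07-12 16:15:54

Tags: python-3.x pandas numpy pandas-groupby

I have the code snippet below, which works fine.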

import pandas as pd
import numpy as np

prefixes = ['sj00', 'sj12', 'cr00', 'cr08', 'eu00', 'eu50']
df = pd.read_csv('new_hosts', index_col=False, header=None)
df['prefix'] = df[0].str[:4]
df['grp'] = df.groupby('prefix').cumcount()
df = df.pivot(index='grp', columns='prefix', values=0)
df['sj12'] = df['sj12'].str.extract(r'(\w{2}\d{2}\w\*)', expand=True)
df = df[ prefixes ].dropna(axis=0, how='all').replace(np.nan, '', regex=True)
df = df.rename_axis(None)

Sample file new_hosts

sj000001
sj000002
sj000003
sj000004
sj124000
sj125000
sj126000
sj127000
sj128000
sj129000
sj130000
sj131000
sj132000
cr000011
cr000012
cr000013
cr000014
crn00001
crn00002
crn00003
crn00004
euk000011
eu0000012
eu0000013
eu0000014
eu5000011
eu5000013
eu5000014
eu5000015

Current output:

sj00        sj12        cr00        cr08        eu00        eu50
sj000001                cr000011    crn00001    euk000011   eu5000011
sj000002                cr000012    crn00002    eu0000012   eu5000013
sj000003                cr000013    crn00003    eu0000013   eu5000014
sj000004                cr000014    crn00004    eu0000014   eu5000015

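What I expect:

1) The code works, but as you can see in the current output, the second column has no values and is still displayed. How can I add a check so that a column with no values is removed from the display?

2) Can we check whether the prefixes exist in the DataFrame before processing them, to avoid errors?

Any help is appreciated.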

2 answers:

Answer 0: (score: 1)

IIUC, before

df = df[ prefixes ].dropna(axis=0, how='all').replace(np.nan, '', regex=True)

you can do this:

# remove all empty columns
df = df.dropna(axis=1, how='all')
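As a small illustration of the line above, here is a sketch on a toy frame. The data and the names `toy` and `trimmed` are made up for this example and are not from the question.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): column 'sj12' holds only NaN,
# column 'cr00' holds one real value and one NaN.
toy = pd.DataFrame({
    'sj00': ['sj000001', 'sj000002'],
    'sj12': [np.nan, np.nan],
    'cr00': ['cr000011', np.nan],
})

# axis=1 means "drop columns"; how='all' drops a column only when
# every value in it is NaN, so 'cr00' survives and 'sj12' does not.
trimmed = toy.dropna(axis=1, how='all')
print(list(trimmed.columns))
```

Partially filled columns are kept, which is why the remaining NaN values still need to be replaced afterwards.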

That will solve your first part. The second part can be handled with reindex:

# select prefixes:
prefixes = ['sj00', 'sj12', 'cr00', 'cr08', 'eu00', 'eu50', 'sh00', 'dt00', 'sh00', 'dt00']

df = df.reindex(prefixes, axis=1).dropna(axis=1, how='all').replace(np.nan, '', regex=True)

Note that this uses axis=1 rather than axis=0, the same as in my suggestion for question 1.
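To show why reindex avoids the error, here is a minimal sketch comparing plain column selection with reindex when some requested prefixes are missing. The frame `wide` and the list `wanted` are hypothetical stand-ins for the pivoted data and the prefix list.

```python
import numpy as np
import pandas as pd

# Hypothetical pivoted frame with only two of the requested prefixes
wide = pd.DataFrame({
    'sj00': ['sj000001', 'sj000002'],
    'cr00': ['cr000011', np.nan],
})
wanted = ['sj00', 'sj12', 'cr00', 'sh00']

# Plain selection raises KeyError because 'sj12' and 'sh00' are not columns
try:
    wide[wanted]
    selection_failed = False
except KeyError:
    selection_failed = True

# reindex creates the missing columns, filled with NaN, instead of raising
aligned = wide.reindex(wanted, axis=1)

# The all-NaN columns are then dropped and the remaining NaN become ''
result = aligned.dropna(axis=1, how='all').fillna('')
print(selection_failed, list(result.columns))
```

Here `fillna('')` is used in place of `replace(np.nan, '', regex=True)`; for this purpose the two should behave the same.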

Answer 1: (score: 0)

Many thanks to Quang Hoang for the hint. As a workaround, I am using the following until a better answer comes along:

# Select prefixes
prefixes = ['sj00', 'sj12', 'cr00', 'cr08', 'eu00', 'eu50']

df = pd.read_csv('new_hosts', index_col=False, header=None)

df['prefix'] = df[0].str[:4]

df['grp'] = df.groupby('prefix').cumcount()

df = df.pivot(index='grp', columns='prefix', values=0)

df = df[prefixes]

# For column `sj12`, extract only the values that have `sj12` immediately followed by a word character, like `sj12[a-z]`
df['sj12'] = df['sj12'].str.extract(r'(\w{2}\d{2}\w\*)', expand=True)

df.replace('', np.nan, inplace=True)

# Remove the empty columns
df = df.dropna(axis=1, how='all')

# Drop rows where every value is NaN, then replace the remaining NaN with '' in the surviving columns
df = df.dropna(axis=0, how='all').replace(np.nan, '', regex=True)

# drop the index field
df = df.rename_axis(None)

print(df)
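
An alternative to selecting `df[prefixes]` directly is to filter the prefix list against the pivoted columns first, so a missing prefix never reaches the selection. The sketch below assumes that approach. The function name `pivot_hosts` and the in-memory `sample` data are hypothetical, and the `sj12` regex step is left out to keep it short.

```python
import io
import pandas as pd

def pivot_hosts(source, prefixes):
    """Pivot host names into one column per 4-character prefix."""
    df = pd.read_csv(source, header=None)
    df['prefix'] = df[0].str[:4]
    df['grp'] = df.groupby('prefix').cumcount()
    wide = df.pivot(index='grp', columns='prefix', values=0)
    # Keep only the requested prefixes that actually occur, in request order
    present = [p for p in prefixes if p in wide.columns]
    wide = wide[present].dropna(axis=0, how='all').fillna('')
    # Remove the 'grp' and 'prefix' axis labels from the printed table
    return wide.rename_axis(index=None, columns=None)

# In-memory stand-in for the new_hosts file (made-up rows)
sample = io.StringIO("sj000001\nsj000002\ncr000011\neu500001\n")
table = pivot_hosts(sample, ['sj00', 'sj12', 'cr00', 'eu50'])
print(table)
```

Because `sj12` never appears in the sample, it is skipped before selection, so no KeyError is raised and no empty column is created.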