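List the DataFrame columns that contain the value '?'

Time: 2018-11-17 18:12:32

Tags: python pandas dataframe pycharm

If missing values are encoded as '?', list the DataFrame column names together with the number of missing values in each, using pandas and numpy.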

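import numpy as np
import pandas as pd

# Read the sheet; every record arrives as one comma-separated string in the first column
bridgeall = pd.read_excel('bridge.xlsx', sheet_name='Sheet1')
#print(bridgeall)

# Split the first column on commas into separate columns
# (n=-1, i.e. no limit on splits, is the default, so it is omitted)
bridge_sep = bridgeall.iloc[:, 0].str.split(',', expand=True)
bridge_sep.columns = ['IDENTIF', 'RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES', 'CLEAR-G', 'T-OR-D',
                      'MATERIAL', 'SPAN', 'REL-L', 'TYPE']

print(bridge_sep)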

Data: I am posting only a snippet. The full data is [107 rows x 13 columns].

  IDENTIF RIVER LOCATION ERECTED  ...  MATERIAL   SPAN REL-L  TYPE
0      E2     A        ?  CRAFTS  ...      WOOD  SHORT     ?  WOOD
1      E3     A       39  CRAFTS  ...      WOOD      ?     S  WOOD
2      E5     A        ?  CRAFTS  ...      WOOD  SHORT     S  WOOD

Desired output:

LOCATION 2
SPAN 1
REL-L 1

2 answers:

Answer 0: (score: 2)

Compare all values with eq (==), then count the True values with sum, since each True is treated as 1. Then remove only the 0 counts by boolean indexing:

s = df.eq('?').sum()
s = s[s != 0]
print (s)
LOCATION    2
SPAN        1
REL-L       1
dtype: int64

Last, to get a DataFrame, add reset_index:

df1 = s.reset_index()
df1.columns = ['names','count']
print (df1)
      names  count
0  LOCATION      2
1      SPAN      1
2     REL-L      1

Edit:

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)))
print (df)
   0  1  2  3  4
0  8  8  3  7  7
1  0  4  2  5  2
2  2  2  1  0  8
3  4  0  9  6  2
4  4  1  5  3  4

#compare with same length Series 
#same index values like index/columns of DataFrame
s = pd.Series(np.arange(5))
print (s)
0    0
1    1
2    2
3    3
4    4
dtype: int32
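
The snippet above stops once the Series is built, so the intended comparison is not shown. As a guess at the intent only, here is a minimal sketch of how eq can compare a DataFrame with a Series of the same length along either axis. The DataFrame values are copied from the printed output above rather than regenerated, and the names by_columns and by_rows are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

# Values copied from the printed DataFrame above (hard-coded to avoid relying on the seed)
df = pd.DataFrame([
    [8, 8, 3, 7, 7],
    [0, 4, 2, 5, 2],
    [2, 2, 1, 0, 8],
    [4, 0, 9, 6, 2],
    [4, 1, 5, 3, 4],
])
s = pd.Series(np.arange(5))

# axis=1: align the Series index with the column labels,
# so every row is compared element-wise with [0, 1, 2, 3, 4]
by_columns = df.eq(s, axis=1)

# axis=0: align the Series index with the row labels,
# so every column is compared element-wise with [0, 1, 2, 3, 4]
by_rows = df.eq(s, axis=0)

# Number of matches per column in each case
print(by_columns.sum())
print(by_rows.sum())
```

In both cases the result is a boolean DataFrame of the same shape as df, so sum counts matches per column exactly as in the '?' example.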

Answer 1: (score: 0)

If your DataFrame is named df, try (df == '?').sum()
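
For a quick check, here is a minimal self-contained sketch of that comparison. It assumes a hypothetical three-row frame that mirrors a few columns of the question's sample, not the real bridge.xlsx. The filtering step and the NaN-based variant are additions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring a few columns of the question's data
df = pd.DataFrame({
    'IDENTIF': ['E2', 'E3', 'E5'],
    'LOCATION': ['?', '39', '?'],
    'SPAN': ['SHORT', '?', 'SHORT'],
    'REL-L': ['?', 'S', 'S'],
})

# Element-wise comparison gives a boolean frame; sum counts True per column
counts = (df == '?').sum()

# Keep only the columns that actually contain '?'
counts = counts[counts > 0]
print(counts)

# Variant (an assumption, not from the answer): turn '?' into NaN first,
# then use the usual missing-value tools
na_counts = df.replace('?', np.nan).isna().sum()
na_counts = na_counts[na_counts > 0]
print(na_counts)
```

Without the filtering step, (df == '?').sum() also lists columns with a count of 0, which the desired output in the question leaves out.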