如果缺少的值用'?'编码,则列出数据帧的列名称以及缺少的值数使用pandas和numpy。
import numpy as np
import pandas as pd
bridgeall = pd.read_excel('bridge.xlsx',sheet_name='Sheet1')
#print(bridgeall)
bridge_sep = bridgeall.iloc[:,0].str.split(',',-1,expand=True)
bridge_sep.columns = ['IDENTIF','RIVER', 'LOCATION', 'ERECTED', 'PURPOSE', 'LENGTH', 'LANES','CLEAR-G', 'T-OR-D',
'MATERIAL', 'SPAN', 'REL-L', 'TYPE']
print(bridge_sep)
数据:我正在发布一个代码段。实际上是[107行x 13列]。
IDENTIF RIVER LOCATION ERECTED ... MATERIAL SPAN REL-L TYPE
0 E2 A ? CRAFTS ... WOOD SHORT ? WOOD
1 E3 A 39 CRAFTS ... WOOD ? S WOOD
2 E5 A ? CRAFTS ... WOOD SHORT S WOOD
需要的输出:
LOCATION 2
SPAN 1
REL-L 1
答案 0 :(得分:2)
将所有值用eq
(==
)进行比较,对于计数精度,请使用sum
-True
s是类似于1
的过程,然后仅删除{{ 1}}值(False
)由boolean indexing
:
0
最后s = df.eq('?').sum()
s = s[s != 0]
print (s)
LOCATION 2
SPAN 1
REL-L 1
dtype: int64
添加reset_index
:
DataFrame
编辑:
df1 = s.reset_index()
df1.columns = ['names','count']
print (df1)
names count
0 LOCATION 2
1 SPAN 1
2 REL-L 1
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)))
print (df)
0 1 2 3 4
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8
3 4 0 9 6 2
4 4 1 5 3 4
#compare with same length Series
#same index values like index/columns of DataFrame
s = pd.Series(np.arange(5))
print (s)
0 0
1 1
2 2
3 3
4 4
dtype: int32
答案 1 :(得分:0)
如果您的DataFrame命名为df
,请尝试(df == '?').sum()