Pandas DataFrame: np.where returns two arrays

Time: 2019-02-13 13:03:58

Tags: python pandas numpy dataframe where

I'm using Pandas to import a dataset as a DataFrame.

import pandas as pd
import numpy as np

# import the file as a DataFrame from the working directory
df1 = pd.read_excel('20180905_NAICS_to_GCD_industry.xlsx', sheet_name=0)
# rename columns
df1 = df1.rename(columns={'NAICS 2012': 'NAICS', 'GCD Industry code': 'GCD_Code', 'Mapped GCD Industry': 'GCD'})

I'm trying to find which rows of the DataFrame contain each factor (distinct value) of the GCD column.
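To get the rows for every GCD value at once, instead of one `np.where` call per value, one option is `groupby(...).indices`, which returns a dict mapping each distinct value to a NumPy array of positional row indices. A minimal sketch, assuming a small hypothetical frame in place of the Excel data (all values below are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for df1; the real data comes from the Excel file.
df1 = pd.DataFrame({
    "NAICS": [1, 2, 3, 4],
    "GCD_Code": [10, 20, 30, 20],
    "GCD": [
        "Agriculture",
        "Public Administration and Defence",
        "Private Sector Services (Household)",
        "Public Administration and Defence",
    ],
})

# groupby(...).indices maps each distinct GCD value to an array of the
# *positional* row indices where that value occurs.
rows_by_gcd = df1.groupby("GCD").indices

for gcd, rows in rows_by_gcd.items():
    print(gcd, rows.tolist())
```

With the default `RangeIndex` that `df1.info()` reports, positions and index labels coincide; `groupby("GCD").groups` returns index labels instead of positions.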

For example:

np.where(df1['GCD'].eq('Private Sector Services (Household)'))
Out[32]: 
(array([1246, 1247, 1248, 1249, 1250, 1251, 1252, 1253, 1254, 1257, 1258,
        1259, 1260, 1261, 1262, 1263, 1264, 1265, 1266, 1267, 1268, 1269,
        1272, 1273, 1274, 1275, 1276, 1277, 1279, 1280, 1281, 1282, 1283,
        1284, 1285, 1286, 1287, 1288, 1289, 1290, 1291, 1292, 1293, 1294,
        1295, 1296, 1297, 1298, 1299], dtype=int64),)
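Note the trailing comma in `(array([...]),)`: even for a 1-D mask, `np.where(condition)` called with a single argument returns a tuple holding one array per dimension (it behaves like `np.nonzero`). A minimal sketch on a hypothetical toy Series showing how to unpack it, or how to get the flat array directly with `np.flatnonzero`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for df1["GCD"].
gcd = pd.Series(["A", "B", "A", "C"])

mask = gcd.eq("A")          # 1-D boolean Series
result = np.where(mask)     # tuple with one array per dimension, so length 1
rows = result[0]            # unpack the single array of row positions

# Equivalent one-step alternative that returns the array itself.
rows_flat = np.flatnonzero(mask.to_numpy())

print(len(result), rows.tolist(), rows_flat.tolist())
```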

This is what I expect. But when I do this:

np.where(df1.eq('Public Administration and Defence'))
Out[30]: 
(array([ 942, 1300, 1301, 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309,
        1310, 1311, 1312, 1313, 1314, 1315, 1316, 1317, 1318, 1319, 1320,
        1321, 1322, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1331,
        1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342,
        1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353],
       dtype=int64),
 array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int64))

I get two arrays, which causes a problem.
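The second array comes from the shape of the condition: `df1.eq(...)` compares every column and produces a 2-D boolean frame, and `np.where` on a 2-D condition returns one array of row positions and one array of column positions. The second array is all `2` because column position 2 is `GCD`. A minimal sketch with a hypothetical three-column frame shaped like `df1`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring df1's layout: two int columns, one text column.
df = pd.DataFrame({
    "NAICS": [1, 2, 3],
    "GCD_Code": [10, 20, 20],
    "GCD": ["X", "Public Administration and Defence", "Public Administration and Defence"],
})

mask = df.eq("Public Administration and Defence")  # 2-D boolean DataFrame, shape (3, 3)
row_pos, col_pos = np.where(mask)                   # one array per axis

# Pair each hit with its column name: every hit lands in column 2, "GCD".
hits = list(zip(row_pos.tolist(), df.columns[col_pos].tolist()))
print(hits)  # [(1, 'GCD'), (2, 'GCD')]
```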

Can someone explain what the root cause of this is, and how I can fix it?
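One way to fix it, sketched under the assumption that only the `GCD` column should be searched (as in the first, working call): compare that single column to get a 1-D mask, then take positions, index labels, or the matching rows themselves. The frame below is a hypothetical stand-in for `df1`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1.
df1 = pd.DataFrame({
    "NAICS": [1, 2, 3, 4],
    "GCD_Code": [10, 20, 20, 30],
    "GCD": ["X", "Public Administration and Defence",
            "Public Administration and Defence", "Y"],
})

target = "Public Administration and Defence"
mask = df1["GCD"].eq(target)                 # compare one column -> 1-D mask

positions = np.flatnonzero(mask.to_numpy())  # positional row indices, plain 1-D array
labels = df1.index[mask]                     # index labels of the matching rows
matching_rows = df1[mask]                    # the matching rows themselves

print(positions.tolist(), labels.tolist(), len(matching_rows))
```

Positions suit NumPy-style indexing; labels are what `df1.loc` expects, and the two differ once the frame has been filtered or re-indexed.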

Here is some information about my DataFrame:

df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1355 entries, 0 to 1354
Data columns (total 3 columns):
NAICS       1355 non-null int64
GCD_Code    1355 non-null int64
GCD         1355 non-null object
dtypes: int64(2), object(1)
memory usage: 31.8+ KB
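The `info()` output also explains why every column position is `2`: `NAICS` and `GCD_Code` are `int64`, so only the `GCD` text column can equal a string. If searching all columns is actually intended, a possible approach is to collapse the 2-D mask to one flag per row with `.any(axis=1)` before taking positions. A minimal sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with df1's column layout.
df = pd.DataFrame({
    "NAICS": [1, 2, 3],
    "GCD_Code": [10, 20, 20],
    "GCD": ["X", "Public Administration and Defence", "Y"],
})

# Search every column, then collapse to one boolean per row:
# "does this row contain the value in any column?"
row_mask = df.eq("Public Administration and Defence").any(axis=1)
rows = np.flatnonzero(row_mask.to_numpy())  # 1-D array of row positions

print(rows.tolist())
```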
