Pandas conditional statement problem

Time: 2017-02-21 08:38:48

Tags: python pandas

Hi, I am processing my data.

I want to extract data using conditional statements.

This is my code.

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import os

join_file = r'D:\handling data\complete data\조인\after_join.csv'
pwd = os.getcwd()
os.chdir(os.path.dirname(join_file))
join_data = pd.read_csv(os.path.basename(join_file), sep=',', encoding='utf-8')

print(join_data.head())

join_data['cluster_z'] = 4 # both declining
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
                   & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 1 # both rising

join_data['cluster_z'][((join_data['cluster_x'] == 1 | join_data['cluster_x'] == 5)
                   & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 2 # overall declining, per-store rising

join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
                   & (join_data['cluster_y'] == 2 | join_data['cluster_y'] == 4))] = 3 # overall rising, per-store declining

print(join_data.head())

When execution reaches the second print(join_data.head()), I get an error like the one in the screenshot.

How can I solve this? Thanks in advance.

1 Answer:

Answer 0: (score: 2)

It looks like you left out many parentheses between the conditions. It is also better to use loc.

Original:

join_data['cluster_z'][((join_data['cluster_x'] == 3 | 
   join_data['cluster_x'] == 2 | 
   join_data['cluster_x'] == 4 ) &
  (join_data['cluster_y'] == 3 |
   join_data['cluster_y'] == 1))] = 1
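
Why the unparenthesized form fails: in Python, | binds more tightly than ==, so a == 3 | a == 2 is read as the chained comparison a == (3 | a) == 2, which asks for the truth value of a whole Series. The sketch below uses a small made-up Series (s is a hypothetical stand-in for join_data['cluster_x']); the error in the question's screenshot is presumably of this kind, but that is an assumption since the image is not available.

```python
import pandas as pd

# Hypothetical sample column, used only for illustration
s = pd.Series([3, 2, 5, 1], name='cluster_x')

# Without parentheses, | is evaluated before ==, so this is parsed as the
# chained comparison  s == (3 | s) == 2, which calls bool() on a Series.
try:
    bad = s == 3 | s == 2
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# With each comparison parenthesized, | combines two boolean Series.
good = (s == 3) | (s == 2)
print(good.tolist())  # [True, True, False, False]
```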

Change to:

join_data.loc[
((join_data['cluster_x'] == 3) | 
 (join_data['cluster_x'] == 2) | 
 (join_data['cluster_x'] == 4) ) & 
((join_data['cluster_y'] == 3) | 
 (join_data['cluster_y'] == 1)), 'cluster_z'] = 1 

Or better, use isin:

join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 
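
As a quick check that isin selects the same rows as the chain of parenthesized comparisons, here is a minimal sketch on made-up values (x is a hypothetical stand-in for one of the cluster columns):

```python
import pandas as pd

# Hypothetical values, used only for illustration
x = pd.Series([3, 2, 5, 1, 4])

# Explicit comparisons, each wrapped in parentheses
by_comparison = (x == 3) | (x == 2) | (x == 4)

# The same membership test expressed with isin
by_isin = x.isin([3, 2, 4])

print(by_isin.tolist())               # [True, True, False, False, True]
print(by_comparison.equals(by_isin))  # True
```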

All together:

join_data = pd.DataFrame({'cluster_x':[3,2,5,3],
                         'cluster_y':[3,0,1,2]})

print (join_data)
   cluster_x  cluster_y
0          3          3
1          2          0
2          5          1
3          3          2

join_data['cluster_z'] = 4

join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 

join_data.loc[
(join_data['cluster_x'].isin([1,5])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 2 

join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([2,4])), 'cluster_z'] = 3

print (join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3

Or, more readable, with the masks named:

mask1 = join_data['cluster_x'].isin([3,2,4])
mask2 = join_data['cluster_y'].isin([3,1])
mask3 = join_data['cluster_x'].isin([1,5])
mask4 = join_data['cluster_y'].isin([2,4])

join_data['cluster_z'] = 4
join_data.loc[mask1 & mask2 , 'cluster_z'] = 1 
join_data.loc[mask3 & mask2 , 'cluster_z'] = 2 
join_data.loc[mask1 & mask4 , 'cluster_z'] = 3 

print (join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3
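
A solution with multiple numpy.where calls is also possible. The following is only a sketch that reuses the same sample frame and masks; the nesting order is my own choice, and it does not matter here because the three combined conditions cannot be true for the same row.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
join_data = pd.DataFrame({'cluster_x': [3, 2, 5, 3],
                          'cluster_y': [3, 0, 1, 2]})

mask1 = join_data['cluster_x'].isin([3, 2, 4])
mask2 = join_data['cluster_y'].isin([3, 1])
mask3 = join_data['cluster_x'].isin([1, 5])
mask4 = join_data['cluster_y'].isin([2, 4])

# Each np.where falls through to the next one; 4 is the default value
join_data['cluster_z'] = np.where(mask1 & mask2, 1,
                         np.where(mask3 & mask2, 2,
                         np.where(mask1 & mask4, 3, 4)))

print(join_data['cluster_z'].tolist())  # [1, 4, 2, 3]
```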