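Hi~ I'm processing my data.
I want to use conditional statements to fill a new column based on the values of two existing columns.
Here is my code: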
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
import os
join_file = r'D:\handling data\complete data\조인\after_join.csv'
pwd = os.getcwd()
os.chdir(os.path.dirname(join_file))
join_data = pd.read_csv(os.path.basename(join_file), sep=',', encoding='utf-8')
print(join_data.head())
join_data['cluster_z'] = 4 # both declining
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
& (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 1 # both rising
join_data['cluster_z'][((join_data['cluster_x'] == 1 | join_data['cluster_x'] == 5)
& (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 2 # overall declining, per-store rising
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4 )
& (join_data['cluster_y'] == 2 | join_data['cluster_y'] == 4))] = 3 # overall rising, per-store declining
print(join_data.head())
When I run it through to the second print(join_data.head()), I get an error like the one in the image.
How can I fix it? Thanks in advance.
Answer 0 (score: 2)
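It seems you omitted many parentheses around the conditions. In Python, `|` has higher precedence than `==`, so `a == 3 | a == 2` is parsed as `a == (3 | a) == 2`, not as `(a == 3) | (a == 2)`; every comparison needs its own parentheses. It is also better to assign with loc: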
Original:
join_data['cluster_z'][((join_data['cluster_x'] == 3 |
                         join_data['cluster_x'] == 2 |
                         join_data['cluster_x'] == 4 ) &
                        (join_data['cluster_y'] == 3 |
                         join_data['cluster_y'] == 1))] = 1
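To see why the original fails, here is a minimal sketch on a made-up Series (the values are assumptions, standing in for `join_data['cluster_x']`). Without parentheses the expression becomes a chained comparison, which has to call `bool()` on a Series, and pandas refuses that with a `ValueError`:

```python
import pandas as pd

# Made-up values standing in for join_data['cluster_x']
s = pd.Series([3, 2, 5, 4, 1])

# Without parentheses, | is evaluated first, giving the chained comparison
# s == (3 | s) == 2, which needs bool() of a Series and raises ValueError.
try:
    s == 3 | s == 2
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With parentheses, each comparison is a boolean Series and | combines them element-wise.
mask = (s == 3) | (s == 2)
print(mask.tolist())  # [True, True, False, False, False]
```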
Change to (parentheses around every comparison, assignment through loc):
join_data.loc[
((join_data['cluster_x'] == 3) |
(join_data['cluster_x'] == 2) |
(join_data['cluster_x'] == 4) ) &
((join_data['cluster_y'] == 3) |
(join_data['cluster_y'] == 1)), 'cluster_z'] = 1
Or, better, use isin:
join_data.loc[
(join_data['cluster_x'].isin([3,2,4])) &
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1
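`isin` builds exactly the same boolean mask as the parenthesized `|` chain, only more compactly. A quick check on a small made-up DataFrame (the values are assumptions for illustration):

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'cluster_x': [3, 2, 5, 4, 1],
                   'cluster_y': [3, 0, 1, 2, 1]})

# Explicit OR-chain, every comparison in its own parentheses
chained = (((df['cluster_x'] == 3) | (df['cluster_x'] == 2) | (df['cluster_x'] == 4))
           & ((df['cluster_y'] == 3) | (df['cluster_y'] == 1)))

# The same condition written with isin
with_isin = df['cluster_x'].isin([3, 2, 4]) & df['cluster_y'].isin([3, 1])

print(chained.equals(with_isin))  # True
print(with_isin.tolist())         # [True, False, False, False, False]
```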
All together, on a small sample DataFrame:
import pandas as pd

join_data = pd.DataFrame({'cluster_x':[3,2,5,3],
                          'cluster_y':[3,0,1,2]})
print(join_data)
   cluster_x  cluster_y
0          3          3
1          2          0
2          5          1
3          3          2
# default label 4: both declining
join_data['cluster_z'] = 4
# label 1: both rising
join_data.loc[
    (join_data['cluster_x'].isin([3,2,4])) &
    (join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1
# label 2: overall declining, per-store rising
join_data.loc[
    (join_data['cluster_x'].isin([1,5])) &
    (join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 2
# label 3: overall rising, per-store declining
join_data.loc[
    (join_data['cluster_x'].isin([3,2,4])) &
    (join_data['cluster_y'].isin([2,4])), 'cluster_z'] = 3
print(join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3
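One alternative (not the approach used above) is `numpy.select`, which computes all labels in a single assignment: for each row it takes the choice belonging to the first true condition and falls back to `default`. A sketch on the same sample data; because the three conditions here are mutually exclusive (the `cluster_x` sets {2, 3, 4} and {1, 5} do not overlap, nor do the `cluster_y` sets {1, 3} and {2, 4}), the "first match wins" rule gives the same result as the sequence of `loc` assignments, where later assignments overwrite earlier ones:

```python
import numpy as np
import pandas as pd

join_data = pd.DataFrame({'cluster_x': [3, 2, 5, 3],
                          'cluster_y': [3, 0, 1, 2]})

# One boolean condition per label, in the same order as the loc version
conditions = [
    join_data['cluster_x'].isin([3, 2, 4]) & join_data['cluster_y'].isin([3, 1]),
    join_data['cluster_x'].isin([1, 5]) & join_data['cluster_y'].isin([3, 1]),
    join_data['cluster_x'].isin([3, 2, 4]) & join_data['cluster_y'].isin([2, 4]),
]
choices = [1, 2, 3]

# Rows that match no condition get the default label 4
join_data['cluster_z'] = np.select(conditions, choices, default=4)
print(join_data)
```

Here `cluster_z` comes out as 1, 4, 2, 3, the same as with `loc`.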
Or, more readably, build the masks once and reuse them:
# boolean masks, computed once and reused below
mask1 = join_data['cluster_x'].isin([3,2,4])
mask2 = join_data['cluster_y'].isin([3,1])
mask3 = join_data['cluster_x'].isin([1,5])
mask4 = join_data['cluster_y'].isin([2,4])
join_data['cluster_z'] = 4
join_data.loc[mask1 & mask2, 'cluster_z'] = 1
join_data.loc[mask3 & mask2, 'cluster_z'] = 2
join_data.loc[mask1 & mask4, 'cluster_z'] = 3
print(join_data)
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3
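The same masks can also drive nested `numpy.where` calls: each call returns its label where its condition holds and otherwise defers to the next, inner call, with 4 as the final fallback. A sketch on the same sample data; note that the outermost condition takes priority here (the reverse of sequential `loc` assignments), which makes no difference because these conditions never overlap:

```python
import numpy as np
import pandas as pd

join_data = pd.DataFrame({'cluster_x': [3, 2, 5, 3],
                          'cluster_y': [3, 0, 1, 2]})

mask1 = join_data['cluster_x'].isin([3, 2, 4])
mask2 = join_data['cluster_y'].isin([3, 1])
mask3 = join_data['cluster_x'].isin([1, 5])
mask4 = join_data['cluster_y'].isin([2, 4])

# Outer np.where wins first; each inner np.where handles the remaining rows.
join_data['cluster_z'] = np.where(mask1 & mask2, 1,
                         np.where(mask3 & mask2, 2,
                         np.where(mask1 & mask4, 3, 4)))
print(join_data)
```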