pandas.isin打破了所有大写?

时间:2019-01-16 21:59:04

标签: python pandas dataframe

我发现了isin大熊猫功能,但似乎所有大写字母都没有显示?

import pandas as pd
df = pd.read_json('{"Technology Group":{"0":"Cloud","1":"Cloud","2":"Cloud","3":"Collaboration","4":"Collaboration","5":"Collaboration","6":"Collaboration","7":"Collaboration","8":"Collaboration","9":"Core", "10": "Software"},"Technology":{"0":"AMP","1":"EWS","2":"Webex","3":"Telepresence","4":"Call Manager","5":"Contact Center","6":"MS Voice","7":"Apps","8":"PRIME  ","9":"Wirelees", "10": "Prime Infrastructure"}}')

+------------------+----------------------+
| Technology Group | Technology           |
+------------------+----------------------+
| Cloud            | AMP                  |
+------------------+----------------------+
| Cloud            | EWS                  |
+------------------+----------------------+
| Cloud            | Webex                |
+------------------+----------------------+
| Collaboration    | Telepresence         |
+------------------+----------------------+
| Collaboration    | Call Manager         |
+------------------+----------------------+
| Collaboration    | Contact Center       |
+------------------+----------------------+
| Collaboration    | MS Voice             |
+------------------+----------------------+
| Collaboration    | Apps                 |
+------------------+----------------------+
| Collaboration    | PRIME                |
+------------------+----------------------+
| Core             | Wirelees             |
+------------------+----------------------+
| Software         | Prime Infrastructure |
+------------------+----------------------+

tech_input2 = ['AMP', 'Call Manager', 'PRIME']
df = df[df['Technology'].isin(tech_input2)]

它将显示下表:

+------------------+--------------+
| Technology Group | Technology   |
+------------------+--------------+
| Cloud            | AMP          |
+------------------+--------------+
| Collaboration    | Call Manager |
+------------------+--------------+

...而不是:

+------------------+--------------+
| Technology Group | Technology   |
+------------------+--------------+
| Cloud            | AMP          |
+------------------+--------------+
| Collaboration    | Call Manager |
+------------------+--------------+
| Collaboration    | PRIME        |
+------------------+--------------+

这是一个错误吗?还是我做错了什么?从技术上讲,它不是表中原始最后一行的重复,但是不确定如何解密。它的行为似乎更像contains而不是isin ...

1 个答案:

答案 0 :(得分:1)

这可能是由于空格。 strip()根据参数(指定要删除的字符集的字符串)从左右两个方向删除字符。

import pandas as pd
df = pd.read_json('{"Technology Group": {"0":"Cloud","1":"Cloud", 
"2":"Cloud","3":"Collaboration", "4":"Collaboration" ,":"Collaboration", 
"6":"Collaboration", "7":"Collaboration","8":"Collaboration","9":"Core", 
"10": "Software"},"Technology":{"0":"AMP","1":"EWS","2":"Webex","3":"Telepresence",
"4":"Call Manager","5":"Contact Center","6":"MS Voice","7":"Apps","8":"PRIME  
","9":"Wirelees", "10": "Prime Infrastructure"}}')

df['Technology'] = df['Technology'].str.strip()
tech_input2 = ['AMP', 'Call Manager', 'PRIME']
df = df[df['Technology'].isin(tech_input2)]