Remove rows containing certain strings, along with their corresponding values in other columns

Time: 2017-09-14 23:04:59

Tags: python rows

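Please help me figure out how to do this. I have a DataFrame. The "Indicator" column contains a bunch of different parameters (strings), but I only need "Life satisfaction". I don't know how to remove the other indicators, such as "Dwellings without basic facilities", along with their corresponding values and countries.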

import numpy as np
import pandas as pd

oecd_bli = pd.read_csv("/Users/vladelec/Desktop/Life.csv")
df = pd.DataFrame(oecd_bli)
df.drop(df.columns[[0,2,4,5,6,7,8,9,10,11,12,13,15,16]], axis=1, inplace=True)
# dropped the other columns that I do not need

Here is a screenshot of my DataFrame:

[Screenshot: Example of Dataframe]

2 Answers:

Answer 0: (score: 1)

You can load the data as follows:

file_name = "/Users/vladelec/Desktop/Life.csv"

# Columns you want to load
keep_cols = ['Country', 'Indicator']

# pd.read_csv() will load the data into a pd.DataFrame
oecd_bli = pd.read_csv(file_name, usecols=keep_cols)
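As a minimal sketch of how usecols behaves, the snippet below reads a small in-memory CSV in place of Life.csv. The column names and all values are made up for illustration, and including "Value" in keep_cols is an assumption about which columns the question actually needs.

```python
import io

import pandas as pd

# Hypothetical stand-in for Life.csv; headers and numbers are invented.
csv_text = """LOCATION,Country,Indicator,Unit,Value
AUS,Australia,Dwellings without basic facilities,Percentage,1.1
AUS,Australia,Life satisfaction,Average score,7.3
JPN,Japan,Life satisfaction,Average score,5.9
"""

# Assumed column names; adjust them to the headers in the real file.
keep_cols = ["Country", "Indicator", "Value"]

# usecols discards every other column while the file is being parsed.
sample = pd.read_csv(io.StringIO(csv_text), usecols=keep_cols)
print(sample.columns.tolist())
print(len(sample))
```

Selecting columns by name this way avoids the positional df.columns[[...]] list in the question, which breaks silently if the column order in the file changes.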

If you only want the "Life Satisfaction" indicator, you can do the following:

oecd_bli = oecd_bli[oecd_bli['Indicator'] == "Life Satisfaction"]

If you want to keep more than one indicator, you can do this:

keep_indicators = [
    "Life Satisfaction",
    "Crime Indicator",
]

oecd_bli = oecd_bli[oecd_bli['Indicator'].isin(keep_indicators)]
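To see both filters on something runnable, here is a small sketch with a made-up DataFrame. The rows and values are illustrative only and are not taken from the OECD file.

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Country": ["Australia", "Australia", "Japan", "Japan"],
    "Indicator": [
        "Life Satisfaction",
        "Crime Indicator",
        "Life Satisfaction",
        "Dwellings without basic facilities",
    ],
    "Value": [7.3, 1.0, 5.9, 6.4],
})

# Keep a single indicator with an equality test.
only_life = demo[demo["Indicator"] == "Life Satisfaction"]

# Keep several indicators with isin.
keep_indicators = ["Life Satisfaction", "Crime Indicator"]
several = demo[demo["Indicator"].isin(keep_indicators)]

# Filtering keeps the original row labels; reset them for a fresh 0..n-1 index.
only_life = only_life.reset_index(drop=True)

print(only_life)
print(several)
```

Note that the comparison is an exact, case-sensitive string match, so the spelling in the filter has to match the spelling in the file.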

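Answer 1: (score: 0)

@GiantsLoveDeathMetal makes a good point. In principle, you can read the raw data in as oecd_bli and then select the subset of the DataFrame that meets a given condition.

Demo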

import pandas as pd


# Given a DataFrame of raw data
d = {
    "Country": pd.Series(["Australia", "Austria", "Fiji", "Japan"]),
    "Indicator": pd.Series(["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."]),
    "Value": pd.Series([1.1, 1.0, 2.2, 2.9]),
}

oecd_bli = pd.DataFrame(d, columns=["Country", "Indicator", "Value"] )
oecd_bli

# Select rows starting with "Life" in column "Indicator"
condition = oecd_bli["Indicator"].str.startswith("Life")
subset = oecd_bli[condition]
subset

Alternatively, select the subset with label-based indexing via .loc:

subset = oecd_bli.loc[condition, :]

Here loc expects [<rows>, <columns>], so the rows that satisfy the condition are returned along with all columns.

Details
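Because loc takes both a row selector and a column selector, the row filter and the column choice can be combined in one step. A minimal sketch on the demo data, rebuilt here so that the snippet runs on its own:

```python
import pandas as pd

# Same demo data as in the answer above.
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

condition = oecd_bli["Indicator"].str.startswith("Life")

# Rows where the condition is True, restricted to the two named columns.
subset = oecd_bli.loc[condition, ["Country", "Value"]]
print(subset)
```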

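Note that the resulting DataFrame contains only the rows for which the condition is True. This works because a DataFrame can be indexed with a boolean array.

An example of a boolean array: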

>>> condition = oecd_bli["Indicator"].str.startswith("Life")
>>> condition

0    False
1    False
2     True
3     True
Name: Indicator, dtype: bool

Other ways to set up the condition:

>>> condition = oecd_bli["Indicator"] == "Life ..."
>>> condition = ~oecd_bli["Indicator"].str.startswith("Dwell")
>>> condition = oecd_bli["Indicator"].isin(["Life ...", "Crime ..."])
>>> condition = (oecd_bli["Indicator"] == "Life ...") | (oecd_bli["Indicator"] == "Crime ...") 
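  1. Direct equality (==)
  2. Excluding (~) unwanted occurrences
  3. Whitelisting values via isin
  4. Multiple comparisons combined with the element-wise logical operators (|, &, etc.)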
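As a quick check, the sketch below builds the four conditions on the demo data and prints each mask. On this particular data they all select the same rows, because no "Crime ..." rows are present. On real data they would generally differ. The final filter on "Value" is an extra illustration that is not part of the original answer.

```python
import pandas as pd

# Same demo data as in the answer above.
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})
ind = oecd_bli["Indicator"]

conditions = {
    "equality": ind == "Life ...",
    "negation": ~ind.str.startswith("Dwell"),
    "isin": ind.isin(["Life ...", "Crime ..."]),
    "or": (ind == "Life ...") | (ind == "Crime ..."),
}

for name, cond in conditions.items():
    print(name, cond.tolist())

# Each comparison needs its own parentheses, because | and & bind
# more tightly than == and > in Python.
both = oecd_bli[(ind == "Life ...") & (oecd_bli["Value"] > 2.5)]
print(both)
```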