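How to extract only the entries that have values from a pandas DataFrame after splitting on commas

Time: 2018-06-19 06:57:13

Tags: python python-3.x python-2.7 jupyter-notebook

After splitting a column on commas, how can I extract only the entries that actually hold a value from a pandas DataFrame?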


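In this example, for row 0 I want columns 1, 2, 3, but for row 1 I want columns 2, 3, 4. That is, the three columns I extract should be the last three entries of each DataFrame row, not counting None.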
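
To make the situation concrete, here is a minimal sketch of the kind of frame being described. The two address strings are assumptions, assembled from the sample rows in the answer below rather than taken from the original CSV. It shows how `str.split(',', expand=True)` pads shorter rows with missing values, and how `last_valid_index()` locates the last real entry in each row.

```python
import pandas as pd

# Hypothetical sample addresses (not the original CSV contents).
addresses = pd.Series([
    "Near Wagheshwar Temple,Wagholi,Pune,Maharashtra",
    "Magarpatta,Pune,Magarpatta,Pune,Maharashtra",
])

# expand=True yields one column per component; shorter rows are padded
# with missing values on the right.
addr = addresses.str.split(",", expand=True)
print(addr)

# Column label of the last non-missing entry in each row.
last_valid = addr.apply(lambda row: row.last_valid_index(), axis=1)
print(last_valid.tolist())
```

Row 0 has four components and row 1 has five, so the wanted columns are 1 to 3 for the first row and 2 to 4 for the second.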

This is what I did:

import pandas as pd
import numpy as np
from pathlib import Path

filepath = Path("/home/anand/Downloads/Python Test/Python Test/data/input/test_addresses.csv")

def extract_address_components(filepath):
    mms = pd.read_csv(filepath) 
    mms.fillna(0, inplace=True)
    addr = mms['Address'].str.split(',', expand=True)
    mms['Locality'] = addr.apply(loca, axis=1)
    mms['City'] = addr.apply(cit, axis=1)
    mms['State'] = addr.apply(sta, axis=1)
    return mms

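def _from_last(x, offset):
    # Return the entry `offset` positions before the last non-null entry
    # of the row, or NaN if the row is empty or has too few entries
    # (a negative label would otherwise raise a KeyError).
    last = x.last_valid_index()
    if last is None or last - offset < 0:
        return np.nan
    return x[last - offset]

def loca(x):
    # Locality: third entry from the end
    return _from_last(x, 2)

def cit(x):
    # City: second entry from the end
    return _from_last(x, 1)

def sta(x):
    # State: last entry
    return _from_last(x, 0)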

if __name__ == '__main__':
    mms = extract_address_components(filepath=filepath)
    mms.to_csv('/home/anand/Desktop/address_components.csv')

Is there a better way to do this with pandas?

1 Answer:

Answer 0: (score: 0)

Use a list comprehension:

import pandas as pd

num_cols = 3
clean_data = []

data = [["Near Wagheshwar Temple", "Wagholi", "Pune", "Maharashtra", None, None],
        ["Magarpatta", "Pune", "Magarpatta", "Pune", "Maharashtra", None],
        ["Manikbaug Sinhgad Road Pune", "Sinhgad Road", "Pune", "Maharashtra", None, None],
        ["Kothrud", "Pune", "Maharashtra", None, None, None],
        ["Pimple Nilakh", "Pune", "Maharashtra", None, None, None],
        ["Opposite To D Mart And Next To Cybage It", "Kalyani Nagar", "Pune", "Maharashtra", None, None],
        ["Pune", "Pimple Nilakh", "Pune", "Maharashtra", None, None],
        ["Flat No15 ", "2nd Floor", "Near Jakat Naka,Pune", "Bekrai Nagar", "Pune", "Maharashtra"],
        ["Wagholi", "Pune", "Maharashtra", None, None, None],
        ["Wakad Near Euro School", "Shankar Kalat Nagar", "Pune", "Maharashtra", None, None]]

for row in data:
    y = [a for a in row if a is not None]
    clean_data.append(y[-num_cols:])

labels = ["Locality", "City", "State"]

df = pd.DataFrame.from_records(clean_data, columns=labels)

print(df)

              Locality  City        State
0              Wagholi  Pune  Maharashtra
1           Magarpatta  Pune  Maharashtra
2         Sinhgad Road  Pune  Maharashtra
3              Kothrud  Pune  Maharashtra
4        Pimple Nilakh  Pune  Maharashtra
5        Kalyani Nagar  Pune  Maharashtra
6        Pimple Nilakh  Pune  Maharashtra
7         Bekrai Nagar  Pune  Maharashtra
8              Wagholi  Pune  Maharashtra
9  Shankar Kalat Nagar  Pune  Maharashtra
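
Since the question asks for a pandas-based approach, the following is one possible sketch that stays inside pandas and works on the raw address strings directly. It is not part of the answer above. The sample strings are assumptions; the idea is to split without `expand=True`, so that each row holds a list, and then index that list from the end with the `.str` accessor, which returns NaN when a list is too short.

```python
import pandas as pd

# Hypothetical sample addresses; None stands for a missing address.
addresses = pd.Series([
    "Near Wagheshwar Temple, Wagholi, Pune, Maharashtra",
    "Kothrud, Pune, Maharashtra",
    "Pune, Maharashtra",
    None,
], name="Address")

# Without expand=True each row becomes a list of components;
# a missing address stays missing.
parts = addresses.str.split(",")

# .str[-k] picks the k-th element from the end of each list and gives
# NaN when the list has fewer than k elements. strip() removes the
# spaces left over after splitting on a bare comma.
result = pd.DataFrame({
    "Locality": parts.str[-3].str.strip(),
    "City": parts.str[-2].str.strip(),
    "State": parts.str[-1].str.strip(),
})

print(result)
```

This avoids the row-wise `apply` calls, at the cost of treating a two-component address as having no locality, which may or may not be what the data requires.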