How to turn an iterrows loop into a one-liner in pandas

Posted: 2019-04-01 06:45:34

Tags: python pandas

My data is as follows:

[
{"cut_id":1,"cut_label":"v024","cut_name":"State","value_label":"1","value":"andaman and nicobar islands"},
{"cut_id":3,"cut_label":"v024","cut_name":"State","value_label":"3","value":"arunachal pradesh"},
{"cut_id":635,"cut_label":"sdistri","cut_name":"District","value_label":"599","value":"pathanamthitta"},
{"cut_id":636,"cut_label":"sdistri","cut_name":"District","value_label":"600","value":"kollam"},
{"cut_id":637,"cut_label":"sdistri","cut_name":"District","value_label":"601","value":"thiruvananthapuram"}
]
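The question does not show how `data` is built from this JSON. A minimal sketch, assuming the JSON is held in a string named `raw` (a hypothetical name) and parsed with the standard `json` module, which keeps `value_label` as strings such as "1":

```python
import json

import pandas as pd

# Hypothetical variable holding the question's JSON (two rows kept for brevity).
raw = '''[
{"cut_id":1,"cut_label":"v024","cut_name":"State","value_label":"1","value":"andaman and nicobar islands"},
{"cut_id":635,"cut_label":"sdistri","cut_name":"District","value_label":"599","value":"pathanamthitta"}
]'''

# json.loads returns a list of dicts; each dict becomes one row.
data = pd.DataFrame(json.loads(raw))
print(data[["cut_name", "value_label", "value"]])
```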

My desired output is as follows:

[
{"value_label":"S1","value":"andaman and nicobar islands"},
{"value_label":"S3","value":"arunachal pradesh"},
{"value_label":"D599","value":"pathanamthitta"},
{"value_label":"D600","value":"kollam"},
{"value_label":"D601","value":"thiruvananthapuram"}
]

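My intention is to rename value_label by prefixing the number with the character "S" or "D", depending on whether the row is a State or a District.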

Here is my code:

cuts_data = {}  # result dict: new label -> value
for _, r in data[
        (data['cut_name'] == 'State') | (data['cut_name'] == 'District')][
            ['cut_name', 'value', 'value_label']
    ].iterrows():
    cuts_data[r.cut_name[0]+r.value_label] = r.value
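Assuming `cuts_data` is meant to be a plain dict keyed by the new label, the loop can be collapsed into a single `dict(zip(...))` expression. A sketch on a hypothetical three-row frame that includes one row which is neither a State nor a District:

```python
import pandas as pd

# Hypothetical sample data; the "Village" row should be filtered out.
data = pd.DataFrame({
    "cut_name": ["State", "District", "Village"],
    "value_label": ["1", "599", "7"],
    "value": ["andaman and nicobar islands", "pathanamthitta", "example village"],
})

# Same filter as the loop: keep only State/District rows.
sub = data[data["cut_name"].isin(["State", "District"])]

# One expression replacing the iterrows loop: first letter of cut_name + label -> value.
cuts_data = dict(zip(sub["cut_name"].str[0] + sub["value_label"], sub["value"]))
print(cuts_data)
```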

I get the expected result, but is there a way to do this in one line?

3 answers:

Answer 0 (score: 2)

Use str with indexing to get the first character of cut_name, and, if necessary, filter the rows with Series.isin:

mask = data['cut_name'].isin(['State','District'])
data.loc[mask, 'value_label'] = data['cut_name'].str[0] + data['value_label'].astype(str)
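A runnable version of the masked assignment, on hypothetical data that includes a row outside State/District. Because `.loc` aligns the right-hand Series on the index, only the masked rows are overwritten and the other row keeps its original label:

```python
import pandas as pd

# Hypothetical sample data; value_label kept as strings to avoid dtype changes.
data = pd.DataFrame({
    "cut_name": ["State", "District", "Village"],
    "value_label": ["1", "599", "7"],
    "value": ["andaman and nicobar islands", "pathanamthitta", "example village"],
})

mask = data["cut_name"].isin(["State", "District"])
data.loc[mask, "value_label"] = data["cut_name"].str[0] + data["value_label"].astype(str)
print(data["value_label"].tolist())
```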

If State and District are the only possible values:

data['value_label'] = data['cut_name'].str[0] + data['value_label'].astype(str)

For better performance, use a list comprehension (this works well as long as there are no missing values):

data['value_label'] = [c[0] + str(v) for c, v in zip(data['cut_name'], data['value_label'])]
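The "no missing values" caveat matters because `c[0]` fails on `NaN`, which is a float, while the `.str[0]` accessor simply propagates the missing value. A sketch comparing both on hypothetical data with one missing `cut_name`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has no cut_name.
data = pd.DataFrame({"cut_name": ["State", np.nan], "value_label": ["1", "599"]})

# Vectorized: the missing cut_name yields a missing result instead of an error.
vectorized = data["cut_name"].str[0] + data["value_label"].astype(str)
print(vectorized.tolist())

# List comprehension: indexing the float NaN raises TypeError.
try:
    listcomp = [c[0] + str(v) for c, v in zip(data["cut_name"], data["value_label"])]
except TypeError as exc:
    print("list comprehension failed:", exc)
    listcomp = None
```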

If you need a new DataFrame containing only the selected columns:

new_df = data[['value','value_label']]
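The question's target is a list of dicts rather than a DataFrame; `DataFrame.to_dict("records")` returns exactly that shape, with keys in column order. A sketch, assuming the labels have already been rewritten (two hypothetical rows stand in for the relabeled `data`):

```python
import pandas as pd

# Hypothetical frame whose value_label column is already relabeled.
data = pd.DataFrame({
    "cut_name": ["State", "District"],
    "value_label": ["S1", "D599"],
    "value": ["andaman and nicobar islands", "pathanamthitta"],
})

# Column order here decides key order in each output dict.
new_df = data[["value_label", "value"]]
records = new_df.to_dict("records")
print(records)
```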

Answer 1 (score: 2)

Yes, there certainly is:

import numpy as np
df.loc[df['cut_name'].isin(['State', 'District']), 'value_label'] = np.where(df['cut_name'] == 'State', 'S' + df['value_label'], 'D' + df['value_label'])
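In this one-liner, np.where returns a plain array with one entry per row of df, while the left-hand side selects only the masked rows. That works on the question's data, where every row is a State or District, but once a row is filtered out the lengths no longer match and pandas raises a ValueError. A sketch that restricts both sides to the mask, on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one row that the mask excludes.
df = pd.DataFrame({
    "cut_name": ["State", "District", "Village"],
    "value_label": ["1", "599", "7"],
})

mask = df["cut_name"].isin(["State", "District"])
sub = df.loc[mask]

# Both np.where branches now have exactly mask.sum() entries.
df.loc[mask, "value_label"] = np.where(
    sub["cut_name"] == "State", "S" + sub["value_label"], "D" + sub["value_label"]
)
print(df["value_label"].tolist())
```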

Answer 2 (score: 1)

If you want to use apply and a lambda, you can do the following:

import pandas as pd

df = pd.DataFrame([
{"cut_id":1,"cut_label":"v024","cut_name":"State","value_label":"1","value":"andaman and nicobar islands"},
{"cut_id":3,"cut_label":"v024","cut_name":"State","value_label":"3","value":"arunachal pradesh"},
{"cut_id":635,"cut_label":"sdistri","cut_name":"District","value_label":"599","value":"pathanamthitta"},
{"cut_id":636,"cut_label":"sdistri","cut_name":"District","value_label":"600","value":"kollam"},
{"cut_id":637,"cut_label":"sdistri","cut_name":"District","value_label":"601","value":"thiruvananthapuram"}
])

n_df = pd.DataFrame()

n_df['value'] = df['value']
# Row-wise: first letter of cut_name + the original label
n_df['value_label'] = df.apply(lambda x : x['cut_name'][0] + x['value_label'], axis=1)

# A list with one dict per row
n_df.to_dict('records')

#Output

[{'value': 'andaman and nicobar islands', 'value_label': 'S1'}, {'value': 'arunachal pradesh', 'value_label': 'S3'}, {'value': 'pathanamthitta', 'value_label': 'D599'}, {'value': 'kollam', 'value_label': 'D600'}, {'value': 'thiruvananthapuram', 'value_label': 'D601'}]