Question

I'm trying to simplify my pandas and Python syntax when performing basic pandas operations.

I have 4 columns:

A_ID
a_score
B_ID
b_score

I created a new label called doc_type based on the following:

a >= b: doc_type is a
b > a: doc_type is b

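I'm struggling with how to handle, in pandas, the case where a exists but b does not. In that case, a needs to be the label. Right now it returns the else branch, i.e. b. I had to add 2 extra comparisons, which may be inefficient at scale because I have already compared the data earlier. I'm looking for a way to improve this.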
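The row with a present but b missing ends up labelled b because any comparison involving NaN evaluates to False, so np.where falls through to its else value. A minimal sketch of one way around it, assuming both score columns are numeric with NaN for missing values: fill missing scores with -inf (a sentinel chosen for this sketch, not taken from the question) so one comparison also covers the "one side missing" rows, then blank out rows where both are missing.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with empty strings already replaced by NaN
df = pd.DataFrame({
    'a_id': ['A', 'B', 'C', 'D', np.nan, 'F', 'G'],
    'a_score': [1, 2, 3, 4, np.nan, 6, 7],
    'b_id': ['a', 'b', 'c', 'd', 'e', 'f', np.nan],
    'b_score': [0.1, 0.2, 3.1, 4.1, 5, 5.99, np.nan],
})

# Any comparison with NaN is False, so the last row (b missing) falls into 'b'
naive = np.where(df['a_score'] >= df['b_score'], 'a', 'b')

# Assumption: a missing score counts as -inf, so the present side always wins
a = df['a_score'].fillna(-np.inf)
b = df['b_score'].fillna(-np.inf)
both_missing = df['a_score'].isna() & df['b_score'].isna()

labels = pd.Series(np.where(a >= b, 'a', 'b'), index=df.index)
# Rows where both scores are missing get NaN instead of a label
df['doc_type'] = labels.mask(both_missing)
print(df)
```

This keeps a single score comparison instead of the two extra .loc assignments; ties still go to a because the comparison is >=.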

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a_id': ['A', 'B', 'C', 'D', '', 'F', 'G'],
    'a_score': [1, 2, 3, 4, '', 6, 7],
    'b_id': ['a', 'b', 'c', 'd', 'e', 'f', ''],
    'b_score': [0.1, 0.2, 3.1, 4.1, 5, 5.99, None],
})
print(df)

# Replace empty (or whitespace-only) strings with NaN, then make the scores numeric
df = df.replace(r'^\s*$', np.nan, regex=True)
df[['a_score', 'b_score']] = df[['a_score', 'b_score']].astype(float)

m_score = df['a_score'] >= df['b_score']
m_doc = df['a_id'].isnull() & df['b_id'].isnull()

# Take the id with the higher score
df['doc_id'] = np.where(m_score, df['a_id'], df['b_id'])
# Select type based on higher score (None when both ids are missing)
df['doc_type'] = np.where(m_score, 'a', np.where(m_doc, None, 'b'))

# Additional lines looking for improvement:
df.loc[df['a_id'].isnull() & df['b_id'].notnull(), 'doc_type'] = 'b'
df.loc[df['a_id'].notnull() & df['b_id'].isnull(), 'doc_type'] = 'a'
print(df)

Answer 1

Assuming your logic is:

Both exist: doc_type is the one with the higher score;
One is missing: doc_type is the non-null one;
Both are missing: doc_type is null.

An extra edge case was added as the last row.

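A minimal vectorized sketch of these three rules using np.select; it is an illustration, not this answer's own code. It assumes empty strings have already been replaced with NaN, treats an ID as present when it is non-null, and adds one hypothetical last row in which both sides are missing.

```python
import numpy as np
import pandas as pd

# Question's sample data with '' replaced by NaN, plus a hypothetical
# extra last row where both a and b are missing
df = pd.DataFrame({
    'a_id': ['A', 'B', 'C', 'D', np.nan, 'F', 'G', np.nan],
    'a_score': [1, 2, 3, 4, np.nan, 6, 7, np.nan],
    'b_id': ['a', 'b', 'c', 'd', 'e', 'f', np.nan, np.nan],
    'b_score': [0.1, 0.2, 3.1, 4.1, 5, 5.99, np.nan, np.nan],
})

has_a = df['a_id'].notna()
has_b = df['b_id'].notna()

conditions = [
    has_a & has_b & (df['a_score'] >= df['b_score']),  # both present, a higher or tied
    has_a & has_b,                                      # both present, b higher
    has_a,                                              # only a present
    has_b,                                              # only b present
]
choices = ['a', 'b', 'a', 'b']

# np.select uses the first condition that holds for each row;
# rows matching none of them (both missing) get the default
df['doc_type'] = np.select(conditions, choices, default=None)
print(df)
```

Because np.select stops at the first matching condition, the order of the list encodes the priority of the rules, and new rules can be added as extra entries.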

Answer 2

Try the pandas apply method with a custom function on the DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a_id': ['A', 'B', 'C', 'D', '', 'F', 'G'],
    'a_score': [1, 2, 3, 4, '', 6, 7],
    'b_id': ['a', 'b', 'c', 'd', 'e', 'f', ''],
    'b_score': [0.1, 0.2, 3.1, 4.1, 5, 5.99, None],
})

df = df.replace('', np.nan)

def func(row):
    # Both scores missing: no document type
    if pd.isna(row.a_score) and pd.isna(row.b_score):
        return np.nan
    # Only b is missing: a wins by default
    elif pd.isna(row.b_score):
        return 'a'
    # Only a is missing: b wins by default
    elif pd.isna(row.a_score):
        return 'b'
    # Both present: the higher score wins, ties go to a
    elif row.a_score >= row.b_score:
        return 'a'
    else:
        return 'b'

df['doc_type'] = df.apply(func, axis=1)

You can make the function as complex as you need, include any number of comparisons, and add more conditions later as required.

Answer 3

I'm not sure I fully understand all the conditions, or whether there are any special edge cases, but I think you can operate across the columns and then swap in the values 'a' or 'b' when you're done.
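One reading of this suggestion, sketched under assumptions: pick the winning score column per row with DataFrame.idxmax(axis=1), then map the column labels to 'a' or 'b'. The choice of idxmax is this sketch's assumption, as is filling missing scores with -inf so that idxmax never sees an all-NaN row.

```python
import numpy as np
import pandas as pd

# Question's sample data with '' already replaced by NaN
df = pd.DataFrame({
    'a_id': ['A', 'B', 'C', 'D', np.nan, 'F', 'G'],
    'a_score': [1, 2, 3, 4, np.nan, 6, 7],
    'b_id': ['a', 'b', 'c', 'd', 'e', 'f', np.nan],
    'b_score': [0.1, 0.2, 3.1, 4.1, 5, 5.99, np.nan],
})

scores = df[['a_score', 'b_score']]
# Assumption: missing scores become -inf; on a tie idxmax returns the
# first column, so a wins ties, matching a >= b
winner = scores.fillna(-np.inf).idxmax(axis=1)
# Swap the column labels for the values 'a' / 'b'
labels = winner.map({'a_score': 'a', 'b_score': 'b'})
# Rows where both scores are missing get no label
df['doc_type'] = labels.where(scores.notna().any(axis=1))
print(df)
```

The tie-breaking relies on idxmax returning the first occurrence of the maximum, so the column order in scores matters.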

