Question

I have a dataframe, as in the example below:


Then I wrote a function that should split the data set into two dataframes by comparing the values in a given column with a given value. For example, if I have

            0  1      2      3      4      5        6        7        8
0      842517  M  20.57  17.77  132.9   1326  0.08474  0.07864   0.0869
1    84300903  M  19.69  21.25    130   1203   0.1096   0.1599   0.1974
2    84348301  M  11.42  20.38  77.58  386.1   0.1425   0.2839   0.2414
3      843786  M  12.45   15.7  82.57  477.1   0.1278     0.17   0.1578
4      844359  M  18.25  19.98  119.6   1040  0.09463    0.109   0.1127

and col_idx = 2, the result should be:

With value = 18.3, the expected split is:

df1, the rows below the value:

            0  1      2      3      4      5        6        7        8
2    84348301  M  11.42  20.38  77.58  386.1   0.1425   0.2839   0.2414
3      843786  M  12.45   15.7  82.57  477.1   0.1278     0.17   0.1578
4      844359  M  18.25  19.98  119.6   1040  0.09463    0.109   0.1127

df2, the rows above the value:

            0  1      2      3      4      5        6        7        8
0      842517  M  20.57  17.77  132.9   1326  0.08474  0.07864   0.0869
1    84300903  M  19.69  21.25    130   1203   0.1096   0.1599   0.1974

The function should take the dataframe, the column index and the value, and return both dataframes.

Can someone complete my script?
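To try the answers below, the example dataframe can be rebuilt with pandas. This is a minimal sketch that assumes default integer column labels 0 to 8; the question does not show how the data was actually loaded.

```python
import pandas as pd

# Rows copied from the example in the question. No `columns=` argument is
# passed, so pandas assigns the default integer labels 0..8 (an assumption
# about how the original data was created).
rows = [
    [842517, "M", 20.57, 17.77, 132.9, 1326, 0.08474, 0.07864, 0.0869],
    [84300903, "M", 19.69, 21.25, 130, 1203, 0.1096, 0.1599, 0.1974],
    [84348301, "M", 11.42, 20.38, 77.58, 386.1, 0.1425, 0.2839, 0.2414],
    [843786, "M", 12.45, 15.7, 82.57, 477.1, 0.1278, 0.17, 0.1578],
    [844359, "M", 18.25, 19.98, 119.6, 1040, 0.09463, 0.109, 0.1127],
]
df = pd.DataFrame(rows)
print(df)
```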

Answer 1

below_df = data_set[data_set[col_idx] < value]
above_df = data_set[data_set[col_idx] > value]  # you have to deal with data_set[col_idx] == value though
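The comment above leaves rows equal to value unhandled. One way to deal with it, sketched here as an assumption rather than as part of the answer, is to build the boolean mask once and use its negation (~) for the second frame, so every row lands in exactly one frame. In this sketch rows equal to value go to the upper frame; the one-column dataframe with an extra row at 18.3 is made up for illustration.

```python
import pandas as pd

def split_dataset(data_set, col_idx, value):
    """Split data_set into rows strictly below `value` and all remaining rows."""
    below_mask = data_set[col_idx] < value
    below_df = data_set[below_mask]
    # ~below_mask selects every row not in below_df, so a row equal to
    # `value` goes to above_df and no row is lost or duplicated.
    # Caveat: NaN < value is False, so NaN rows also end up in above_df.
    above_df = data_set[~below_mask]
    return below_df, above_df

# Column 2 values from the question plus a made-up row equal to the split value.
df = pd.DataFrame({2: [20.57, 19.69, 11.42, 12.45, 18.25, 18.3]})
below, above = split_dataset(df, 2, 18.3)
print(below.index.tolist())  # [2, 3, 4]
print(above.index.tolist())  # [0, 1, 5]
```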

Answer 2

You can use loc:

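def split_dataset(data_set, col_idx, value):
    # filter the data_set argument itself, not a global df
    below_df = data_set.loc[data_set[col_idx] <= value]
    # strict > so that a row equal to value is not put in both frames
    above_df = data_set.loc[data_set[col_idx] > value]
    return below_df, above_df

df1, df2 = split_dataset(df, '2', 18.3)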

Output:

df1

          0  1      2      3       4       5        6       7       8
2  84348301  M  11.42  20.38   77.58   386.1  0.14250  0.2839  0.2414
3    843786  M  12.45  15.70   82.57   477.1  0.12780  0.1700  0.1578
4    844359  M  18.25  19.98  119.60  1040.0  0.09463  0.1090  0.1127

df2
          0  1      2      3      4       5        6        7       8
0    842517  M  20.57  17.77  132.9  1326.0  0.08474  0.07864  0.0869
1  84300903  M  19.69  21.25  130.0  1203.0  0.10960  0.15990  0.1974

Notes:

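In this call the column names are digits. You need to know the actual type of the column labels before calling the function: you may have to pass the label as a string, as with '2' here, or as an integer.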
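A quick way to see which form the label takes is to test membership in data_set.columns. This sketch uses two rows from the question; that the labels start out as integers is an assumption about how the frame was built, and other loading paths can produce string labels instead.

```python
import pandas as pd

df = pd.DataFrame([[842517, "M", 20.57], [84300903, "M", 19.69]])

# Built without `columns=`, the labels are integers: 2 matches, "2" does not.
print(2 in df.columns)    # True
print("2" in df.columns)  # False

# rename(columns=str) converts every label to a string, after which the
# string form '2' (as used in the call above) selects the column.
df_str = df.rename(columns=str)
print("2" in df_str.columns)  # True
print(df_str["2"].tolist())   # [20.57, 19.69]
```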

You should also decide what happens when the split value itself occurs in the column.
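One way to make that decision explicit is a parameter that says where equal rows go. The ties parameter and its three options below are hypothetical, invented for this sketch; the single-column dataframe is also made up, with one row exactly at the split value.

```python
import pandas as pd

def split_dataset(data_set, col_idx, value, ties="above"):
    """Split rows by comparing data_set[col_idx] with value.

    `ties` is a hypothetical parameter for this sketch. It says where rows
    equal to `value` go: "below", "above", or "drop" (excluded from both).
    """
    col = data_set[col_idx]
    if ties == "below":
        return data_set[col <= value], data_set[col > value]
    if ties == "above":
        return data_set[col < value], data_set[col >= value]
    if ties == "drop":
        return data_set[col < value], data_set[col > value]
    raise ValueError(f"unknown ties option: {ties!r}")

df = pd.DataFrame({2: [20.57, 11.42, 18.3]})
for option in ("below", "above", "drop"):
    below, above = split_dataset(df, 2, 18.3, option)
    print(option, below.index.tolist(), above.index.tolist())
# below [1, 2] [0]
# above [1] [0, 2]
# drop [1] [0]
```

Exact equality on floats works here only because the stored value and the split value are the same literal; values produced by arithmetic may need a tolerance.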

Pandas: how to select rows from a dataframe based on a condition on a specific column's value
