Create new columns based on aggregated categories

Date: 2018-07-31 12:47:26

Tags: python pandas

+-------+------------+---------------+-----------------+
| INDEX | SK_ID_CURR | CREDIT_ACTIVE | CREDIT_TYPE     |
+-------+------------+---------------+-----------------+
| 0     | 215354     | Closed        | Consumer credit |
| 1     | 215354     | Active        | Credit card     |
| 2     | 215354     | Active        | Consumer credit |
| 3     | 215354     | Active        | Credit card     |
| 4     | 215354     | Active        | Consumer credit |
| 5     | 215354     | Active        | Credit card     |
| 6     | 215354     | Active        | Consumer credit |
| 7     | 162297     | Closed        | Consumer credit |
| 8     | 162297     | Closed        | Consumer credit |
| 9     | 162297     | Active        | Credit card     |
| 10    | 162297     | Active        | Credit card     |
| 11    | 162297     | Closed        | Consumer credit |
| 12    | 162297     | Active        | Mortgage        |
| 13    | 402440     | Active        | Consumer credit |
| 14    | 238881     | Closed        | Credit card     |
+-------+------------+---------------+-----------------+

I have the table above. I want to aggregate each column for each ID. For example, I need to count the number of active and closed credits for each SK_ID_CURR, and then create the columns active_credits and closed_credits holding those counts. The same goes for CREDIT_TYPE.

The desired result has one row per SK_ID_CURR.

4 answers:

Answer 0 (score: 4)

For this dataframe:

import pandas as pd

d = {'SK_ID_CURR': [215354, 215354, 215354, 215354, 215354, 215354, 215354, 162297, 162297, 162297, 162297, 162297, 162297, 402440, 238881],
     'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active', 'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active', 'Active', 'Closed'],
     'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Credit card', 'Consumer credit', 'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card', 'Consumer credit', 'Mortgage', 'Consumer credit', 'Credit card']}
df = pd.DataFrame(d)

print(df)

Output:

    SK_ID_CURR CREDIT_ACTIVE      CREDIT_TYPE
0       215354        Closed  Consumer credit
1       215354        Active      Credit card
2       215354        Active  Consumer credit
3       215354        Active      Credit card
4       215354        Active  Consumer credit
5       215354        Active      Credit card
6       215354        Active  Consumer credit
7       162297        Closed  Consumer credit
8       162297        Closed  Consumer credit
9       162297        Active      Credit card
10      162297        Active      Credit card
11      162297        Closed  Consumer credit
12      162297        Active         Mortgage
13      402440        Active  Consumer credit
14      238881        Closed      Credit card

You can try the following:

# Named aggregation: each keyword is an output column, given as (input column, function).
# The older nested-dict renaming form of agg() was removed in pandas 1.0.
temp = df.groupby('SK_ID_CURR').agg(
    CREDIT_ACTIVE=('CREDIT_ACTIVE', lambda x: (x == 'Active').sum()),
    CREDIT_CLOSED=('CREDIT_ACTIVE', lambda x: (x == 'Closed').sum()),
    CONSUMER_CREDIT=('CREDIT_TYPE', lambda x: (x == 'Consumer credit').sum()),
    CREDIT_CARD=('CREDIT_TYPE', lambda x: (x == 'Credit card').sum()),
).reset_index()

print(temp)

Output:

   SK_ID_CURR  CREDIT_ACTIVE  CREDIT_CLOSED  CONSUMER_CREDIT  CREDIT_CARD
0      162297              3              3                3            2
1      215354              6              1                4            3
2      238881              0              1                0            1
3      402440              1              0                1            0
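
The aggregation above names each category by hand. As a rough sketch of an alternative that picks up whatever categories are present in a column, groupby can be combined with value_counts and unstack. The helper name count_categories and the variable names are illustrative and not part of the original answer; the data is the sample from the question.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'SK_ID_CURR': [215354] * 7 + [162297] * 6 + [402440, 238881],
    'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active',
                      'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active',
                      'Active', 'Closed'],
    'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card',
                    'Consumer credit', 'Credit card', 'Consumer credit',
                    'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card',
                    'Consumer credit', 'Mortgage',
                    'Consumer credit', 'Credit card'],
})


def count_categories(frame, column, key='SK_ID_CURR'):
    """Hypothetical helper: one row per key, one column per category in `column`."""
    # value_counts gives a count per (key, category) pair;
    # unstack pivots the categories into columns, filling absent pairs with 0.
    return frame.groupby(key)[column].value_counts().unstack(fill_value=0)


active_counts = count_categories(df, 'CREDIT_ACTIVE')
type_counts = count_categories(df, 'CREDIT_TYPE')

# Both frames are indexed by SK_ID_CURR, so a plain join lines them up.
res = active_counts.join(type_counts)
print(res)
```

The trade-off is that the column names come straight from the data (Active, Closed, Consumer credit, ...), so a rename step is still needed if specific names are required.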

Answer 1 (score: 1)

Another way, perhaps a bit tedious, but it may show something different.

creditClosed = df[df.CREDIT_ACTIVE == 'Closed']
creditOpened = df[df.CREDIT_ACTIVE == 'Active']
creditTypeCard = df[df.CREDIT_TYPE == 'Credit card']
creditTypeConsumer = df[df.CREDIT_TYPE == 'Consumer credit']

# Count the rows per ID within each subset
a = creditClosed.groupby(['SK_ID_CURR']).agg({'CREDIT_ACTIVE': 'count'}).reset_index()
b = creditOpened.groupby(['SK_ID_CURR']).agg({'CREDIT_ACTIVE': 'count'}).reset_index()
c = creditTypeCard.groupby(['SK_ID_CURR']).agg({'CREDIT_TYPE': 'count'}).reset_index()
d = creditTypeConsumer.groupby(['SK_ID_CURR']).agg({'CREDIT_TYPE': 'count'}).reset_index()

# Outer merges keep IDs that are missing from one of the subsets
ab = pd.merge(a, b, how='outer', on='SK_ID_CURR')
abc = pd.merge(ab, c, how='outer', on='SK_ID_CURR')
final = pd.merge(abc, d, how='outer', on='SK_ID_CURR')

final = final.rename(columns={'CREDIT_ACTIVE_x': 'CREDIT_CLOSED', 'CREDIT_ACTIVE_y': 'CREDIT_ACTIVE', 'CREDIT_TYPE_x': 'CREDIT_CARD', 'CREDIT_TYPE_y': 'CONSUMER_CREDIT'})
# fillna returns a new frame, so the result has to be assigned; missing counts become 0
final = final.fillna(0).astype(int)
final = final[['SK_ID_CURR', 'CREDIT_ACTIVE', 'CREDIT_CLOSED', 'CONSUMER_CREDIT', 'CREDIT_CARD']]
print(final)

Output:

   SK_ID_CURR  CREDIT_ACTIVE  CREDIT_CLOSED  CONSUMER_CREDIT  CREDIT_CARD
0      162297              3              3                3            2
1      215354              6              1                4            3
2      238881              0              1                0            1
3      402440              1              0                1            0

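Answer 2 (score: 0)

You can generate dummy columns of the dataframe, for example with pd.get_dummies(df.drop(columns=['SK_ID_CURR'])).

Concatenate them with the 'SK_ID_CURR' column, and then group by 'SK_ID_CURR'. After that, aggregate the data by sum using agg([sum]). Finally, it is a matter of renaming the columns meaningfully.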
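
The steps described in this answer can be sketched roughly as follows. This is a reconstruction under assumptions, not the answerer's own code: the data is the sample from the question, the variable names are illustrative, and the final column names (including MORTGAGE) are one possible renaming borrowed from the other answers.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'SK_ID_CURR': [215354] * 7 + [162297] * 6 + [402440, 238881],
    'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active',
                      'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active',
                      'Active', 'Closed'],
    'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card',
                    'Consumer credit', 'Credit card', 'Consumer credit',
                    'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card',
                    'Consumer credit', 'Mortgage',
                    'Consumer credit', 'Credit card'],
})

# One indicator column per category value, named <column>_<value>,
# e.g. CREDIT_ACTIVE_Active or CREDIT_TYPE_Credit card.
dummies = pd.get_dummies(df.drop(columns=['SK_ID_CURR']))

# Put the ID back next to the indicators, then sum the indicators per ID.
counts = (
    pd.concat([df['SK_ID_CURR'], dummies], axis=1)
    .groupby('SK_ID_CURR')
    .sum()
    .astype(int)
)

# Assumed renaming scheme; any meaningful names would do.
counts = counts.rename(columns={
    'CREDIT_ACTIVE_Active': 'CREDIT_ACTIVE',
    'CREDIT_ACTIVE_Closed': 'CREDIT_CLOSED',
    'CREDIT_TYPE_Consumer credit': 'CONSUMER_CREDIT',
    'CREDIT_TYPE_Credit card': 'CREDIT_CARD',
    'CREDIT_TYPE_Mortgage': 'MORTGAGE',
})
print(counts)
```

Because get_dummies creates a column for every value it sees, this version also counts Mortgage without any extra code.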

Answer 3 (score: 0)

After constructing a helper column, you can join several pd.crosstab results.

Data from @AllaTarighati.

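import numpy as np
import pandas as pd

# Helper column: anything whose type mentions "credit" is 'Credit', the rest is 'Mortgage'
df['TYPE'] = np.where(df['CREDIT_TYPE'].str.contains('credit', case=False, na=False),
                      'Credit', 'Mortgage')

# Counts per ID of each TYPE/status combination, and of each credit type
cross1 = pd.crosstab(df['SK_ID_CURR'], df['TYPE'] + '_' + df['CREDIT_ACTIVE'])
cross2 = pd.crosstab(df['SK_ID_CURR'], df['CREDIT_TYPE'])
res = cross1.join(cross2)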

Result:

print(res)

            Credit_Active  Credit_Closed  Mortgage_Active  Consumer credit  \
SK_ID_CURR                                                                   
162297                  2              3                1                3   
215354                  6              1                0                4   
238881                  0              1                0                0   
402440                  1              0                0                1   

            Credit card  Mortgage  
SK_ID_CURR                         
162297                2         1  
215354                3         0  
238881                1         0  
402440                0         0
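
A crosstab like the ones above yields one row per SK_ID_CURR. If the counts are instead wanted as new columns on the original rows, one option is a left merge back onto the dataframe. The following is only a sketch under that assumption: the column names active_credits and closed_credits follow the wording of the question, and the variable names are illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'SK_ID_CURR': [215354] * 7 + [162297] * 6 + [402440, 238881],
    'CREDIT_ACTIVE': ['Closed', 'Active', 'Active', 'Active', 'Active', 'Active', 'Active',
                      'Closed', 'Closed', 'Active', 'Active', 'Closed', 'Active',
                      'Active', 'Closed'],
    'CREDIT_TYPE': ['Consumer credit', 'Credit card', 'Consumer credit', 'Credit card',
                    'Consumer credit', 'Credit card', 'Consumer credit',
                    'Consumer credit', 'Consumer credit', 'Credit card', 'Credit card',
                    'Consumer credit', 'Mortgage',
                    'Consumer credit', 'Credit card'],
})

# Per-ID counts of each CREDIT_ACTIVE value, with SK_ID_CURR as a regular column.
counts = (
    pd.crosstab(df['SK_ID_CURR'], df['CREDIT_ACTIVE'])
    .rename(columns={'Active': 'active_credits', 'Closed': 'closed_credits'})
    .reset_index()
)

# A left merge keeps every original row and repeats the per-ID counts on each of them.
out = df.merge(counts, on='SK_ID_CURR', how='left')
print(out)
```

Each row then carries the totals of its own ID, so the first seven rows (ID 215354) all show 6 active and 1 closed credit.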