Question

For the following data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'chr_key': [1, 1, 1, 2, 2, 3, 4],
                   'position': [123, 124, 125, 126, 127, 128, 129],
                   'hit_count': [20, 19, 18, 17, 16, 15, 14]})
df['strand'] = np.nan

I want to modify the strand column such that:

for i in range(0, len(df['position'])):
    if df['chr_key'][i] == df['chr_key'][i+1] and df['hit_count'][i] >= df['hit_count'][i+1]:
        df['strand'][i] = 'F'
    else:
        df['strand'][i] = 'R'

My actual df has more than 100,000 rows, so a for loop is slow, as you can imagine. Is there a fast way to achieve this?

I modified the original data frame. The output would be:

df = pd.DataFrame({'chr_key': [1, 1, 1, 2, 2, 3, 4],
                   'position': [123, 124, 125, 126, 127, 128, 129],
                   'hit_count': [20, 19, 18, 17, 16, 15, 14],
                   'strand': ['R', 'R', 'F', 'R', 'F', 'F', 'F']})

There are only 3 rows with chr_key == 1, so when the loop reaches the third row, which has no i + 1 row to compare with, the value of strand defaults to F.

Answer 1

I would use np.where together with shift:

c1=(df.chr_key==df.chr_key.shift(-1))
c2=(df.hit_count>=df.hit_count.shift(-1))
df['strand']=np.where(c1&c2,'F','R')
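
The snippet above can be sketched as a self-contained script using the sample data from the question. `shift(-1)` moves each column up by one row, so row i is compared with row i + 1. The last row is compared with NaN, for which both comparisons evaluate to False, so it falls into the 'R' branch without any special handling. The function name `label_strand` is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd


def label_strand(df):
    """Label each row 'F' or 'R' by comparing it with the next row."""
    # shift(-1) aligns row i with the values of row i + 1;
    # the last row is aligned with NaN.
    next_key = df["chr_key"].shift(-1)
    next_hits = df["hit_count"].shift(-1)

    same_key = df["chr_key"] == next_key          # NaN compares as False
    not_smaller = df["hit_count"] >= next_hits    # NaN compares as False

    # Vectorised equivalent of the if/else in the question's loop.
    return np.where(same_key & not_smaller, "F", "R")


df = pd.DataFrame({
    "chr_key": [1, 1, 1, 2, 2, 3, 4],
    "position": [123, 124, 125, 126, 127, 128, 129],
    "hit_count": [20, 19, 18, 17, 16, 15, 14],
})
df["strand"] = label_strand(df)
print(df)
```

Because the comparison runs on whole columns at once, there is no Python-level loop, which is what makes this approach practical for a frame with more than 100,000 rows.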

Answer 2

You can try the following:

import pandas as pd

df = pd.DataFrame({'chr_key' : [1, 1, 1, 2, 2, 3, 4], 'position' : [123, 124, 125, 126, 127, 128, 129], 'hit_count' : [20, 19, 18, 17, 16, 15, 14]})

df['strand'] = 'R'

idx_1 = df.chr_key == df.chr_key.shift(-1) 
idx_2 = df.hit_count >= df.hit_count.shift(-1)

df.loc[idx_1 & idx_2, 'strand'] = 'F'

Accessing a pandas data frame through loc or iloc is better practice: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
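
As a minimal sketch of what that looks like in practice, the example below assigns values with loc and reads one with iloc, using the sample data from the question. The chained form in the question's loop (`df['strand'][i] = ...`) first selects a column and then assigns into the result, which pandas may treat as a temporary object; loc selects the row and the column in a single call. The threshold of 18 is an arbitrary value chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "chr_key": [1, 1, 1, 2, 2, 3, 4],
    "position": [123, 124, 125, 126, 127, 128, 129],
    "hit_count": [20, 19, 18, 17, 16, 15, 14],
})
df["strand"] = "R"

# Single-cell assignment by row label and column name.
df.loc[0, "strand"] = "F"

# Boolean-mask assignment: all rows matching the condition are set at once.
# The threshold 18 is illustrative, not part of the original question.
df.loc[df["hit_count"] >= 18, "strand"] = "F"

# iloc selects by integer position rather than by label.
last_strand = df.iloc[-1, df.columns.get_loc("strand")]
print(df["strand"].tolist(), last_strand)
```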

Fast way to iterate over the rows of a large data frame to determine the contents of a column
