For the following DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'chr_key': [1, 1, 1, 2, 2, 3, 4],
                   'position': [123, 124, 125, 126, 127, 128, 129],
                   'hit_count': [20, 19, 18, 17, 16, 15, 14]})
df['strand'] = np.nan
I want to modify the strand column so that:
for i in range(0, len(df['position'])):
    if df['chr_key'][i] == df['chr_key'][i+1] and df['hit_count'][i] >= df['hit_count'][i+1]:
        df['strand'][i] = 'F'
    else:
        df['strand'][i] = 'R'
My actual df has more than 100,000 rows, so a for loop is slow, as you can imagine. Is there a fast way to achieve this?
I modified the original DataFrame. The output would be:
df = pd.DataFrame({'chr_key' : [1, 1, 1, 2, 2, 3, 4], 'position' : [123, 124, 125, 126, 127, 128, 129], 'hit_count' : [20, 19, 18, 17, 16, 15, 14], 'strand': ['R', 'R', 'F', 'R', 'F', 'F', 'F']})
Because there are only three rows with chr_key == 1, when the loop reaches the third of them it has no i + 1 row to compare with, so the value of strand takes its default.
Answer 0 (score: 1)
I am using np.where and shift:
c1=(df.chr_key==df.chr_key.shift(-1))
c2=(df.hit_count>=df.hit_count.shift(-1))
df['strand']=np.where(c1&c2,'F','R')
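To see why this works without a loop, it may help to look at the two masks on the sample data from the question. The following is a minimal sketch, assuming the same seven-row frame; the names same_key and not_smaller are illustrative only. shift(-1) aligns each row with the row after it, and the last row is compared against NaN, which evaluates to False, so it ends up in the 'R' branch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'chr_key': [1, 1, 1, 2, 2, 3, 4],
                   'position': [123, 124, 125, 126, 127, 128, 129],
                   'hit_count': [20, 19, 18, 17, 16, 15, 14]})

# shift(-1) moves every value up one row, so row i sees the value of row i + 1.
# The last row sees NaN, and any comparison with NaN is False.
same_key = df['chr_key'] == df['chr_key'].shift(-1)
not_smaller = df['hit_count'] >= df['hit_count'].shift(-1)

# 'F' where both conditions hold, 'R' everywhere else.
df['strand'] = np.where(same_key & not_smaller, 'F', 'R')
print(df)
```

On this data the last row of each chr_key group gets 'R', because same_key is False there.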
Answer 1 (score: 1)
You can try the following:
import pandas as pd
df = pd.DataFrame({'chr_key' : [1, 1, 1, 2, 2, 3, 4], 'position' : [123, 124, 125, 126, 127, 128, 129], 'hit_count' : [20, 19, 18, 17, 16, 15, 14]})
df['strand'] = 'R'
idx_1 = df.chr_key == df.chr_key.shift(-1)
idx_2 = df.hit_count >= df.hit_count.shift(-1)
df.loc[idx_1 & idx_2, 'strand'] = 'F'
Accessing a pandas DataFrame through the loc or iloc methods is better practice than chained indexing such as df['strand'][i]: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
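As a rough illustration of the difference between the two accessors, here is a small sketch on a three-row frame made up for this example (it is not the data from the question). loc selects by label or boolean mask, iloc selects by integer position; both assign in a single indexing step, which avoids the chained assignment used in the original loop.

```python
import pandas as pd

# Hypothetical three-row frame, used only to show the two accessors.
df = pd.DataFrame({'chr_key': [1, 1, 2], 'hit_count': [20, 19, 17]})
df['strand'] = 'R'

# loc: a boolean mask picks the rows, the column is given by name.
df.loc[df['hit_count'] >= 19, 'strand'] = 'F'

# iloc: purely positional, so the column position is looked up first.
strand_pos = df.columns.get_loc('strand')
df.iloc[-1, strand_pos] = 'R'  # the last row has no following row to compare with
print(df)
```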