Flag a column based on grouped data

Time: 2019-07-18 10:05:47

Tags: python pandas

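I am trying to create a column that holds a single value per ID (each ID has many rows associated with it). If any row for an ID is tagged answered, every row for that ID should be marked answered. If all rows for an ID are tagged not answered, every row should be marked not_answered (which is what currently happens).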

Here is the code I have written:

import numpy as np

conds = [file.data__answered_at.isna(),file.data__answered_at.notna()]
choices = ["not_answered","answered"]
file['call_status'] = np.select(conds,choices,default=np.nan)

 data__id   call_status       rank
  1            answered        1
  1          not_answered      2
  1            answered        3
  2          not_answered      1
  2             answered       2
  3          not_answered      1
  4            answered        1
  4          not_answered      2
  5          not_answered      1
  5          not_answered      2
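A minimal sketch of how a table like the one above could be rebuilt for testing. The timestamps in data__answered_at are made up for illustration, and because isna() and notna() are complementary, this sketch swaps np.select for a two-way np.where:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the timestamp values are invented for illustration.
file = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": [
        "2019-07-01 09:00", None, "2019-07-01 09:10",
        None, "2019-07-02 10:00",
        None,
        "2019-07-03 11:00", None,
        None, None,
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})
file["data__answered_at"] = pd.to_datetime(file["data__answered_at"])

# A row is "answered" when it has a timestamp, "not_answered" otherwise.
file["call_status"] = np.where(
    file["data__answered_at"].notna(), "answered", "not_answered"
)

print(file[["data__id", "call_status", "rank"]])
```

This only produces the row-level status; the per-ID propagation is what the question asks for.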

In this case, the expected result would be:

   data__id   call_status       rank
  1            answered        1
  1            answered        2
  1            answered        3
  2            answered        1
  2            answered        2
  3          not_answered      1
  4            answered        1
  4            answered        2
  5          not_answered      1
  5          not_answered      2

2 answers:

Answer 0 (score: 4)

Use GroupBy.transform with GroupBy.any to test whether each group contains at least one answered row:

mask = df['call_status'].eq('answered').groupby(df['data__id']).transform('any')

Or collect every data__id that has an answered row by filtering on the other column, and test membership with Series.isin:

mask = df['data__id'].isin(df.loc[df['call_status'].eq('answered'), 'data__id'].unique())

Then set the value through DataFrame.loc:

df.loc[mask, 'call_status'] = 'answered'
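Both masks can be tried side by side on the question's sample data. The following is a sketch rather than the answerer's exact code: the DataFrame is rebuilt by hand from the table in the question, and the names is_answered, mask_transform, mask_isin and both_agree are introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": [
        "answered", "not_answered", "answered",
        "not_answered", "answered",
        "not_answered",
        "answered", "not_answered",
        "not_answered", "not_answered",
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

is_answered = df["call_status"].eq("answered")

# Variant 1: broadcast "any answered row in this group" back to every row.
mask_transform = is_answered.groupby(df["data__id"]).transform("any")

# Variant 2: collect the IDs with an answered row, then test membership.
answered_ids = df.loc[is_answered, "data__id"].unique()
mask_isin = df["data__id"].isin(answered_ids)

# The two masks should select exactly the same rows.
both_agree = bool((mask_transform.to_numpy() == mask_isin.to_numpy()).all())

# Overwrite only the rows whose ID has at least one answered row.
df.loc[mask_transform, "call_status"] = "answered"
print(df)
```

Rows outside the mask keep their existing value, so this relies on them already being labelled not_answered.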

Answer 1 (score: 3)

We can use groupby here and check whether any row in each group equals answered.

Then we use np.where to conditionally fill in answered or not_answered:

m = file.groupby('data__id')['call_status'].transform(lambda x: x.eq('answered').any())

file['call_status'] = np.where(m, 'answered', 'not_answered')

Output:

  data__id   call_status  rank
0         1      answered     1
1         1      answered     2
2         1      answered     3
3         2      answered     1
4         2      answered     2
5         3  not_answered     1
6         4      answered     1
7         4      answered     2
8         5  not_answered     1
9         5  not_answered     2
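For anyone who wants to reproduce this output, here is a self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the question's table, and the .astype(bool) call is an addition made here so that np.where receives a plain boolean mask.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
file = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": [
        "answered", "not_answered", "answered",
        "not_answered", "answered",
        "not_answered",
        "answered", "not_answered",
        "not_answered", "not_answered",
    ],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# True for every row whose ID has at least one answered row.
m = (
    file.groupby("data__id")["call_status"]
    .transform(lambda x: x.eq("answered").any())
    .astype(bool)  # added here to guarantee a boolean mask
)

# Rewrite the whole column from the mask, covering both labels.
file["call_status"] = np.where(m, "answered", "not_answered")
print(file)
```

Unlike an assignment through DataFrame.loc, np.where rewrites every row, so the result does not depend on what the not_answered rows held before.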