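I'm looking for help using a function like np.nansum to create a sub-dataframe from an existing dataframe. I want to convert this table into a matrix of sums over the non-null columns: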
     dan  ste  bob
dan    0    2    5
ste    4    0    2
bob    4    1    0
For example, when 'dan' is not null (rows t = 2, 3, 4, 6, 7), the sum of 'ste' is 2 and the sum of 'bob' is 5. When 'ste' is not null, the sum of 'dan' is 4.
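To make the definition concrete, here is a minimal sketch that computes three of the cells by hand. The question does not show its input table, so the sample data is an assumption borrowed from the answer further down; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration (same values as in the answer further down).
df = pd.DataFrame({
    "dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
    "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
    "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2],
})

# Rows where 'dan' has a value (rows 2, 3, 4, 6, 7 when counting from 1).
dan_present = df["dan"].notna()

# Series.sum() skips NaN by default, so it behaves like np.nansum here.
ste_given_dan = df.loc[dan_present, "ste"].sum()
bob_given_dan = df.loc[dan_present, "bob"].sum()
dan_given_ste = df.loc[df["ste"].notna(), "dan"].sum()

print(ste_given_dan, bob_given_dan, dan_given_ste)
```

Each cell of the target matrix is one such conditional sum: the row label picks the column that must be non-null, the column label picks the values being summed.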
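import pandas as pd

def nansum_matrix_create(df):
    rows = []
    for col in df.columns:
        # Keep rows where `col` is non-zero, then sum every column over them.
        # Note: NaN != 0 evaluates to True, so NaN rows pass this filter too.
        col_sums = df[df[col] != 0].sum()
        rows.append(col_sums)
    return pd.DataFrame(rows, columns=df.columns, index=df.columns)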
Any ideas?
Thanks in advance!
I ended up using a modified version of matt's function.
Answer 0 (score: 2)
Use pd.DataFrame.notnull to get the positions of the non-null values.
Use pd.DataFrame.dot to build the cross-tabulation.
Use np.eye to zero out the diagonal.
df.notnull().T.dot(df.fillna(0)) * (1 - np.eye(df.shape[1]))
     dan  ste  bob
dan  0.0  2.0  5.0
ste  4.0  0.0  2.0
bob  4.0  1.0  0.0
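The three steps above can be unpacked into a self-contained sketch. The sample data is an assumption (the same values used in the other answer), and the intermediate names `mask`, `values`, `cross` and `off_diag` are hypothetical, introduced only to show what each piece contributes.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
    "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
    "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2],
})

# Step 1: 1.0 where a value is present, 0.0 where it is missing.
mask = df.notna().astype(float)

# Step 2: replace NaN with 0 so missing values add nothing to a sum.
values = df.fillna(0)

# Step 3: cross.loc[a, b] is the sum of column b over rows where a is present.
cross = mask.T.dot(values)

# Step 4: multiply by (1 - identity) to zero out the diagonal.
off_diag = 1 - np.eye(df.shape[1])
result = cross * off_diag

print(result)
```

The matrix product replaces the explicit loop over column pairs, which is why this version tends to scale better as the number of columns grows.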
Note:
I used this to make sure my values were numeric.
df = df.apply(pd.to_numeric, errors='coerce')
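As a small illustration of what that line does, here is a sketch on a made-up frame. The `raw` data and its string values are assumptions, not taken from the question; the point is only that entries which cannot be parsed as numbers become NaN under `errors='coerce'`.

```python
import pandas as pd

# Hypothetical raw data: numbers stored as strings, plus unparseable entries.
raw = pd.DataFrame({
    "dan": ["2", "x", "1"],
    "ste": ["missing", "1", "3"],
})

# Convert column by column; values that cannot be parsed become NaN.
clean = raw.apply(pd.to_numeric, errors="coerce")

print(clean)
print(clean.sum())
```

After the conversion, the NaN entries are treated as missing by `notnull`, `fillna` and `sum`, so the cross-tabulation works on them as intended.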
Answer 1 (score: 0)
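Assuming your dataframe doesn't have a large number of columns, this function should do what you want and perform reasonably well. I implemented it with for loops, so there may be a more performant or more elegant solution.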
import numpy as np
import pandas as pd

# Initialise dataframe (pd.np was deprecated in pandas 1.0 and later removed,
# so numpy is imported directly)
df = {"dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
      "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
      "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2]}
df = pd.DataFrame(df)[["dan", "ste", "bob"]]

def matrix_create(df):
    rows = []
    for col in df.columns:
        # Rows where `col` has a value
        present = df[col].notna()
        subvals = []
        for subcol in df.columns:
            if subcol == col:
                # The diagonal is defined to be zero
                subvals.append(0)
            else:
                # Series.sum() skips NaN by default
                subvals.append(df.loc[present, subcol].sum())
        rows.append(subvals)
    return pd.DataFrame(rows, columns=df.columns, index=df.columns)

matrix_create(df)