I have a time-series DataFrame with the following structure:
| ID | second | speaker1 | speaker2 | company | ... |
|----|--------|----------|----------|---------|-----|
| A | 1 | 1 | 1 | name1 | |
| A | 2 | 1 | 1 | name1 | |
| A | 3 | 1 | 1 | name1 | |
| B | 1 | 1 | 1 | name2 | |
| B | 2 | 1 | 1 | name2 | |
| B | 3 | 1 | 1 | name2 | |
| B | 4 | 1 | 1 | name2 | |
| C | 1 | 1 | 1 | name3 | |
| C | 2 | 1 | 1 | name3 | |

I want to add rows to each group until every group has the same number of rows (that is, as many rows as the ID with the most rows). For each new row, I want to fill the speaker1 and speaker2 columns with 0, while keeping the values in the other columns the same as in the rest of that ID.

*Note that speaker1 and speaker2 can be either 0 or 1; I set them all to 1 here for clarity.

So far I have tried groupby and apply, but it is very slow because this DataFrame has many rows and columns. Is there a way to do this with numpy?

So the output should be:

| ID | second | speaker1 | speaker2 | company | ... |
|:--:|:------:|:--------:|:--------:|:-------:|:---:|
| A | 1 | 1 | 1 | name1 | |
| A | 2 | 1 | 1 | name1 | |
| A | 3 | 1 | 1 | name1 | |
| A | 4 | 0 | 0 | name1 | |
| B | 1 | 1 | 1 | name2 | |
| B | 2 | 1 | 1 | name2 | |
| B | 3 | 1 | 1 | name2 | |
| B | 4 | 1 | 1 | name2 | |
| C | 1 | 1 | 1 | name3 | |
| C | 2 | 1 | 1 | name3 | |
| C | 3 | 0 | 0 | name3 | |
| C | 4 | 0 | 0 | name3 | |
Thanks a lot for your help!
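Since the question asks whether NumPy can replace groupby/apply, here is a minimal sketch (not from the original thread) that does the padding with NumPy index arithmetic. It assumes that `second` counts the rows 1..n within each ID and that every column other than `speaker1`/`speaker2` is constant within an ID; `pad_groups` is a hypothetical helper name. Each ID needs `max(count) - count` extra rows; `np.repeat` copies the ID's first row that many times, so the per-ID columns are carried over, and the new `second` values continue from the group's current size.

```python
import numpy as np
import pandas as pd

# Sample data from the question ("second" counts the rows 1..n within each ID)
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "second": [1, 2, 3, 1, 2, 3, 4, 1, 2],
    "speaker1": [1] * 9,
    "speaker2": [1] * 9,
    "company": ["name1"] * 3 + ["name2"] * 4 + ["name3"] * 2,
})


def pad_groups(df, key="ID", counter="second", zero_cols=("speaker1", "speaker2")):
    """Hypothetical helper: pad every `key` group to the size of the largest group."""
    # Sorted group labels, position of each group's first row, and rows per group
    _, first_pos, counts = np.unique(df[key].to_numpy(), return_index=True, return_counts=True)
    n_missing = counts.max() - counts  # rows to append to each group

    # One template row per missing row: the group's first row, so the
    # per-ID columns (company, ...) keep their values
    pad = df.iloc[np.repeat(first_pos, n_missing)].reset_index(drop=True)

    # Counter values continue after each group's current size: count+1, ..., max
    offsets = np.arange(n_missing.sum()) - np.repeat(np.cumsum(n_missing) - n_missing, n_missing)
    pad[counter] = np.repeat(counts, n_missing) + offsets + 1
    pad[list(zero_cols)] = 0

    # Append the padding rows and restore the (ID, second) order
    out = pd.concat([df, pad], ignore_index=True)
    return out.sort_values([key, counter], ignore_index=True)


print(pad_groups(df))
```

The only Python-level work is on whole arrays; the number of groups and rows never drives an explicit loop, which is where groupby/apply tends to lose time on large frames.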
Answer 0 (score: 0)
Another approach using pandas: take the Cartesian product of `ID` and `second`, with no `groupby()` and no loops.
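```python
import pandas as pd

df = pd.DataFrame({"ID":["A","A","A","B","B","B","B","C","C"],"second":["1","2","3","1","2","3","4","1","2"],"speaker1":["1","1","1","1","1","1","1","1","1"],"speaker2":["1","1","1","1","1","1","1","1","1"],"company":["name1","name1","name1","name2","name2","name2","name2","name3","name3"]})
# Cartesian product of the unique IDs and unique "second" values (how="cross" needs pandas >= 1.2),
# left-joined to df so that (ID, second) pairs missing from df come back as NaN rows
df2 = (pd.DataFrame({"ID": df["ID"].unique()})
       .merge(pd.DataFrame({"second": df["second"].unique()}), how="cross")
       .merge(df, on=["ID", "second"], how="left"))
# Padded rows: look up company by ID and set both speaker columns to 0
df2["company"] = df2["ID"].map(df.drop_duplicates("ID").set_index("ID")["company"])
df2[["speaker1", "speaker2"]] = df2[["speaker1", "speaker2"]].fillna(0)
print(df2)
```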
Output:
```
   ID second speaker1 speaker2 company
0   A      1        1        1   name1
1   A      2        1        1   name1
2   A      3        1        1   name1
3   A      4        0        0   name1
4   B      1        1        1   name2
5   B      2        1        1   name2
6   B      3        1        1   name2
7   B      4        1        1   name2
8   C      1        1        1   name3
9   C      2        1        1   name3
10  C      3        0        0   name3
11  C      4        0        0   name3
```
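The answer's code fills only `company` explicitly and takes the `second` values from whatever appears in the data. If the real DataFrame has more per-ID columns (the `...` in the question), one variant is to reindex on the full `(ID, second)` grid built with `pd.MultiIndex.from_product` and copy every non-speaker column from each ID's first row. This is a minimal sketch, not part of the original answer: it assumes `second` counts 1..n within each ID, that `(ID, second)` pairs are unique, and that the speaker columns hold integers (unlike the string-valued `df` above); `pad_with_reindex` is a hypothetical name.

```python
import pandas as pd

# Sample data from the question ("second" counts the rows 1..n within each ID)
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "second": [1, 2, 3, 1, 2, 3, 4, 1, 2],
    "speaker1": [1] * 9,
    "speaker2": [1] * 9,
    "company": ["name1"] * 3 + ["name2"] * 4 + ["name3"] * 2,
})


def pad_with_reindex(df, key="ID", counter="second", zero_cols=("speaker1", "speaker2")):
    """Hypothetical helper: pad every ID by reindexing on the full (ID, second) grid."""
    max_len = df[key].value_counts().max()  # size of the largest group
    full = pd.MultiIndex.from_product([df[key].unique(), range(1, max_len + 1)],
                                      names=[key, counter])
    # Assumes (ID, second) pairs are unique; missing pairs come back as all-NaN rows
    out = df.set_index([key, counter]).reindex(full)
    zero = list(zero_cols)
    out[zero] = out[zero].fillna(0).astype(int)  # assumes integer 0/1 speaker columns
    # Every other column is assumed constant per ID: copy it from the ID's first row
    static = [c for c in out.columns if c not in zero]
    first = df.drop_duplicates(key).set_index(key)[static]
    out[static] = first.reindex(out.index.get_level_values(key)).to_numpy()
    return out.reset_index()


print(pad_with_reindex(df))
```

Because the grid uses `range(1, max_len + 1)` rather than the `second` values that happen to occur, every ID ends up with exactly as many rows as the largest ID, and any number of extra per-ID columns is handled without listing them by name.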