I have Excel data that I read in with pd.read_excel:
Block Concentration Name Replicate
1 Array Marker
1 Array Marker
1 100.0 Man5GlcNAc2
1 33.0 Man5GlcNAc2
1 10.0 Man5GlcNAc2
1 100.0 Man6GlcNAc2
1 33.0 Man6GlcNAc2
1 10.0 Man6GlcNAc2
1 100.0 Man7GlcNAc2 D1
1 33.0 Man7GlcNAc2 D1
1 10.0 Man7GlcNAc2 D1
1 100.0 Man7GlcNAc2 D3
1 33.0 Man7GlcNAc2 D3
1 10.0 Man7GlcNAc2 D3
...
...
2 100.0 Man8GlcNAc2 D1D3
2 33.0 Man8GlcNAc2 D1D3
2 10.0 Man8GlcNAc2 D1D3
2 100.0 Man9GlcNAc2
2 33.0 Man9GlcNAc2
2 10.0 Man9GlcNAc2
...
The desired output is:
Block Concentration Name Replicate
1 Array Marker 1
1 Array Marker 2
1 100.0 Man5GlcNAc2 1
1 33.0 Man5GlcNAc2 2
1 10.0 Man5GlcNAc2 3
1 100.0 Man6GlcNAc2 1
1 33.0 Man6GlcNAc2 2
1 10.0 Man6GlcNAc2 3
1 100.0 Man7GlcNAc2 D1 1
1 33.0 Man7GlcNAc2 D1 2
1 10.0 Man7GlcNAc2 D1 3
1 100.0 Man7GlcNAc2 D3 1
1 33.0 Man7GlcNAc2 D3 2
1 10.0 Man7GlcNAc2 D3 3
...
...
2 100.0 Man8GlcNAc2 D1D3 1
2 33.0 Man8GlcNAc2 D1D3 2
2 10.0 Man8GlcNAc2 D1D3 3
2 100.0 Man9GlcNAc2 1
2 33.0 Man9GlcNAc2 2
2 10.0 Man9GlcNAc2 3
...
My code is:
data["Replicate"] = data.groupby(["Block", "Name", "Concentration"]).cumcount()+1
I thought this made sense, but the output I get is not the desired output. This is what I get instead:
Block Concentration Name Replicate
1 Array Marker 1
1 Array Marker 2
1 100.0 Man5GlcNAc2 1
1 33.0 Man5GlcNAc2 1
1 10.0 Man5GlcNAc2 1
1 100.0 Man6GlcNAc2 1
1 33.0 Man6GlcNAc2 1
1 10.0 Man6GlcNAc2 1
1 100.0 Man7GlcNAc2 D1 1
1 33.0 Man7GlcNAc2 D1 1
1 10.0 Man7GlcNAc2 D1 1
1 100.0 Man7GlcNAc2 D3 1
1 33.0 Man7GlcNAc2 D3 1
1 10.0 Man7GlcNAc2 D3 1
...
...
1 100.0 Man8GlcNAc2 D1D3 1
1 33.0 Man8GlcNAc2 D1D3 1
1 10.0 Man8GlcNAc2 D1D3 1
1 100.0 Man9GlcNAc2 1
1 33.0 Man9GlcNAc2 1
1 10.0 Man9GlcNAc2 1
...
1 100.0 Man5GlcNAc2 2
1 33.0 Man5GlcNAc2 2
1 10.0 Man5GlcNAc2 2
...
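The Replicate column stays '1' until much later rows, and I don't know how it chooses which rows get which number. There are 3 rows with the same Block/Name combination, so I need to number them 1, 2, 3 so I can tell them apart later when I use a pivot table. I have already made the 'Concentration' column a string type, so the numbers should not be the problem.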
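A minimal reproduction of the behaviour described above, built on a small hypothetical sample (the rows are made up to mirror the question's layout, including a later repeat of the Man5GlcNAc2 rows). Because Concentration is part of the groupby key, every (Block, Name, Concentration) triple is its own group, so cumcount() stays at 0 until the same triple appears again further down the sheet.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout: three concentrations
# per Name in Block 1, with the Man5GlcNAc2 set repeated later on.
data = pd.DataFrame({
    "Block": [1] * 9,
    "Concentration": ["100.0", "33.0", "10.0"] * 3,  # strings, as in the question
    "Name": ["Man5GlcNAc2"] * 3 + ["Man6GlcNAc2"] * 3 + ["Man5GlcNAc2"] * 3,
})

# The asker's grouping: each (Block, Name, Concentration) triple is a separate
# group, so the counter only increases when an identical triple reappears.
data["Replicate"] = data.groupby(["Block", "Name", "Concentration"]).cumcount() + 1
print(data)
```

Here the first six rows all get 1 and only the repeated Man5GlcNAc2 rows get 2, which matches the pattern in the output above.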
Answer 0 (score: 0)
If you remove "Concentration" from the groupby keys, you get the expected output:
data["Replicate"] = data.groupby(["Block", "Name"]).cumcount()+1
>>> data
Block Concentration Name Replicate
0 1 '' Array.Marker 1
1 1 '' Array.Marker 2
2 1 100.0 Man5GlcNAc2 1
3 1 33.0 Man5GlcNAc2 2
4 1 10.0 Man5GlcNAc2 3
5 1 100.0 Man6GlcNAc2 1
6 1 33.0 Man6GlcNAc2 2
7 1 10.0 Man6GlcNAc2 3
8 1 100.0 Man7GlcNAc2D1 1
9 1 33.0 Man7GlcNAc2D1 2
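The fix can be checked end to end on a small hypothetical sample (made-up rows in the same shape as the question, with NaN standing in for the markers' empty Concentration cells). Leaving Concentration out of the key puts all concentrations of one Name in the same group, so cumcount() numbers them 1, 2, 3 in row order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two Array Marker rows without a concentration, then
# three concentrations for one glycan Name in Block 1 and another in Block 2.
data = pd.DataFrame({
    "Block": [1, 1, 1, 1, 1, 2, 2, 2],
    "Concentration": [np.nan, np.nan, 100.0, 33.0, 10.0, 100.0, 33.0, 10.0],
    "Name": ["Array Marker", "Array Marker",
             "Man5GlcNAc2", "Man5GlcNAc2", "Man5GlcNAc2",
             "Man9GlcNAc2", "Man9GlcNAc2", "Man9GlcNAc2"],
})

# Number rows within each (Block, Name) pair, following the existing row order.
data["Replicate"] = data.groupby(["Block", "Name"]).cumcount() + 1
print(data)
```

cumcount() follows the current row order, so this assumes the sheet already lists the concentrations in the order you want numbered. If the same Block/Name pair shows up again later in the sheet, those rows continue the count (4, 5, 6, ...) rather than restarting at 1.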
Answer 1 (score: 0)
Instead of the function cumcount()+1, you can use a rolling count with a moving window of 3:
#groupby and set rolling count from column Block
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(pd.rolling_count, window=3)
The formatting of column Concentration looks strange. If this is not just a side effect of copying the data, you can fix it by converting column Concentration to float and stripping the whitespace from the beginning and end of column Name.
#convert column Concentration to float
data['Concentration'] = data['Concentration'].astype(float)
#strip first and last whitespaces
data['Name'] = data['Name'].str.strip()
#groupby and set rolling count from column Block
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(pd.rolling_count, window=3)
Block Concentration Name Replicate
0 1 Array Marker 1
1 1 Array Marker 2
2 1 100 Man5GlcNAc2 1
3 1 33 Man5GlcNAc2 2
4 1 10 Man5GlcNAc2 3
5 1 100 Man6GlcNAc2 1
6 1 33 Man6GlcNAc2 2
7 1 10 Man6GlcNAc2 3
8 1 100 Man7GlcNAc2 D1 1
9 1 33 Man7GlcNAc2 D1 2
10 1 10 Man7GlcNAc2 D1 3
11 1 100 Man7GlcNAc2 D3 1
12 1 33 Man7GlcNAc2 D3 2
13 1 10 Man7GlcNAc2 D3 3
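pd.rolling_count was deprecated in pandas 0.18 and is not available in current releases, so the code above fails on a recent install. A sketch of the same steps with today's API, on a hypothetical sample with the kind of stray whitespace the answer suspects: the rolling count becomes Series.rolling(...).count(), and pd.to_numeric(..., errors="coerce") is swapped in for astype(float) so that blank Concentration cells become NaN instead of raising.

```python
import pandas as pd

# Hypothetical sample: Concentration stored as strings (blank for the markers)
# and Names padded with leading/trailing spaces.
data = pd.DataFrame({
    "Block": [1, 1, 1, 1, 1],
    "Concentration": ["", "", "100.0", "33.0", "10.0"],
    "Name": [" Array Marker", "Array Marker ", " Man5GlcNAc2 ",
             "Man5GlcNAc2", "Man5GlcNAc2 "],
})

# Blank strings become NaN here; astype(float) would raise on "".
data["Concentration"] = pd.to_numeric(data["Concentration"], errors="coerce")

# Strip whitespace so " Man5GlcNAc2 " and "Man5GlcNAc2" fall into one group.
data["Name"] = data["Name"].str.strip()

# Rolling count of non-NaN Block values over a window of 3 inside each group.
# min_periods=1 lets the first two rows count 1 and 2 instead of NaN.
data["Replicate"] = (
    data.groupby(["Block", "Name"])["Block"]
    .transform(lambda s: s.rolling(window=3, min_periods=1).count())
    .astype(int)  # count() returns floats
)
print(data)
```

Unlike cumcount(), a rolling count with window=3 never exceeds 3: a group with a fourth row would get 3 again rather than 4.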