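This is a follow-up to a question from yesterday. I have a DataFrame created from a CSV file, and I am trying to compare the current value with the next one. If they are the same, I do one thing; otherwise, I do something else. I am running into an out-of-range error and hope to find a workaround.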
CSV:
date fruit quantity
4/5/2014 13:34 Apples 73
4/5/2014 3:41 Cherries 85
4/6/2014 12:46 Pears 14
4/8/2014 8:59 Oranges 52
4/10/2014 2:07 Apples 152
4/10/2014 18:10 Bananas 23
4/10/2014 2:40 Strawberries 98
Expected output CSV (backup CSV):
date fruit quantity fruitid
4/5/2014 13:34 Apples 73 fruit0
4/5/2014 3:41 Cherries 85 fruit1
4/6/2014 12:46 Pears 14 fruit2
4/8/2014 8:59 Oranges 52 fruit3
4/10/2014 2:07 Apples 152 fruit0
4/10/2014 18:10 Bananas 23 fruit4
4/10/2014 2:40 Strawberries 98 fruit5
Final CSV:
date fruitid quantity
4/5/2014 13:34 fruit0 73
4/5/2014 3:41 fruit1 85
4/6/2014 12:46 fruit2 14
4/8/2014 8:59 fruit3 52
4/10/2014 2:07 fruit0 152
4/10/2014 18:10 fruit4 23
4/10/2014 2:40 fruit5 98
Code:
import pandas as pd
import numpy
df = pd.read_csv('example2.csv', header=0, dtype='unicode')
df_count = df['fruit'].value_counts()
df.sort_values(['fruit'], ascending=True, inplace=True) #sorting the column
#fruit
df.reset_index(drop=True, inplace=True)
#print(df)
x = 0 #starting my counter values or position in the column
#old_fruit = df.fruit[x]
#new_fruit = df.fruit[x+1]
df.loc[:,'NewCol'] = 0 # to create the new column
print(df)
for x in range(0, len(df)):
    old_fruit = df.fruit[x] #Starting fruit
    new_fruit = old_fruit[x+1] #next fruit to compare with
    if old_fruit == new_fruit:
        #print(x)
        #print(old_fruit, new_fruit)
        df.NewCol[x] = 'fruit' + str(x) #if they are the same, put
        #fruit[x] or fruit0 in the current row
    else:
        print("Not the Same")
        #print(x)
        #print(old_fruit, new_fruit)
        df.NewCol[x+1] = 'fruit' +str(x+1) #if they are the same,
        #put fruit[x+1] or fruit1 in the current row
print(df)
Answer 0 (score: 4)
New answer
Use factorize:
import numpy as np

# factorize()[0] gives one integer code per row, numbered by first appearance
df.assign(
    NewCol=np.char.add('Fruit', df.fruit.factorize()[0].astype(str))
)
date fruit quantity NewCol
0 4/5/2014 13:34 Apples 73 Fruit0
1 4/5/2014 3:41 Cherries 85 Fruit1
2 4/6/2014 12:46 Pears 14 Fruit2
3 4/8/2014 8:59 Oranges 52 Fruit3
4 4/10/2014 2:07 Apples 152 Fruit0
5 4/10/2014 18:10 Bananas 23 Fruit4
6 4/10/2014 2:40 Strawberries 98 Fruit5
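For reference, here is a self-contained sketch of what factorize returns on the question's sample data (built inline here rather than read from example2.csv). It uses np.char.add, the public alias of np.core.defchararray.add; np.core was made a private module in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline instead of read from CSV
df = pd.DataFrame({
    "date": ["4/5/2014 13:34", "4/5/2014 3:41", "4/6/2014 12:46", "4/8/2014 8:59",
             "4/10/2014 2:07", "4/10/2014 18:10", "4/10/2014 2:40"],
    "fruit": ["Apples", "Cherries", "Pears", "Oranges", "Apples", "Bananas", "Strawberries"],
    "quantity": [73, 85, 14, 52, 152, 23, 98],
})

# codes: one integer per row, numbered in order of first appearance
# uniques: the distinct fruit names, in that same order
codes, uniques = pd.factorize(df["fruit"])
print(codes)          # [0 1 2 3 0 4 5]
print(list(uniques))  # ['Apples', 'Cherries', 'Pears', 'Oranges', 'Bananas', 'Strawberries']

# Prefix each code with a string to build the label column
df = df.assign(NewCol=np.char.add("Fruit", codes.astype(str)))
print(df)
```

Because the codes follow first appearance, the second Apples row reuses code 0 without any sorting or row-by-row comparison.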
Not a one-liner, but cleaner:
f, u = pd.factorize(df.fruit.values)  # f: code per row, u: unique fruits
n = np.char.add('Fruit', f.astype(str))
df.assign(NewCol=n)
date fruit quantity NewCol
0 4/5/2014 13:34 Apples 73 Fruit0
1 4/5/2014 3:41 Cherries 85 Fruit1
2 4/6/2014 12:46 Pears 14 Fruit2
3 4/8/2014 8:59 Oranges 52 Fruit3
4 4/10/2014 2:07 Apples 152 Fruit0
5 4/10/2014 18:10 Bananas 23 Fruit4
6 4/10/2014 2:40 Strawberries 98 Fruit5
Same answer, but updating df:
f, u = pd.factorize(df.fruit.values)
n = np.char.add('Fruit', f.astype(str))
df = df.assign(NewCol=n)
# Equivalent to
# df['NewCol'] = n
df
date fruit quantity NewCol
0 4/5/2014 13:34 Apples 73 Fruit0
1 4/5/2014 3:41 Cherries 85 Fruit1
2 4/6/2014 12:46 Pears 14 Fruit2
3 4/8/2014 8:59 Oranges 52 Fruit3
4 4/10/2014 2:07 Apples 152 Fruit0
5 4/10/2014 18:10 Bananas 23 Fruit4
6 4/10/2014 2:40 Strawberries 98 Fruit5
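The question's final CSV goes one step further: a lowercase fruit prefix, a column named fruitid, and the fruit column dropped. A sketch of that last step, assuming the input columns are named date, fruit and quantity as in the sample. With no path argument, to_csv returns the CSV text; pass a filename to write a file instead.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["4/5/2014 13:34", "4/5/2014 3:41", "4/6/2014 12:46", "4/8/2014 8:59",
             "4/10/2014 2:07", "4/10/2014 18:10", "4/10/2014 2:40"],
    "fruit": ["Apples", "Cherries", "Pears", "Oranges", "Apples", "Bananas", "Strawberries"],
    "quantity": [73, 85, 14, 52, 152, 23, 98],
})

# Same factorize idea, with the lowercase prefix used in the expected output
codes, _ = pd.factorize(df["fruit"])
df["fruitid"] = np.char.add("fruit", codes.astype(str))

# Keep only the columns of the final CSV, in the question's order
final = df[["date", "fruitid", "quantity"]]

# No path given, so to_csv returns the CSV as a string
csv_text = final.to_csv(index=False)
print(csv_text)
```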
Old answer
@SeaMonkey pointed out why you are seeing the error.
However, as a guess at what you are actually trying to do, I would append a cumcount to fruit:
df.assign(NewCol=df.fruit + df.groupby('fruit').cumcount().astype(str))
date fruit quantity NewCol
0 4/5/2014 13:34 Apples 73 Apples0
1 4/5/2014 3:41 Cherries 85 Cherries0
2 4/6/2014 12:46 Pears 14 Pears0
3 4/8/2014 8:59 Oranges 52 Oranges0
4 4/10/2014 2:07 Apples 152 Apples1
5 4/10/2014 18:10 Bananas 23 Bananas0
6 4/10/2014 2:40 Strawberries 98 Strawberries0
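cumcount numbers the rows within each group in order of appearance (0 for the first Apples, 1 for the second, and so on). That makes it a different kind of ID from factorize: it counts repeats of one fruit rather than numbering the distinct fruits. A minimal standalone sketch on just the fruit column:

```python
import pandas as pd

fruit = pd.Series(["Apples", "Cherries", "Pears", "Oranges", "Apples", "Bananas", "Strawberries"])

# cumcount: 0 for the first occurrence of each fruit, 1 for the second, ...
occurrence = fruit.groupby(fruit).cumcount()

# Glue the running count onto the fruit name
labels = fruit + occurrence.astype(str)
print(labels.tolist())
```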
Answer 1 (score: 2)
I think your for loop goes one index too far. Try:
for x in range(0, len(df)-1):
instead.
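Edit: it makes sense that
new_fruit = old_fruit[x+1]
does not give the expected result: old_fruit is not a list but a string, so this picks out a single character of the fruit name. I think what you want is:
new_fruit = df.fruit[x+1]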
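This string indexing is most likely the real source of the out-of-range error. After sorting, the sample data's row 5 holds 'Pears', so at x = 5 the expression old_fruit[x+1] asks for character 6 of a five-letter word. A small demonstration, with the values hard-coded for illustration:

```python
old_fruit = "Pears"

# Indexing a str returns a single character, not the next row's fruit
print(old_fruit[1])  # e

# At x = 5, x + 1 = 6 is past the end of the 5-letter word
x = 5
caught = None
try:
    new_fruit = old_fruit[x + 1]
except IndexError as err:
    caught = err
    print("IndexError:", err)  # IndexError: string index out of range
```

This also explains why changing the loop bound alone would not help here: x = 5 is still inside range(0, len(df)-1) for seven rows.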
Edit (2):
You should also add this line to the if branch, so that the next row gets the same label:
df.loc[x+1, 'NewCol'] = 'fruit' + str(x)
My working script is:
import pandas as pd

df = pd.read_csv('data.csv', header=0, dtype='unicode')
df.sort_values(['fruit'], ascending=True, inplace=True)  # sort so equal fruits are adjacent
df.reset_index(drop=True, inplace=True)

df['NewCol'] = ''  # create the new (string) column
print(df)

# Stop one row early so that x + 1 is always a valid row label
for x in range(0, len(df) - 1):
    old_fruit = df.fruit[x]      # current fruit
    new_fruit = df.fruit[x + 1]  # next fruit to compare with
    if old_fruit == new_fruit:
        # Same fruit: give the current and the next row the same label
        # (.loc avoids chained assignment such as df.NewCol[x] = ...)
        df.loc[x, 'NewCol'] = 'fruit' + str(x)
        df.loc[x + 1, 'NewCol'] = 'fruit' + str(x)
    else:
        print("Not the Same")
        # Different fruit: start a new label for the next row
        df.loc[x + 1, 'NewCol'] = 'fruit' + str(x + 1)
print(df)
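The loop compares each value with the next one by position. pandas can make the same comparison for the whole column at once with Series.shift. Here is a minimal sketch, assuming the goal is one fruitN label per distinct fruit after sorting. A new ID starts wherever a row's fruit differs from the previous row's, and cumsum turns those boundaries into consecutive group numbers. Unlike the loop, this also labels a fruit that occurs three or more times consistently. Note that the numbering follows the sorted (alphabetical) order, whereas factorize numbers by first appearance.

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["Apples", "Cherries", "Pears", "Oranges", "Apples", "Bananas", "Strawberries"],
    "quantity": [73, 85, 14, 52, 152, 23, 98],
})

# Sort so equal fruits are adjacent, as the loop-based script does
df = df.sort_values("fruit", kind="stable").reset_index(drop=True)

# True where the current fruit differs from the previous row's (row 0 compares to NaN, so True)
starts_new_group = df["fruit"].ne(df["fruit"].shift())

# cumsum counts boundaries seen so far (1, 1, 2, ...); subtract 1 to start at 0
group_id = starts_new_group.cumsum() - 1
df["NewCol"] = "fruit" + group_id.astype(str)
print(df)
```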