I have a DataFrame in this format:
Col1|Col2
A|Agriculture, forestry and fishing
1|Crop and animal production, hunting and related service activities
11|Growing of non-perennial crops
12|Growing of perennial crops
14|Animal production
C|Manufacturing
11|Manufacture of beverages
110|Manufacture of beverages
12|Manufacture of tobacco products
120|Manufacture of tobacco products
14|Manufacture of wearing apparel
141|Manufacture of wearing apparel, except fur apparel
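A is an item, 1 under A is a sub_item, and 11 under A is a sub_sub_item. The problem appears under "C", where 11 is itself a sub_item.
So far I have done the following: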
    import numpy as np

    Col1_list = df['Col1'].values.tolist()  # codes
    Col2_list = df['Col2'].values.tolist()  # descriptions

    # Defining empty lists
    item = []
    sub_item = []
    sub_sub = []

    # Looping through the codes
    for i in range(len(Col1_list)):
        if str(Col1_list[i]).isalpha():
            item.append(Col2_list[i])
            sub_item.append(np.nan)
            sub_sub.append(np.nan)
        elif int(Col1_list[i]) < 10 and len(str(Col1_list[i])) == 1:
            item.append(np.nan)
            sub_item.append(Col2_list[i])
            sub_sub.append(np.nan)
        elif int(Col1_list[i]) > 10 and len(str(Col1_list[i])) == 2:
            # THIS IS WHERE IT FAILS SINCE '11' is both sub_item and sub_sub
            pass
I would like to convert it into the following format:
Item|SubItem|Sub-SubItem
Agriculture, forestry and fishing|Crop and animal production, hunting and related service activities|Growing of non-perennial crops
Agriculture, forestry and fishing|Crop and animal production, hunting and related service activities|Growing of perennial crops
Agriculture, forestry and fishing|Crop and animal production, hunting and related service activities|Animal production
Manufacturing|Manufacture of beverages|Manufacture of beverages
Manufacturing|Manufacture of tobacco products|Manufacture of tobacco products
Manufacturing|Manufacture of wearing apparel|Manufacture of wearing apparel, except fur apparel
Answer 0 (score: 0)
Use this approach:
    import pandas as pd

    data = [['tom', 10, 'M'], ['nick', 15, 'M'], ['juli', 14, 'F']]
    df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])

    # One dict per row, e.g. {'Name': 'tom', 'Age': 10, 'Gender': 'M'}
    json_records = df.to_dict('records')

    req_json = {}
    male_list = []
    female_list = []
    for item in json_records:
        if item['Gender'] == 'M':
            male_list.append(item['Name'])
        if item['Gender'] == 'F':
            female_list.append(item['Name'])

    req_json['males'] = male_list
    req_json['females'] = female_list
    print(req_json)
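The loop above works on a toy frame rather than on the question's data. One way it might be adapted is sketched below, under two assumptions that are not stated in the answer: the codes in Col1 are strings, and a code that starts with the current sub-item's code is a sub-sub-item. The names `rows`, `sub_code` and `result` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question (codes kept as strings: an assumption).
df = pd.DataFrame(
    [
        ["A", "Agriculture, forestry and fishing"],
        ["1", "Crop and animal production, hunting and related service activities"],
        ["11", "Growing of non-perennial crops"],
        ["12", "Growing of perennial crops"],
        ["14", "Animal production"],
        ["C", "Manufacturing"],
        ["11", "Manufacture of beverages"],
        ["110", "Manufacture of beverages"],
        ["12", "Manufacture of tobacco products"],
        ["120", "Manufacture of tobacco products"],
        ["14", "Manufacture of wearing apparel"],
        ["141", "Manufacture of wearing apparel, except fur apparel"],
    ],
    columns=["Col1", "Col2"],
)

rows = []
item = sub_code = sub_item = None
for rec in df.to_dict("records"):
    code, name = rec["Col1"], rec["Col2"]
    if code.isalpha():
        # A letter code opens a new item and clears the current sub-item.
        item, sub_code, sub_item = name, None, None
    elif sub_code is not None and code.startswith(sub_code):
        # The code extends the current sub-item code: emit one output row.
        rows.append({"Item": item, "SubItem": sub_item, "Sub-SubItem": name})
    else:
        # Any other numeric code becomes the current sub-item.
        sub_code, sub_item = code, name

result = pd.DataFrame(rows, columns=["Item", "SubItem", "Sub-SubItem"])
print(result)
```

Because the current sub-item is cleared at every letter row, "11" is treated as a sub-sub-item under A and as a sub-item under C.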
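Answer 1 (score: 0)
I cannot think of a nice vectorized way, so I would loop over the Col1 data to work out whether each row is an Item, a SubItem or a SubSubItem, and then use that to build the result DataFrame: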
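    import re

    import numpy as np
    import pandas as pd

    # typ[i] is 0 for an Item, 1 for a SubItem and 2 for a SubSubItem
    typ = np.zeros(len(df))
    for i, key in enumerate(df['Col1']):
        if re.match('[A-Z]+', key, re.I):
            # a letter code starts a new Item
            prev = key
        elif key.startswith(prev):
            # the code extends the current SubItem code
            typ[i] = 2
        else:
            typ[i] = 1
            prev = key

    resul = pd.DataFrame(index=df.index, columns=['Item', 'SubItem', 'SubSubItem'])
    for i, col in enumerate(resul.columns):
        # index alignment leaves NaN on the rows that belong to other levels
        resul[col] = df.loc[typ == i, 'Col2']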
This gives:
|Item|SubItem|SubSubItem
0|Agriculture, forestry and fishing|NaN|NaN
1|NaN|Crop and animal production, hunting and relate...|NaN
2|NaN|NaN|Growing of non-perennial crops
3|NaN|NaN|Growing of perennial crops
4|NaN|NaN|Animal production
5|Manufacturing|NaN|NaN
6|NaN|Manufacture of beverages|NaN
7|NaN|NaN|Manufacture of beverages
8|NaN|Manufacture of tobacco products|NaN
9|NaN|NaN|Manufacture of tobacco products
10|NaN|Manufacture of wearing apparel|NaN
11|NaN|NaN|Manufacture of wearing apparel, except fur app...
We only need to forward-fill the NaN values and keep the SubSubItem rows:

    resul = resul.ffill()[typ == 2].reset_index(drop=True)

to get:
|Item|SubItem|SubSubItem
0|Agriculture, forestry and fishing|Crop and animal production, hunting and relate...|Growing of non-perennial crops
1|Agriculture, forestry and fishing|Crop and animal production, hunting and relate...|Growing of perennial crops
2|Agriculture, forestry and fishing|Crop and animal production, hunting and relate...|Animal production
3|Manufacturing|Manufacture of beverages|Manufacture of beverages
4|Manufacturing|Manufacture of tobacco products|Manufacture of tobacco products
5|Manufacturing|Manufacture of wearing apparel|Manufacture of wearing apparel, except fur app...
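For what it is worth, the row type can probably be derived without an explicit Python loop. The sketch below is not the method used in this answer. It swaps the prefix test for a code-length comparison and assumes that the codes are strings and that, within each letter section, every sub-item code has the same length as the first code after the letter. The names `is_item`, `section`, `base` and `level` are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (codes kept as strings: an assumption).
df = pd.DataFrame(
    [
        ["A", "Agriculture, forestry and fishing"],
        ["1", "Crop and animal production, hunting and related service activities"],
        ["11", "Growing of non-perennial crops"],
        ["12", "Growing of perennial crops"],
        ["14", "Animal production"],
        ["C", "Manufacturing"],
        ["11", "Manufacture of beverages"],
        ["110", "Manufacture of beverages"],
        ["12", "Manufacture of tobacco products"],
        ["120", "Manufacture of tobacco products"],
        ["14", "Manufacture of wearing apparel"],
        ["141", "Manufacture of wearing apparel, except fur apparel"],
    ],
    columns=["Col1", "Col2"],
)

is_item = df["Col1"].str.isalpha()   # True on the letter rows
section = is_item.cumsum()           # 1 for the rows under A, 2 for the rows under C
length = df["Col1"].str.len()

# Length of the first numeric code in each section, broadcast to every row of it.
base = length.where(~is_item).groupby(section).transform("first")

# 0 = item, 1 = sub-item, 2 = sub-sub-item (numeric code longer than the section's base).
level = (~is_item).astype(int) + (~is_item & (length > base)).astype(int)

names = df["Col2"]
out = pd.DataFrame({
    "Item": names.where(level == 0),
    "SubItem": names.where(level == 1),
    "SubSubItem": names.where(level == 2),
})
out[["Item", "SubItem"]] = out[["Item", "SubItem"]].ffill()
result = out[level == 2].reset_index(drop=True)
print(result)
```

If sub-item codes can have different lengths within one section, the length comparison would misclassify rows, and the prefix-based loop above is the safer choice.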
Answer 2 (score: 0)
It is a bit complicated, but the snippet below does the job.
    import pandas as pd

    # Positions of the rows whose Col1 code is not an integer (the top-level items)
    string_indices = []
    for idx, code in enumerate(df['Col1']):
        try:
            int(code)
        except (ValueError, TypeError):
            string_indices.append(idx)

    # Length of every code, taken as a string
    code_lengths = [len(str(code)) for code in df['Col1']]

    # Each section runs from one letter row to the next one, or to the end of the frame
    bounds = string_indices + [len(df)]

    Rows = []  # created once, so rows from earlier sections are not overwritten
    for start, end in zip(bounds[:-1], bounds[1:]):
        if start + 1 >= end:
            continue  # the section has no rows under it
        first_length = code_lengths[start + 1]  # code length of the section's sub-items
        first_index = start + 1                 # row of the current sub-item
        for item in range(start + 1, end):
            if code_lengths[item] > first_length:
                # longer code than the sub-items: this is a sub-sub-item
                row = [df.iloc[start, 1], df.iloc[first_index, 1], df.iloc[item, 1]]
                Rows.append(row)
            elif code_lengths[item] == first_length:
                first_index = item

    df_new = pd.DataFrame(data=Rows, columns=['Item', 'SubItem', 'Sub-SubItem'])
The output table is as follows:
|Item|SubItem|Sub-SubItem
0|Agriculture, forestry and fishing|Crop and animal production, hunting and relate...|Growing of non-perennial crops
1|Agriculture, forestry and fishing|Crop and animal production, hunting and relate...|Growing of perennial crops
2|Agriculture, forestry and fishing|Crop and animal production, hunting and relate...|Animal production
3|Manufacturing|Manufacture of beverages|Manufacture of beverages
4|Manufacturing|Manufacture of tobacco products|Manufacture of tobacco products
5|Manufacturing|Manufacture of wearing apparel|Manufacture of wearing apparel, except fur app...