I have two CSV files:
File1:
id text_feature value
1 feature2 20
1 feature3 5
2 feature2 20
...
File2:
id feature2 feature3
1 1 1
2 1 0
...
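From these files I want to produce the following file, i.e. File2 with the 1s and 0s replaced by the corresponding values from File1:
File3: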
id feature2 feature3
1 20 5
2 20 0
...
This is how I tried to solve the task, but it takes a very long time (my CSV files have about 20,000 entries):
import pandas as pd

def find_value(df_data, df_row, column_name):
    # Scan File1 row by row for the matching (id, feature) pair.
    value = 0
    for index, row in df_data.iterrows():
        f = row['text_feature'].replace(' ', '')
        if row['id'] == df_row['id'] and f == column_name:
            value = row['value']
            break
    return value

df_data = pd.read_csv("File1.csv")
df_textfeatures = pd.read_csv("File2.csv")
for index, row in df_textfeatures.iterrows():
    for column_name, column in df_textfeatures.transpose().iterrows():
        row[column_name] = find_value(df_data, row, column_name)
Answer 0 (score: 2)
You can pivot the dataframe directly. With File1 loaded into a dataframe called file1:
d = file1.pivot_table(index='id',columns='text_feature',values='value')
Returns:
text_feature feature2 feature3
id
1 20 5
2 20 NaN
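For anyone who wants to reproduce the pivot without the original files, here is a minimal sketch. It assumes the three sample rows shown for File1 in the question and reads them from an in-memory string (whitespace-separated, as displayed above) instead of File1.csv. Note that pivot_table aggregates with the mean by default, so duplicate (id, text_feature) pairs would be averaged.

```python
import io

import pandas as pd

# Sample rows copied from File1 in the question (an assumption: the real
# file is much larger and may use a different separator).
FILE1_TEXT = """id text_feature value
1 feature2 20
1 feature3 5
2 feature2 20
"""

file1 = pd.read_csv(io.StringIO(FILE1_TEXT), sep=r"\s+")

# One row per id, one column per distinct text_feature.
# Combinations that never occur in File1 come out as NaN.
d = file1.pivot_table(index="id", columns="text_feature", values="value")
print(d)
```

The pair (id 2, feature3) does not appear in File1, which is why that cell is NaN.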
To get what you want, fill the NaN values with 0:
d.fillna(0)
Returns:
text_feature feature2 feature3
id
1 20 5
2 20 0
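One detail worth keeping in mind: fillna returns a new dataframe and leaves d unchanged unless the result is assigned. The sketch below shows that, along with a possible alternative, the fill_value argument of pivot_table, which fills the missing combinations during the pivot. The sample dataframe and the names filled and filled_directly are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for File1, using the sample rows from the question.
file1 = pd.DataFrame(
    {
        "id": [1, 1, 2],
        "text_feature": ["feature2", "feature3", "feature2"],
        "value": [20, 5, 20],
    }
)

d = file1.pivot_table(index="id", columns="text_feature", values="value")

# fillna returns a new DataFrame, so assign the result to keep it.
filled = d.fillna(0)

# Alternative: let pivot_table fill the missing combinations itself.
filled_directly = file1.pivot_table(
    index="id", columns="text_feature", values="value", fill_value=0
)

print(filled)
print(filled_directly)
```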
Edit:
The index then has to be reset so that id becomes a regular column again:
d.reset_index()
Returns:
text_feature id feature2 feature3
0 1 20 5
1 2 20 0
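Putting the steps together, a rough end-to-end sketch might look like the following. Several things here are assumptions rather than part of the answer: the files are taken to be comma-separated and are read from in-memory strings, File3 is assumed to need exactly the ids and feature columns listed in File2 (hence the reindex), and the values are assumed to be whole numbers (hence astype(int)). The names feature_columns and file3 are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-ins for File1.csv and File2.csv (assumed comma-separated).
FILE1_TEXT = """id,text_feature,value
1,feature2,20
1,feature3,5
2,feature2,20
"""
FILE2_TEXT = """id,feature2,feature3
1,1,1
2,1,0
"""

file1 = pd.read_csv(io.StringIO(FILE1_TEXT))
file2 = pd.read_csv(io.StringIO(FILE2_TEXT))

# Every File2 column except id is treated as a feature column.
feature_columns = [c for c in file2.columns if c != "id"]

file3 = (
    file1.pivot_table(index="id", columns="text_feature", values="value")
    # Align to File2's ids and columns; anything missing becomes NaN.
    .reindex(index=pd.Index(file2["id"], name="id"), columns=feature_columns)
    .fillna(0)
    .astype(int)
    .reset_index()
)
# Drop the leftover "text_feature" label on the column axis.
file3.columns.name = None

print(file3.to_csv(index=False))
```

To write an actual file, pass a path to to_csv, for example file3.to_csv("File3.csv", index=False). Because the pivot is vectorised, this avoids the nested iterrows loops from the question.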