How do I map one CSV file onto another in Python?

Time: 2019-08-08 12:58:44

Tags: python pandas

I have two CSV files. The first looks like this:

ID , Exersice
1 , 1.1
1 , 1.2
3 , 1.4
.
.

It contains only the student IDs and the exercises each student has done. The second contains the grades for each ID, with one column per exercise:

ID , 1.1 , 1.2 , 1.3 ...
1  , 5   , 9   , 8   ...
3  , 4   , 10  , 6   ...
.
.

How can I map the grades from the second file onto the first, so that every (ID, Exersice) row in the first file gets its grade?

3 answers:

Answer 0: (score: 1)

Merge, join and concatenate

That link provides examples of how to do this.

Specifically, pd.concat(data, axis=1) should do the trick.

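Answer 1: (score: 0)

Use DataFrame.set_index with DataFrame.stack to create a MultiIndex Series, converting the column labels to floats if necessary so that they match the Exersice values, and finally use DataFrame.join: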

s = df2.set_index('ID').rename(columns=float).stack().rename('grade')
df = df1.join(s, on=['ID','Exersice'])
print (df)
   ID  Exersice  grade
0   1       1.1    5.0
1   1       1.2    9.0
2   3       1.4    NaN
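
To make the intermediate step easier to see, here is a minimal sketch of what the stacked Series looks like. It assumes the grade table contains only the three exercise columns shown in the question (the elided columns are left out), and it builds the frame in memory instead of reading a file.

```python
import pandas as pd

# Grade table from the question, limited to the columns shown there (assumption).
df2 = pd.DataFrame({"ID": [1, 3], "1.1": [5, 4], "1.2": [9, 10], "1.3": [8, 6]})

# Index by ID, turn the string column labels into floats, then stack the
# columns into a second index level: one row per (ID, exercise) pair.
s = df2.set_index("ID").rename(columns=float).stack().rename("grade")
print(s)

# A single grade can now be looked up by its (ID, exercise) key.
print(s.loc[(1, 1.2)])
```

Because the Series is keyed by (ID, exercise), joining it onto the first frame with on=['ID','Exersice'] fills in one grade per row and leaves NaN where no matching key exists.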

Another similar solution:

df3 = df2.melt('ID', var_name='Exersice', value_name='new')
df3['Exersice'] = df3['Exersice'].astype(float)

df = df1.merge(df3, on=['ID','Exersice'], how='left')
print (df)
   ID  Exersice  new
0   1       1.1  5.0
1   1       1.2  9.0
2   3       1.4  NaN
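
For anyone who wants to run the melt-and-merge variant end to end, the following is a self-contained sketch. The CSV contents are copied from the question and restricted to the columns shown there; the in-memory strings and the variable names long_grades and result are illustrative assumptions, not part of the original answer.

```python
import io
import pandas as pd

# Sample data from the question (only the columns shown there; an assumption).
exercises_csv = "ID,Exersice\n1,1.1\n1,1.2\n3,1.4\n"
grades_csv = "ID,1.1,1.2,1.3\n1,5,9,8\n3,4,10,6\n"

df1 = pd.read_csv(io.StringIO(exercises_csv))
df2 = pd.read_csv(io.StringIO(grades_csv))

# Reshape the wide grade table to long form: one row per (ID, exercise).
long_grades = df2.melt("ID", var_name="Exersice", value_name="grade")
# The melted labels are strings; convert them to match df1's float column.
long_grades["Exersice"] = long_grades["Exersice"].astype(float)

# A left merge keeps every row of df1, with NaN where no grade is found.
result = df1.merge(long_grades, on=["ID", "Exersice"], how="left")
print(result)
```

The left merge is what keeps the (3, 1.4) row in the result even though this sample has no 1.4 column; an inner merge would drop it.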

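Answer 2: (score: 0)

One approach is to add a grades column to the first DataFrame by looking up values in the second table.

Here the ID column of the second table is set as the index to simplify the lookup. The column labels of the second table are strings, so each exercise value from the first table is converted to a string before it is used as a column key.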

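import pandas as pd

# Load the exercise list (ID, Exersice) and the grade table (ID, 1.1, 1.2, ...).
df_exercises = pd.read_csv("student_exercises.csv")
df_grades = pd.read_csv("student_grade.csv")

# Index the grade table by ID so .loc can address a cell by (ID, exercise label).
df_grades.set_index("ID", inplace=True)
# The grade table's column labels are strings, hence str(x.Exersice).
df_exercises['grades'] = df_exercises.apply(lambda x: df_grades.loc[x.ID, str(x.Exersice)], axis=1)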
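
One caveat with a direct .loc lookup is that it raises a KeyError when an (ID, exercise) pair has no cell in the grade table. The sketch below is one possible variation, not the answerer's code: it reads the exercise labels as strings so they match the grade table's column labels, and returns NaN for missing cells. The helper name lookup_grade and the in-memory sample data (limited to the columns shown in the question) are assumptions for illustration.

```python
import io
import pandas as pd

# Sample data from the question (only the columns shown there; an assumption).
exercises_csv = "ID,Exersice\n1,1.1\n1,1.2\n3,1.4\n"
grades_csv = "ID,1.1,1.2,1.3\n1,5,9,8\n3,4,10,6\n"

# Keep exercise labels as strings so they compare equal to the column labels.
df_exercises = pd.read_csv(io.StringIO(exercises_csv), dtype={"Exersice": str})
df_grades = pd.read_csv(io.StringIO(grades_csv)).set_index("ID")


def lookup_grade(row):
    """Return the grade for (ID, exercise), or NaN if the cell does not exist."""
    if row["ID"] in df_grades.index and row["Exersice"] in df_grades.columns:
        return df_grades.at[row["ID"], row["Exersice"]]
    return float("nan")


df_exercises["grades"] = df_exercises.apply(lookup_grade, axis=1)
print(df_exercises)
```

Row-wise apply is easy to read but slow on large files; for bigger inputs the reshape-and-merge approach from the previous answer is likely the better fit.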