Question

I'm very new to Python and an aspiring data analyst. I'm trying to extract data from a given CSV file and present it in the format specified in a separate .json file.

I don't know how to get started with this program.

Here is some sample text from my data.csv file:

data.csv

v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13
2,1,3,3|6|1|2|5,5,1,1,4,4|1|3|2|5,2,3|5|4|2,1,2
2,2,2,1|6|5|2|3|4,1,5,4,4,4|3|5|2|1,3,3|2|4|5,2,3
1,2,1,3|2|1|5,4,2,4,3,1|2|4|3|5,2,4|3,1,1
2,3,3,6|2|1|3|5|4,5,2,1,1,3|4|5,2,4,4,1

format.json

[
  {
    "type": 1,
    "name": "Gender",
    "options": [
      {
        "code": 1,
        "label": "Male"
      },
      {
        "code": 2,
        "label": "Female"
      }
    ],
    "variable": "v1"
  },
  {
    "type": 1,
    "name": "Age Group",
    "options": [
      {
        "code": 1,
        "label": "13-18"
      },
      {
        "code": 2,
        "label": "19-26"
      },
      {
        "code": 3,
        "label": "27-35"
      }
    ],
    "variable": "v2"
  },
  {
    "type": 1,
    "name": "City",
    "options": [
      {
        "code": 1,
        "label": "Delhi"
      },
      {
        "code": 2,
        "label": "Jaipur"
      },
      {
        "code": 3,
        "label": "Mumbai"
      }
    ],
    "variable": "v3"
  },
  {
    "type": 2,
    "name": "Clothing purchased",
    "options": [
      {
        "code": 1,
        "label": "Jeans"
      },
      {
        "code": 2,
        "label": "Shirt"
      },
      {
        "code": 3,
        "label": "Trouser"
      },
      {
        "code": 4,
        "label": "Sweater"
      },
      {
        "code": 5,
        "label": "Coat"
      },
      {
        "code": 6,
        "label": "Shorts"
      }
    ],
    "variable": "v4"
  },
  {
    "name": "Price Justified",
    "options": [
      {
        "code": 1,
        "label": "Extremely Agree"
      },
      {
        "code": 2,
        "label": "Agree"
      },
      {
        "code": 3,
        "label": "Neither Agree nor disagree"
      },
      {
        "code": 4,
        "label": "Disagree"
      },
      {
        "code": 5,
        "label": "Extremely Disagree"
      }
    ],
    "type": 1,
    "variable": "v5"
  },
  {
    "name": "Good quality",
    "options": [
      {
        "code": 1,
        "label": "Extremely Agree"
      },
      {
        "code": 2,
        "label": "Agree"
      },
      {
        "code": 3,
        "label": "Neither Agree nor disagree"
      },
      {
        "code": 4,
        "label": "Disagree"
      },
      {
        "code": 5,
        "label": "Extremely Disagree"
      }
    ],
    "type": 1,
    "variable": "v6"
  },
  {
    "name": "Occupation",
    "options": [
      {
        "code": 1,
        "label": "Govt. Service"
      },
      {
        "code": 2,
        "label": "Private Service"
      },
      {
        "code": 3,
        "label": "Business"
      },
      {
        "code": 4,
        "label": "Student"
      },
      {
        "code": 5,
        "label": "Unemployed"
      }
    ],
    "type": 1,
    "variable": "v7"
  },
  {
    "name": "Salary Range",
    "options": [
      {
        "code": 1,
        "label": "0-5L"
      },
      {
        "code": 2,
        "label": "5L-10L"
      },
      {
        "code": 3,
        "label": "10L-15L"
      },
      {
        "code": 4,
        "label": "15L-20L"
      },
      {
        "code": 5,
        "label": "20L and more"
      }
    ],
    "type": 1,
    "variable": "v8"
  },
  {
    "name": "If new Product introducted what would you buy",
    "options": [
      {
        "code": 1,
        "label": "Tie"
      },
      {
        "code": 2,
        "label": "Caps"
      },
      {
        "code": 3,
        "label": "Socks"
      },
      {
        "code": 4,
        "label": "Poncho"
      },
      {
        "code": 5,
        "label": "Scarves"
      }
    ],
    "type": 2,
    "variable": "v9"
  },
  {
    "name": "Rate",
    "options": [
      {
        "code": 1,
        "label": "1 Star"
      },
      {
        "code": 2,
        "label": "2 Star"
      },
      {
        "code": 3,
        "label": "3 Star"
      },
      {
        "code": 4,
        "label": "4 Star"
      },
      {
        "code": 5,
        "label": "5 Star"
      }
    ],
    "type": 1,
    "variable": "v10"
  },
  {
    "name": "what you didnt like",
    "options": [
      {
        "code": 1,
        "label": "Staff behaviour"
      },
      {
        "code": 2,
        "label": "Clothing Variety"
      },
      {
        "code": 3,
        "label": "Cleanliness"
      },
      {
        "code": 4,
        "label": "Location"
      },
      {
        "code": 5,
        "label": "Price"
      }
    ],
    "type": 2,
    "variable": "v11"
  },
  {
    "name": "Shopping Experience",
    "options": [
      {
        "code": 1,
        "label": "1"
      },
      {
        "code": 2,
        "label": "2"
      },
      {
        "code": 3,
        "label": "3"
      },
      {
        "code": 4,
        "label": "4"
      },
      {
        "code": 5,
        "label": "5"
      }
    ],
    "type": 1,
    "variable": "v12"
  },
  {
    "name": "Did you avail discount",
    "options": [
      {
        "code": 1,
        "label": "Yes"
      },
      {
        "code": 2,
        "label": "No"
      },
      {
        "code": 3,
        "label": "didn't know"
      }
    ],
    "type": 1,
    "variable": "v13"
  }
]

Any kind of help or tutorial would be greatly appreciated. Also, as a Python developer I can follow any kind of Python code, so answers with code are welcome too.

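Please note: the first row is the header, giving the variable used for each question. As mentioned, a few questions are multiple choice, so for those the responses are recorded in the row as pipe-separated codes ('|'). [Variables v4, v9 and v11 are the multiple-choice variables.]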
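One way to start, sketched with only the standard library: index format.json by its "variable" key, then decode each CSV row cell by cell. In this file "type": 2 appears only on v4, v9 and v11, the variables described above as multiple choice, so the sketch assumes type 2 means "split the cell on '|'". The helpers build_lookup and decode_row are hypothetical names, and the inline strings are excerpts of the two files rather than the full data:

```python
import csv
import io
import json

# Excerpt of format.json (v1, v3 and v4 only)
FORMAT_JSON = """[
  {"type": 1, "name": "Gender", "variable": "v1",
   "options": [{"code": 1, "label": "Male"}, {"code": 2, "label": "Female"}]},
  {"type": 1, "name": "City", "variable": "v3",
   "options": [{"code": 1, "label": "Delhi"}, {"code": 2, "label": "Jaipur"},
               {"code": 3, "label": "Mumbai"}]},
  {"type": 2, "name": "Clothing purchased", "variable": "v4",
   "options": [{"code": 1, "label": "Jeans"}, {"code": 2, "label": "Shirt"},
               {"code": 3, "label": "Trouser"}, {"code": 4, "label": "Sweater"},
               {"code": 5, "label": "Coat"}, {"code": 6, "label": "Shorts"}]}
]"""

# Columns v1, v3 and v4 of the first data row in data.csv
SAMPLE_CSV = "v1,v3,v4\n2,3,3|6|1|2|5\n"


def build_lookup(questions):
    """Index format.json entries by variable: {variable: (name, multi, labels)}."""
    return {
        q["variable"]: (
            q["name"],
            q["type"] == 2,  # assumption: type 2 marks multiple choice (v4, v9, v11)
            {str(o["code"]): o["label"] for o in q["options"]},  # csv yields text codes
        )
        for q in questions
    }


def decode_row(row, lookup):
    """Replace each coded answer with its label, keyed by the question name."""
    decoded = {}
    for variable, cell in row.items():
        name, multi, labels = lookup[variable]
        if multi:
            # Multiple-choice answers are pipe-separated codes, e.g. "3|6|1"
            decoded[name] = [labels[code] for code in cell.split("|")]
        else:
            decoded[name] = labels[cell]
    return decoded


lookup = build_lookup(json.loads(FORMAT_JSON))
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    print(decode_row(row, lookup))
# {'Gender': 'Female', 'City': 'Mumbai', 'Clothing purchased': ['Trouser', 'Shorts', 'Jeans', 'Shirt', 'Coat']}
```

Returning a list for multiple-choice questions keeps the individual selections separate; joining them back with '|' is a one-line change if a flat string is preferred.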

Answer 1

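Read in both files. Then iterate over the columns, match each one to the corresponding entry in the JSON (a list of dictionaries) to build a mapping dictionary, and use that mapping dictionary to replace the codes with their associated labels.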

import pandas as pd
import json


def replace_all(text, dic):
    # Split multi-choice answers on '|' and map each code as a whole token,
    # so code "1" never matches inside an already-inserted label such as "13-18"
    return '|'.join(dic.get(code, code) for code in text.split('|'))


data = pd.read_csv('C:/data.csv')

with open('C:/format.json') as json_file:
    data_format = json.load(json_file)


cols = list(data.columns)

for col in cols:

    data[col] = data[col].astype(str)
    # Get index of the dictionary whose "variable" value matches the column name
    idx = next((index for (index, d) in enumerate(data_format) if d["variable"] == col), None)
    if idx is None:
        continue  # no entry in format.json for this column; leave it unchanged
    temp_dict = data_format[idx]

    # Build a code -> label mapping, e.g. {'1': 'Male', '2': 'Female'}
    map_dict = {}
    for each in temp_dict['options']:
        map_dict[str(each['code'])] = each['label']

    data[col] = data[col].apply(lambda x: replace_all(x, map_dict))

Output:

print(data.to_string())
       v1     v2      v3                                       v4                  v5                  v6             v7       v8                             v9     v10                                          v11 v12          v13
0  Female  13-18  Mumbai          Trouser|Shorts|Jeans|Shirt|Coat  Extremely Disagree     Extremely Agree  Govt. Service  15L-20L  Poncho|Tie|Socks|Caps|Scarves  2 Star  Cleanliness|Price|Location|Clothing Variety   1           No
1  Female  19-26  Jaipur  Jeans|Shorts|Coat|Shirt|Trouser|Sweater     Extremely Agree  Extremely Disagree        Student  15L-20L  Poncho|Socks|Scarves|Caps|Tie  3 Star  Cleanliness|Clothing Variety|Location|Price   2  didn't know
2    Male  19-26   Delhi                 Trouser|Shirt|Jeans|Coat            Disagree               Agree        Student  10L-15L  Tie|Caps|Poncho|Socks|Scarves  2 Star                         Location|Cleanliness   1          Yes
3  Female  27-35  Mumbai  Shorts|Shirt|Jeans|Trouser|Coat|Sweater  Extremely Disagree               Agree  Govt. Service     0-5L           Socks|Poncho|Scarves  2 Star                                     Location   4          Yes
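Mapping codes with chained `str.replace` calls is fragile, because a label inserted for one code can contain the digits of a later code. For Age Group (v2), code 1 becomes 13-18, and the 3 inside that label is then replaced with 27-35. The two hypothetical helpers below contrast that substring approach with splitting on '|' and looking up each whole code:

```python
def replace_substrings(text, mapping):
    # Chained str.replace: later codes also match digits inside labels inserted earlier
    for code, label in mapping.items():
        text = text.replace(code, label)
    return text


def replace_tokens(text, mapping):
    # Split on '|' and look up each whole code, so inserted labels are never rescanned
    return "|".join(mapping.get(code, code) for code in text.split("|"))


age_group = {"1": "13-18", "2": "19-26", "3": "27-35"}

print(replace_substrings("1", age_group))  # 127-35-18
print(replace_tokens("1", age_group))      # 13-18
```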

Answer 2

https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html covers the basics of pandas, a powerful data analysis library.

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html explains input/output handling.

FYI, you can load everything into a pandas.DataFrame with read_csv and read_json (pay attention to the orient parameter), then convert it to the desired format with to_csv or to_json.
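A minimal sketch of that route, assuming pandas is installed. It inlines small excerpts of data.csv (columns v1, v2 and v4 of two sample rows) and format.json instead of reading the files. Renaming the columns to the question names and emitting `orient="records"` JSON are assumptions about the desired output, since the question does not show it:

```python
import io

import pandas as pd

# Columns v1, v2 and v4 of data rows 1 and 3 from the question
CSV_TEXT = """v1,v2,v4
2,1,3|6|1|2|5
1,2,3|2|1|5
"""

# Excerpt of format.json covering the same three variables
FORMAT_JSON = """[
  {"type": 1, "name": "Gender", "variable": "v1",
   "options": [{"code": 1, "label": "Male"}, {"code": 2, "label": "Female"}]},
  {"type": 1, "name": "Age Group", "variable": "v2",
   "options": [{"code": 1, "label": "13-18"}, {"code": 2, "label": "19-26"},
               {"code": 3, "label": "27-35"}]},
  {"type": 2, "name": "Clothing purchased", "variable": "v4",
   "options": [{"code": 1, "label": "Jeans"}, {"code": 2, "label": "Shirt"},
               {"code": 3, "label": "Trouser"}, {"code": 4, "label": "Sweater"},
               {"code": 5, "label": "Coat"}, {"code": 6, "label": "Shorts"}]}
]"""

# dtype=str keeps every cell as text, so "2" and "3|6|1" are handled the same way
data = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
# orient="records": the JSON is a list of objects, one row per question
fmt = pd.read_json(io.StringIO(FORMAT_JSON), orient="records")

for _, q in fmt.iterrows():
    labels = {str(o["code"]): o["label"] for o in q["options"]}
    col = q["variable"]
    # Map each '|'-separated code on its own; single-choice cells are just one token
    data[col] = data[col].map(lambda cell: "|".join(labels.get(c, c) for c in cell.split("|")))
    # Use the human-readable question name as the column header
    data = data.rename(columns={col: q["name"]})

print(data.to_json(orient="records", indent=2))
```

To write files instead, `data.to_json("output.json", orient="records")` or `data.to_csv("output.csv", index=False)` would work; the file names are placeholders.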

How do I extract data from a .csv file in Python in the format given in a .json file?
