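I am very new to Python data analysis and an aspiring data analyst. I am trying to extract data from a given CSV file, and I have been given its format in a separate .json file.
I don't know how to get started with this program.
Here is sample text from my data.csv file: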
data.csv
v1,v2,v3,v4,v5,v6,v7,v8,v9,v10,v11,v12,v13
2,1,3,3|6|1|2|5,5,1,1,4,4|1|3|2|5,2,3|5|4|2,1,2
2,2,2,1|6|5|2|3|4,1,5,4,4,4|3|5|2|1,3,3|2|4|5,2,3
1,2,1,3|2|1|5,4,2,4,3,1|2|4|3|5,2,4|3,1,1
2,3,3,6|2|1|3|5|4,5,2,1,1,3|4|5,2,4,4,1
format.json
[
{
"type": 1,
"name": "Gender",
"options": [
{
"code": 1,
"label": "Male"
},
{
"code": 2,
"label": "Female"
}
],
"variable": "v1"
},
{
"type": 1,
"name": "Age Group",
"options": [
{
"code": 1,
"label": "13-18"
},
{
"code": 2,
"label": "19-26"
},
{
"code": 3,
"label": "27-35"
}
],
"variable": "v2"
},
{
"type": 1,
"name": "City",
"options": [
{
"code": 1,
"label": "Delhi"
},
{
"code": 2,
"label": "Jaipur"
},
{
"code": 3,
"label": "Mumbai"
}
],
"variable": "v3"
},
{
"type": 2,
"name": "Clothing purchased",
"options": [
{
"code": 1,
"label": "Jeans"
},
{
"code": 2,
"label": "Shirt"
},
{
"code": 3,
"label": "Trouser"
},
{
"code": 4,
"label": "Sweater"
},
{
"code": 5,
"label": "Coat"
},
{
"code": 6,
"label": "Shorts"
}
],
"variable": "v4"
},
{
"name": "Price Justified",
"options": [
{
"code": 1,
"label": "Extremely Agree"
},
{
"code": 2,
"label": "Agree"
},
{
"code": 3,
"label": "Neither Agree nor disagree"
},
{
"code": 4,
"label": "Disagree"
},
{
"code": 5,
"label": "Extremely Disagree"
}
],
"type": 1,
"variable": "v5"
},
{
"name": "Good quality",
"options": [
{
"code": 1,
"label": "Extremely Agree"
},
{
"code": 2,
"label": "Agree"
},
{
"code": 3,
"label": "Neither Agree nor disagree"
},
{
"code": 4,
"label": "Disagree"
},
{
"code": 5,
"label": "Extremely Disagree"
}
],
"type": 1,
"variable": "v6"
},
{
"name": "Occupation",
"options": [
{
"code": 1,
"label": "Govt. Service"
},
{
"code": 2,
"label": "Private Service"
},
{
"code": 3,
"label": "Business"
},
{
"code": 4,
"label": "Student"
},
{
"code": 5,
"label": "Unemployed"
}
],
"type": 1,
"variable": "v7"
},
{
"name": "Salary Range",
"options": [
{
"code": 1,
"label": "0-5L"
},
{
"code": 2,
"label": "5L-10L"
},
{
"code": 3,
"label": "10L-15L"
},
{
"code": 4,
"label": "15L-20L"
},
{
"code": 5,
"label": "20L and more"
}
],
"type": 1,
"variable": "v8"
},
{
"name": "If new Product introducted what would you buy",
"options": [
{
"code": 1,
"label": "Tie"
},
{
"code": 2,
"label": "Caps"
},
{
"code": 3,
"label": "Socks"
},
{
"code": 4,
"label": "Poncho"
},
{
"code": 5,
"label": "Scarves"
}
],
"type": 2,
"variable": "v9"
},
{
"name": "Rate",
"options": [
{
"code": 1,
"label": "1 Star"
},
{
"code": 2,
"label": "2 Star"
},
{
"code": 3,
"label": "3 Star"
},
{
"code": 4,
"label": "4 Star"
},
{
"code": 5,
"label": "5 Star"
}
],
"type": 1,
"variable": "v10"
},
{
"name": "what you didnt like",
"options": [
{
"code": 1,
"label": "Staff behaviour"
},
{
"code": 2,
"label": "Clothing Variety"
},
{
"code": 3,
"label": "Cleanliness"
},
{
"code": 4,
"label": "Location"
},
{
"code": 5,
"label": "Price"
}
],
"type": 2,
"variable": "v11"
},
{
"name": "Shopping Experience",
"options": [
{
"code": 1,
"label": "1"
},
{
"code": 2,
"label": "2"
},
{
"code": 3,
"label": "3"
},
{
"code": 4,
"label": "4"
},
{
"code": 5,
"label": "5"
}
],
"type": 1,
"variable": "v12"
},
{
"name": "Did you avail discount",
"options": [
{
"code": 1,
"label": "Yes"
},
{
"code": 2,
"label": "No"
},
{
"code": 3,
"label": "didn't know"
}
],
"type": 1,
"variable": "v13"
}
]
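Any kind of help or tutorial would be highly appreciated. As a Python developer I can follow any kind of Python code, so code answers are welcome too.
Please note: the first row is the header, giving the variable used for each question. As mentioned, a few questions are multi-select, so their responses are captured in the rows above as pipe-separated ('|') codes [variables v4, v9 and v11 are the multi-select variables].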
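Answer 0 (score: 1)
Read in each file (the CSV and the JSON). Then you can iterate through each column and match it to the entry in the json/dictionary whose "variable" value is that column name, to create a mapping dictionary. Then use that mapping dictionary to replace the values with the associated labels.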
import json

import pandas as pd


def replace_all(text, dic):
    # Apply every code -> label replacement in turn (plain substring replacement).
    # Caveat: a later code can also match digits inside a label inserted earlier,
    # e.g. the label "13-18" contains "3".
    for i, j in dic.items():
        text = text.replace(i, j)
    return text


data = pd.read_csv('C:/data.csv')
with open('C:/format.json') as json_file:
    data_format = json.load(json_file)

cols = list(data.columns)
for col in cols:
    data[col] = data[col].astype(str)
    # Get index of the dictionary whose "variable" value matches the column
    idx = next((index for (index, d) in enumerate(data_format) if d["variable"] == col), None)
    temp_dict = data_format[idx]
    # Build the code -> label mapping for this column
    map_dict = {}
    for each in temp_dict['options']:
        map_dict[str(each['code'])] = each['label']
    data[col] = data[col].apply(lambda x: replace_all(x, map_dict))
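Output (the v2 and v8 columns come out garbled, because replace_all also replaces digits inside labels it has already inserted):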
print (data.to_string())
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13
0 Female 127-35-18 Mumbai Trouser|Shorts|Jeans|Shirt|Coat Extremely Disagree Extremely Agree Govt. Service 120L and moreL-20L Poncho|Tie|Socks|Caps|Scarves 2 Star Cleanliness|Price|Location|Clothing Variety 1 No
1 Female 19-26 Jaipur Jeans|Shorts|Coat|Shirt|Trouser|Sweater Extremely Agree Extremely Disagree Student 120L and moreL-20L Poncho|Socks|Scarves|Caps|Tie 3 Star Cleanliness|Clothing Variety|Location|Price 2 didn't know
2 Male 19-26 Delhi Trouser|Shirt|Jeans|Coat Disagree Agree Student 10L-120L and moreL Tie|Caps|Poncho|Socks|Scarves 2 Star Location|Cleanliness 1 Yes
3 Female 27-35 Mumbai Shorts|Shirt|Jeans|Trouser|Coat|Sweater Extremely Disagree Agree Govt. Service 0-20L and moreL Socks|Poncho|Scarves 2 Star Location 4 Yes
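The output above contains values such as "127-35-18" in v2 and "120L and moreL-20L" in v8. One way to avoid this is to look up each code as a whole token instead of chaining substring replacements. The following is only a minimal sketch under some assumptions: the helper name decode_cell is hypothetical, the inline CSV_TEXT and FORMAT_TEXT strings stand in for a small part of data.csv and format.json, unknown codes are left unchanged, and empty cells are not handled.

```python
import io
import json

import pandas as pd

# Inline stand-ins for data.csv and format.json (two columns, first two rows).
CSV_TEXT = """v2,v4
1,3|6|1|2|5
2,1|6|5|2|3|4
"""
FORMAT_TEXT = """[
  {"type": 1, "name": "Age Group", "variable": "v2",
   "options": [{"code": 1, "label": "13-18"},
               {"code": 2, "label": "19-26"},
               {"code": 3, "label": "27-35"}]},
  {"type": 2, "name": "Clothing purchased", "variable": "v4",
   "options": [{"code": 1, "label": "Jeans"}, {"code": 2, "label": "Shirt"},
               {"code": 3, "label": "Trouser"}, {"code": 4, "label": "Sweater"},
               {"code": 5, "label": "Coat"}, {"code": 6, "label": "Shorts"}]}
]"""


def decode_cell(cell, mapping):
    """Split a cell on '|', look up each code as a whole token, and re-join."""
    # Codes missing from the mapping are kept as-is (an assumption of this sketch).
    return "|".join(mapping.get(code, code) for code in cell.split("|"))


# dtype=str keeps every cell as text, so "1" and "3|6|1" are handled the same way.
data = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
data_format = json.loads(FORMAT_TEXT)

for question in data_format:
    col = question["variable"]
    if col not in data.columns:
        continue
    # Build the code -> label mapping for this question.
    mapping = {str(opt["code"]): opt["label"] for opt in question["options"]}
    data[col] = data[col].map(lambda cell, m=mapping: decode_cell(cell, m))

print(data.to_string())
```

Because each code is matched exactly, a label that itself contains digits (such as "13-18") is never touched by a later lookup, and single-select and multi-select columns go through the same function.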
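Answer 1 (score: 0)
See https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html for the basics of pandas, a powerful data analysis library, and
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html for IO management.
FYI, you can use the methods read_csv and read_json (note the orient parameter) to load everything into a pandas.DataFrame, then convert to the required format with to_csv, to_json.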
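As a rough illustration of those IO methods, the sketch below loads a tiny CSV into a DataFrame, writes it out as JSON with orient="records", and reads it back. The inline csv_text is a made-up two-row sample rather than the real data.csv, and orient="records" is just one of the layouts the orient parameter accepts.

```python
import io
import json

import pandas as pd

# Hypothetical two-row sample standing in for data.csv.
csv_text = "v1,v2\n2,1\n1,2\n"
data = pd.read_csv(io.StringIO(csv_text))

# orient="records" serialises the frame as a list with one JSON object per row.
records_json = data.to_json(orient="records")
print(records_json)

# Reading with the same orient rebuilds the frame from that layout.
round_trip = pd.read_json(io.StringIO(records_json), orient="records")
print(round_trip)

# to_csv without a path returns the CSV text; index=False drops the row index.
csv_out = data.to_csv(index=False)
print(csv_out)
```

Note that format.json is nested (each entry holds a list of options), so for that file the standard json module is probably the simpler way to load it.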