I have a dataset in the following format:
row_num; locale; day_of_week; hour_of_day; agent_id; entry_page; path_id_set; traffic_type; session_durantion; hits
"988681;L6;Monday;17;1;2111;""31672;0"";6;7037;\N"
"988680;L2;Thursday;22;10;2113;""31965;0"";2;49;14"
"988679;L4;Saturday;21;2;2100;""0;78464"";1;1892;14"
"988678;L3;Saturday;19;8;2113;51462;6;0;1;\N"
I want it in the following format:
row_num locale day_of_week hour_of_day agent_id entry_page path_id_set traffic_type session_durantion hits
988681 L6 Monday 17 1 2111 31672 0 6 7037 N
988680 L2 Thursday 22 10 2113 31965 0 2 49 14
988679 L4 Saturday 21 2 2100 0 78464 1 1892 14
988678 L3 Saturday 19 8 2113 51462 6 0 1 N
I tried the following code:
import pandas as pd
df = pd.read_csv("C:\Users\Rahhy\Desktop\trivago.csv", delimiter = ";")
But I get an error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
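The error comes from the path literal, not from pandas: in an ordinary Python string, `\U` starts an 8-digit Unicode escape, so `"C:\Users\..."` cannot be compiled. A minimal sketch of the usual workarounds, reusing the path from the question (the file does not need to exist to run it):

```python
from pathlib import PureWindowsPath

# Three spellings of the same Windows path that avoid the \U escape problem.
raw_path = r"C:\Users\Rahhy\Desktop\trivago.csv"        # raw string: backslashes are literal
escaped_path = "C:\\Users\\Rahhy\\Desktop\\trivago.csv"  # every backslash doubled
forward_path = "C:/Users/Rahhy/Desktop/trivago.csv"      # Windows also accepts forward slashes

# The first two are identical strings; PureWindowsPath normalizes the third.
print(raw_path == escaped_path)                          # True
print(str(PureWindowsPath(forward_path)) == raw_path)    # True

# With any of them the original call compiles, e.g.:
# df = pd.read_csv(raw_path, delimiter=";")
```

A raw string is usually the least noisy option; just note that a raw string cannot end in a single backslash.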
Answer 0 (score: 0)
Use replace():
with open("data_test.csv", "r") as fileObj:
    contents = fileObj.read().replace(';', ' ').replace('\\', '').replace('"', '')
print(contents)
Output:
row_num locale day_of_week hour_of_day agent_id entry_page path_id_set traffic_type session_durantion hits
988681 L6 Monday 17 1 2111 31672 0 6 7037 N
988680 L2 Thursday 22 10 2113 31965 0 2 49 14
988679 L4 Saturday 21 2 2100 0 78464 1 1892 14
988678 L3 Saturday 19 8 2113 51462 6 0 1 N
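The same replace chain can be tried without the file by running it on an in-memory copy of the sample rows. In this sketch the helper name `clean` and the `RAW` string are illustrative, not part of the answer. It also counts the space-separated values per line, which shows that every data row ends up with 11 values against 10 header names (in the first three rows because a `path_id_set` value such as `31672;0` is split in two):

```python
# Sample input copied from the question; the raw string keeps \N literal.
RAW = r'''row_num;locale;day_of_week;hour_of_day;agent_id;entry_page;path_id_set;traffic_type;session_durantion;hits
"988681;L6;Monday;17;1;2111;""31672;0"";6;7037;\N"
"988680;L2;Thursday;22;10;2113;""31965;0"";2;49;14"
"988679;L4;Saturday;21;2;2100;""0;78464"";1;1892;14"
"988678;L3;Saturday;19;8;2113;51462;6;0;1;\N"'''


def clean(text: str) -> str:
    """Apply the answer's replacements: ';' -> ' ', drop backslashes and quotes."""
    return text.replace(";", " ").replace("\\", "").replace('"', "")


cleaned_lines = clean(RAW).splitlines()
for line in cleaned_lines:
    print(line)

# Number of space-separated values per line (header first).
token_counts = [len(line.split()) for line in cleaned_lines]
print(token_counts)  # [10, 11, 11, 11, 11]
```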
Edit:
You can open the file, read its contents and replace the unwanted characters, write the new contents back to the file, and then read it with pd.read_csv:
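import pandas as pd
# Read the raw file and strip the unwanted characters
with open("data_test.csv", "r") as fileObj:
    contents = fileObj.read().replace(';', ' ').replace('\\', '').replace('"', '')
# print(contents)
# Write the cleaned text back (note: this overwrites the original file)
with open("data_test.csv", "w") as fileObj2:
    fileObj2.write(contents)
# The cleaned file is space-separated, so pass sep=' ';
# with the default sep=',' every line would end up in a single column
df = pd.read_csv("data_test.csv", sep=' ', index_col=False)
print(df)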
Output:
row_num locale day_of_week hour_of_day agent_id entry_page path_id_set traffic_type session_durantion hits
988681 L6 Monday 17 1 2111 31672 0 6 7037 N
988680 L2 Thursday 22 10 2113 31965 0 2 49 14
988679 L4 Saturday 21 2 2100 0 78464 1 1892 14
988678 L3 Saturday 19 8 2113 51462 6 0 1 N
Answer 1 (score: 0)
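Since there are 11 data fields but only 10 header names, only the first 10 fields are used. You will have to decide what to do with the last one (values: \N, 14).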
import pandas as pd
from io import StringIO
# Load the file to a string (prefix r (raw) to not use \ for escaping)
filename = r'c:\temp\x.csv'
with open(filename, 'r') as file:
    raw_file_content = file.read()
# Remove the quotes which break the CSV file
file_content_without_quotes = raw_file_content.replace('"','')
# Simulate a file with the corrected CSV content
simulated_file = StringIO(file_content_without_quotes)
# Get the CSV as a table with pandas
# Since the first field in each data row shall not be used for indexing we need to set index_col=False
csv_data = pd.read_csv(simulated_file, delimiter = ';', index_col=False)
print(csv_data['hits']) # print some column
csv_data
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
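A different technique, not used in either answer: the quoting in the sample looks as if each data row was itself written out as one quoted CSV field, with the inner quotes around `path_id_set` doubled. If that is the case, unwrapping one level of quotes with Python's `csv` module gives an ordinary `;`-separated file in which `"31672;0"` stays a single quoted value, so pandas sees exactly 10 fields and nothing has to be dropped. This sketch assumes that layout, uses only the first three sample rows (the fourth row as posted has no inner quotes and 11 fields, so it would need separate handling), and treats `\N` as missing via `na_values`; the helper name `unwrap_rows` is hypothetical.

```python
import csv
import io

import pandas as pd

# First three sample rows from the question (the raw string keeps \N literal).
RAW = r'''row_num;locale;day_of_week;hour_of_day;agent_id;entry_page;path_id_set;traffic_type;session_durantion;hits
"988681;L6;Monday;17;1;2111;""31672;0"";6;7037;\N"
"988680;L2;Thursday;22;10;2113;""31965;0"";2;49;14"
"988679;L4;Saturday;21;2;2100;""0;78464"";1;1892;14"'''


def unwrap_rows(text: str) -> str:
    """Strip one level of CSV quoting by reading each physical line as one field.

    Assumes the lines contain no commas: csv.reader's default ',' delimiter then
    yields a single field per line, with outer quotes removed and "" turned into ".
    """
    rows = csv.reader(io.StringIO(text))
    return "\n".join(fields[0] for fields in rows if fields)


inner_csv = unwrap_rows(RAW)
# First data line is now: 988681;L6;Monday;17;1;2111;"31672;0";6;7037;\N

df = pd.read_csv(
    io.StringIO(inner_csv),
    sep=";",                     # the real field separator
    quotechar='"',               # keeps "31672;0" together as one value
    na_values=[r"\N"],           # treat \N as a missing value
    dtype={"path_id_set": str},
)
print(df)

# Optionally split path_id_set into a list of IDs per row.
df["path_ids"] = df["path_id_set"].str.split(";")
```

Whether `path_id_set` really is a `;`-joined set of IDs is an assumption based on the column name; if it is, keeping it as one column (or a list) avoids the 11-versus-10 field mismatch entirely.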