Question

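I have a very wide Excel sheet of survey data spanning columns A through DIE (about 2,500 columns wide). Each column is a question and each row is a response. I tried to upload the data to SQL and use the UNPIVOT function to convert it to a more SQL-friendly format, but I can't even load it into SQL because it exceeds the 1,024-column limit.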

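Basically, I have an Excel sheet that looks like this:

But I want to convert it to look like this:

What options do I have for making this change, either in Excel (before uploading) or in SQL (working around the 1,024-column limit)?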

Answer 1

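I had to do this too. My solution was to write a Python script that un-crosstabs a CSV file (typically one exported from Excel) and writes the result to another CSV file. The Python code is here: https://pypi.python.org/pypi/un-xtab/, and the documentation is here: http://pythonhosted.org/un-xtab/. I have never run it on a file with 2,500 columns, but I don't know of any reason why it wouldn't work.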
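For anyone who wants to see the idea before installing anything, here is a minimal sketch of what "un-crosstabbing" a CSV means, using only the standard library. It is not the un-xtab code or its API: the function name `unpivot_rows`, the `id_columns` parameter, and the sample data are all made up for illustration, and it assumes the first column identifies the response and every other column is a question.

```python
import csv
import io


def unpivot_rows(rows, id_columns=1):
    """Yield one [id..., question, response] row per question cell.

    `rows` is any iterable of lists whose first item is the header row,
    for example a csv.reader. The names here are hypothetical and are
    not taken from un-xtab.
    """
    rows = iter(rows)
    headers = next(rows)
    questions = headers[id_columns:]
    for row in rows:
        ids = row[:id_columns]
        # Pair each question header with the value in the same position.
        for question, response in zip(questions, row[id_columns:]):
            yield ids + [question, response]


# Made-up sample: one ID column followed by two question columns.
wide_csv = "id,question_1,question_2\n1,1,2\n2,0,0\n3,1,2\n"
long_rows = list(unpivot_rows(csv.reader(io.StringIO(wide_csv))))

# Write the long format back out as CSV text.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "question", "response"])
writer.writerows(long_rows)
print(out.getvalue())
```

Because the function is a generator, it only holds one input row in memory at a time, so the same approach should scale to a real file if you pass it a `csv.reader` over an open file instead of the in-memory string.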

Answer 2

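R has a function made specifically for this in one of its libraries (gather, from tidyr). You can also use R to connect to databases and to read and write data. I suggest downloading R and RStudio.

Here is a working script to get you started:

Sample data: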

df <- data.frame(id = c(1,2,3), question_1 = c(1,0,1), question_2 = c(2,0,2))
df

Input table:

  id question_1 question_2
1  1          1          2
2  2          0          0
3  3          1          2

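Code to transpose the data:

# gather() comes from tidyr; -id keeps the id column and stacks the question columns
df2 <- gather(df, key = "question", value = "values", -id)
df2

Output:

  id   question values
1  1 question_1      1
2  2 question_1      0
3  3 question_1      1
4  1 question_2      2
5  2 question_2      0
6  3 question_2      2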

Some helper functions for importing and exporting CSV data:

# Install and load the necessary libraries
install.packages(c('tidyr','readr'))
library(tidyr)
library(readr)

# to read a csv file
df <- read_csv('[some directory][some filename].csv')

# To output the csv file
write.csv(df2, '[some directory]data.csv', row.names = FALSE)

Answer 3

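Thanks for all the help. Because of the limits in SQL (more than 1,024 columns wide) and Excel (more than 1 million rows in the output), I ended up using Python. I borrowed the concepts from rd_nielson's code, but it was a bit more complex than I needed. In case it helps anyone else, this is the code I used. It outputs a CSV file with 3 columns and 14 million rows, which I can upload to SQL.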

import csv

with open('Responses.csv') as f:
    reader = csv.reader(f)
    headers = next(reader)  # capture current field headers
    newHeaders = ['ResponseID','Question','Response']  # establish new header names
    with open('PythonOut.csv','w') as outputfile:
        writer=csv.writer(outputfile, dialect='excel', lineterminator='\n')  
        writer.writerow(newHeaders)  # write new headers to output

        question_headers = headers[1:]  # every column after the first is a question
        for row in reader:
            # write one output row per question (column) for this response (row)
            for index, question in enumerate(question_headers, start=1):
                writer.writerow([row[0], question, row[index]])
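The final step mentioned above, uploading the 3-column file to SQL, could look something like the sketch below. This is only an illustration under stated assumptions: it uses SQLite from the standard library as a stand-in for whatever database you actually have, the table name `responses`, the function name `load_long_csv`, and the batch size are hypothetical, and the sample rows are made up. Inserting in batches keeps memory use flat even with millions of rows.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_long_csv(conn, csv_file, batch_size=50_000):
    """Insert a 3-column long-format CSV into a `responses` table in batches.

    Hypothetical helper for illustration; returns the number of rows inserted.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses "
        "(response_id TEXT, question TEXT, response TEXT)"
    )
    total = 0
    while True:
        # Pull at most batch_size rows from the reader at a time.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total


# Tiny in-memory stand-in for a file such as PythonOut.csv (made-up rows).
sample = io.StringIO("ResponseID,Question,Response\n1,Q1,Yes\n1,Q2,No\n2,Q1,No\n")
conn = sqlite3.connect(":memory:")
inserted = load_long_csv(conn, sample, batch_size=2)
count = conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0]
print(inserted, count)
```

With a real file you would pass an open file handle instead of the `io.StringIO` object. Other databases have their own bulk-load tools, which are usually faster than row inserts for a file of this size.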

Reformatting a wide Excel table into a more SQL-friendly structure
