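I have a large txt file with space-separated "columns" that I would like to convert to JSON, xlsx, csv, or a similar format so that I can work with the data programmatically.
The file is large, so I won't post the whole thing. Here is a sample snippet: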
ID number Name TitlFed Grade GamesBorn Flag
10207538 A E M, Doshtagir BAN 1864 0 i
10206612 A K M, Sourab BAN 1714 0 i
5045886 A K, Kalshyan IND 1958 0 1964
8605360 A La, Teng Hua CHN 1915 0 1993 wi
5031605 A, Akshaya IND 2016 29 1994 w
5080444 A, Sohita IND 1447 0 1995 wi
5706068 A. Nashir, Mohd Khairul Nazrin MAS 1878 0 i
10201971 A.f.m., Mahfuzul Haque BAN 1690 0
10202650 A.k. Azad, Akand BAN 1692 0 i
10210997 A.K.M. Mehfuz BAN 2015 0
24663832 Aab, Manfred GER 1808 0 1963
1701991 Aaberg, Anton SWE 2374 4 1972
1513966 Aabid, Ryaad NOR 1642 0 1958
1407589 Aabling-Thomsen, Jakob f DEN 2331 18 1985
12524670 Aadeli, Arvin IRI 2015 0
5072662 Aadhityaa, M IND 1898 10 1999
25034677 Aadish S IND 1528 5 1999
5086183 Aaditt, M K IND 1610 0 1996 i
5027942 Aaditya, Jagadeesh IND 1814 16 1998
25011952 Aadityan G IND 1621 7 2001
5063485 Aadityan, N. IND 1758 8 1996
1427024 Aagaard, Gert DEN 2030 7 1966
1401815 Aagaard, Jacob g DEN 2506 9 1973
1411802 Aagaard, Kasper DEN 1913 0 1992 i
1017942 Aagaard, Michael NED 2075 0 1960
1406248 Aage, Bjarke DEN 2068 0 1978 i
1506064 Aagedal, Geir Ole NOR 1833 7 1957
25021044 Aagney L., Narasimhan IND 1285 6 2000
10205640 Aahelee, Sarker BAN 1577 0 w
25014510 Aakanksha Hagawane IND 1622 0 2000 w
25030388 Aakash Jain IND 1577 7 1998
35004336 Aakash S B IND 1235 10 1998
5093295 Aakasha IND 1620 3 2000 w
504599 Aakio, Seppo FIN 2078 0 1954
1402315 Aalbaek, Kurt Frede Nissen DEN 1440 0 1944
1024388 Aalbers, Klaas NED 1891 0 1955 i
2252465 Aalbersberg Kroon, Pedro ESP 1878 0 1933
2218682 Aalders, Hendricus ESP 2021 0 1930 i
1033948 Aalders, Peter NED 1903 0 1964
501956 Aaltio, Erkki FIN 2118 0 1935
1504452 Aandal, Kristian NOR 2012 0 1985 i
I program in JavaScript, so ideally I would like to convert it to JSON, with each player/id in its own object, like this:
var AllPlayers =
[{
"2434324243":
{
"name":"some guy",
"title":"f",
"fed":"USA",
"grade":"1999",
"games":"3",
"born":"1990"
},
"8787878887":
{
"name":"anyone",
"title":"",
"fed":"BER",
"grade":"2222",
"games":"6",
"born":"1970"
}
}
]
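I tried reading the txt file with the fs module in Node, then worked out the length of each row (71 characters) and tried pushing each row into an array. However, the whitespace seems to be stripped when the file is read, which makes this approach unworkable because the length of each person's information varies.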
var fs = require('fs');
var allPlayers=[];
thisPlayer='';
//1st row length =74
//other rows 71
//14895 rows
fs.readFile('jul12frl.txt', 'utf8', function(err, contents) {
for(let x=74;x<14895;x++){
thisPlayer+=contents[x];
if(thisPlayer.length==71){
allPlayers.push(thisPlayer);
thisPlayer='';
}
}
});
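I also tried using Excel's built-in wizard to convert the txt to Excel format, but it did not pick up all the required columns: it merged the name/title/fed/grade columns into one big column.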
Answer 0 (score: 1)
const data = `10207538 A E M, Doshtagir BAN 1864 0 i
10206612 A K M, Sourab BAN 1714 0 i
5045886 A K, Kalshyan IND 1958 0 1964
8605360 A La, Teng Hua CHN 1915 0 1993 wi
5031605 A, Akshaya IND 2016 29 1994 w
5080444 A, Sohita IND 1447 0 1995 wi
5706068 A. Nashir, Mohd Khairul Nazrin MAS 1878 0 i
10201971 A.f.m., Mahfuzul Haque BAN 1690 0
10202650 A.k. Azad, Akand BAN 1692 0 i
10210997 A.K.M. Mehfuz BAN 2015 0
24663832 Aab, Manfred GER 1808 0 1963
1701991 Aaberg, Anton SWE 2374 4 1972
1513966 Aabid, Ryaad NOR 1642 0 1958
1407589 Aabling-Thomsen, Jakob f DEN 2331 18 1985
12524670 Aadeli, Arvin IRI 2015 0
5072662 Aadhityaa, M IND 1898 10 1999
25034677 Aadish S IND 1528 5 1999
5086183 Aaditt, M K IND 1610 0 1996 i
5027942 Aaditya, Jagadeesh IND 1814 16 1998
25011952 Aadityan G IND 1621 7 2001
5063485 Aadityan, N. IND 1758 8 1996
1427024 Aagaard, Gert DEN 2030 7 1966
1401815 Aagaard, Jacob g DEN 2506 9 1973
1411802 Aagaard, Kasper DEN 1913 0 1992 i
1017942 Aagaard, Michael NED 2075 0 1960
1406248 Aage, Bjarke DEN 2068 0 1978 i
1506064 Aagedal, Geir Ole NOR 1833 7 1957
25021044 Aagney L., Narasimhan IND 1285 6 2000
10205640 Aahelee, Sarker BAN 1577 0 w
25014510 Aakanksha Hagawane IND 1622 0 2000 w
25030388 Aakash Jain IND 1577 7 1998
35004336 Aakash S B IND 1235 10 1998
5093295 Aakasha IND 1620 3 2000 w
504599 Aakio, Seppo FIN 2078 0 1954
1402315 Aalbaek, Kurt Frede Nissen DEN 1440 0 1944
1024388 Aalbers, Klaas NED 1891 0 1955 i
2252465 Aalbersberg Kroon, Pedro ESP 1878 0 1933
2218682 Aalders, Hendricus ESP 2021 0 1930 i
1033948 Aalders, Peter NED 1903 0 1964
501956 Aaltio, Erkki FIN 2118 0 1935
1504452 Aandal, Kristian NOR 2012 0 1985 i`;
const rows = data.split("\n");
function parseRow(row) {
const id = row.slice(0, 10).trim();
const name = row.slice(10, 44).trim();
const title = row.slice(44, 48).trim();
const country = row.slice(48, 53).trim();
const grade = row.slice(53, 60).trim();
const games = row.slice(60, 64).trim();
const born = row.slice(64, 70).trim();
const flag = row.slice(70, 72).trim();
return {
id,
name,
title,
country,
grade: grade && parseInt(grade, 10),
games: games && parseInt(games, 10),
born : born && parseInt(born, 10),
flag
}
}
const parsedRows = rows.reduce((acc, row) => {
const parsed = parseRow(row);
acc[parsed.id] = parsed;
return acc;
}, {});
console.log(parsedRows);
Provided the rows and columns have the same widths as in the sample you posted, you can parse it like this:
// Split original string into rows as an array of strings
const rows = data.split("\n"); // could be replaced with contents read from file
function parseRow(row) {
// Parse the values by extracting it from the row by start and end index of the column
const id = row.slice(0, 10).trim();
const name = row.slice(10, 44).trim();
const title = row.slice(44, 48).trim();
const country = row.slice(48, 53).trim();
const grade = row.slice(53, 60).trim();
const games = row.slice(60, 64).trim();
const born = row.slice(64, 70).trim();
const flag = row.slice(70, 72).trim();
return {
id,
name,
title,
country,
// Parse numbers
grade: grade && parseInt(grade, 10),
games: games && parseInt(games, 10),
born : born && parseInt(born, 10),
flag
}
}
const parsedRows = rows.reduce((acc, row) => {
const parsed = parseRow(row);
acc[parsed.id] = parsed;
return acc;
}, {});
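This is a rough solution, but it should solve your problem; it works on the sample data you provided. If the full data set differs from the sample, you may need to adjust the start and end indices of the individual columns.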
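If the indices do need adjusting, it may be easier to keep them in one table rather than spread across several slice() calls. The following is only a sketch under that assumption: COLUMNS, NUMERIC_FIELDS, parseFixedWidthRow, parseFixedWidthText and buildRow are names made up for this illustration, the offsets are copied from the code above, and the demo rows are padded artificially to those widths rather than taken from the real file.

```javascript
// Assumed column layout: [field, start, end) offsets copied from the slice() calls above.
const COLUMNS = [
  ["id", 0, 10],
  ["name", 10, 44],
  ["title", 44, 48],
  ["country", 48, 53],
  ["grade", 53, 60],
  ["games", 60, 64],
  ["born", 64, 70],
  ["flag", 70, 72],
];
const NUMERIC_FIELDS = new Set(["grade", "games", "born"]);

// Cut one fixed-width row into fields according to the layout table.
function parseFixedWidthRow(row, columns = COLUMNS) {
  const record = {};
  for (const [field, start, end] of columns) {
    const text = row.slice(start, end).trim();
    // Empty cells stay as ""; non-empty numeric cells become numbers.
    record[field] = text !== "" && NUMERIC_FIELDS.has(field) ? parseInt(text, 10) : text;
  }
  return record;
}

// Parse a whole file's text: skip the header line and blank lines, key the result by id.
function parseFixedWidthText(text, columns = COLUMNS) {
  const players = {};
  const lines = text.split(/\r?\n/).slice(1);
  for (const line of lines) {
    if (line.trim() === "") continue;
    const record = parseFixedWidthRow(line, columns);
    players[record.id] = record;
  }
  return players;
}

// Hypothetical helper that pads values to the assumed widths; used only to build demo input.
function buildRow(values, columns = COLUMNS) {
  return columns.map(([, start, end], i) => String(values[i]).padEnd(end - start)).join("");
}

const sampleText = [
  buildRow(["ID number", "Name", "Titl", "Fed", "Grade", "Game", "Born", "Fl"]),
  buildRow(["1407589", "Aabling-Thomsen, Jakob", "f", "DEN", "2331", "18", "1985", ""]),
  buildRow(["10207538", "A E M, Doshtagir", "", "BAN", "1864", "0", "", "i"]),
].join("\n");

const players = parseFixedWidthText(sampleText);
console.log(JSON.stringify(players, null, 2));
```

With the real file, the text would come from something like fs.readFileSync('jul12frl.txt', 'utf8') instead of sampleText. Reading a file this way returns its contents unchanged, so the padding between columns should still be present; splitting on line breaks first avoids having to count characters across rows.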
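Note that in the sample data you provided, the columns are only separated by spaces. If the actual data set were tab-separated, the solution would be simpler: const [id, name, title, country, grade, games, born, flag] = row.split('\t')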
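As a rough illustration of the tab-separated case, the sketch below splits each row on tabs and keys the players by id, which is close to the structure asked for in the question. It assumes the file really is tab-separated with the fields in the order shown; FIELDS, parseTabSeparatedRow and the two demo rows are invented for this example.

```javascript
// Field order assumed to match the header of the sample file.
const FIELDS = ["id", "name", "title", "country", "grade", "games", "born", "flag"];

// Split one tab-separated row and pair each cell with its field name.
function parseTabSeparatedRow(row) {
  const cells = row.split("\t");
  const record = {};
  FIELDS.forEach((field, i) => {
    // Missing trailing cells become "" instead of undefined.
    record[field] = (cells[i] || "").trim();
  });
  return record;
}

// Made-up tab-separated rows based on two players from the sample.
const tsvRows = [
  ["5031605", "A, Akshaya", "", "IND", "2016", "29", "1994", "w"].join("\t"),
  ["1701991", "Aaberg, Anton", "", "SWE", "2374", "4", "1972"].join("\t"),
];

// Key each player by id, leaving the id out of the nested object.
const byId = {};
for (const row of tsvRows) {
  const { id, ...rest } = parseTabSeparatedRow(row);
  byId[id] = rest;
}
console.log(JSON.stringify(byId, null, 2));
```

The values are left as strings here, matching the example object in the question; they could be converted with parseInt(value, 10) in the same way as in the fixed-width version.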