I’m working on an update to the Footbet Soccer Prediction WebApp to run the prediction from an API with Node Js which has proved faster (and fancier). An endeavor which has tried my patience to the very core while using exceljs.
I am moving away from running all prediction functions on client side to node js server. I had to also walk away from downloading multiple csv files from football-data.co.uk to just a few excel files with all data from different leagues squished into one file.
That’s where I ran into problems with pulling data from the files without bloating my final results with unnecessary fields that will never be used in the prediction.
So, what’s the problem?
The fields I was interested in from the excel files were Date, League(aka Div), HomeTeam, AwayTeam… blah blah blah you the gist. Thing is, the Excel files don’t have the same headers and in some files there were headers with different name but holding same data as other files. It’s a bit of a mess.
To understand the problem at hand, look at the screenshots below of the Excel documents I had to work with. The data is similar since I am mostly interested in goals scored, match outcome and odds at the beginning of the match.
The two files above present a unique opportunity to utilise the power of Exceljs – the Node Js package designed for such a situation. Problem is, going through the documentation for the herebefore mentioned package, there is no explicit explanation on how to fetch from a spreadsheet by headers.
Read: Footbet and how it works
It however does mention that you can set your own headers and then use the newly defined headers to manipulate your data, whether it is to write or read data.
The Solution with ExcelJs
Without further ado, let’s see how we can pull data from a data sheet using the sheet headers.
This tutorial utilizes the premise that your excel file’s first line is the header for your data.
First things first, if you have not installed exceljs go ahead and do that.
npm install --save exceljs
Initialize the package:
const excel = require('exceljs')
For my particular predicament, I wanted to merge all data from the different leagues in different files into one csv file. Working with csv is a lot more convenient for my application.
To get all our data, we create a csv file our preferred headers and then gradually append to the file. Here is the code for that:
var csvHeaders = 'Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,ODDH,ODDD,ODDA' + "\n";
fs.writeFileSync('./data/ex.csv', csvHeaders, {encoding: 'utf8'});
The code above creates a csv file with our preferred headers and from here on, we will append all data to this file.
Assuming that you have already downloaded your files or typed them out in excel, here is the code to get your data. My excel documents contain multiple sheets and I will be iterating through each sheet, defining my headers and fetching my data.
var workbook = new excel.Workbook();
// your xlsx location relative to your app file
var filePath = './path-to-file/file.xlsx';
// read the file with exceljs
workbook.readFile(filePath).then(()=>{
// loop through sheets
workbook.eachSheet((sheet,sheetId)=>{
var columns = [];
// get headers from the first row of your data
var headerRow = sheet.getRow(1);
// loop through header keys
headerRow.eachCell((cell,rowNum)=>{
// define your columns here
columns.push({header: cell.value, key: cell.value});
})
// set the columns
sheet.columns = columns;
// remove the first row since it's no longer needed
sheet.spliceRows(0,1);
// loop through the rest of the rows
sheet.eachRow({includeEmpty: false}, (row, rowNum)=>{
console.log(row.getCell('Date').value);
//build your csv row here
// var csvRow = row.getCell(key).value .... + "\n";
// append the comma separated row to csv file
// fs.appendFileSync('./data/ex.csv', csvRow, 'utf8');
})
})
});
When I console.log(columns), here is what I get:
[
{ header: 'Div', key: 'Div' },
{ header: 'Date', key: 'Date' },
{ header: 'Time', key: 'Time' },
{ header: 'HomeTeam', key: 'HomeTeam' },
{ header: 'AwayTeam', key: 'AwayTeam' },
{ header: 'FTHG', key: 'FTHG' },
{ header: 'FTAG', key: 'FTAG' },
{ header: 'FTR', key: 'FTR' },
{ header: 'HTHG', key: 'HTHG' },
{ header: 'HTAG', key: 'HTAG' },
{ header: 'HTR', key: 'HTR' }
...
]
And when I run console.log(row.getCell('Date').value)
, here is what I get:
...
2016-09-17T00:00:00.000Z
2016-09-18T00:00:00.000Z
...
Finally, we can now get our data by the corresponding header leaving out what we do not need. Stress levels are back to normal 2020 levels.
That’s it, happy coding.