Excel’s VSTACK – The One Function to Append Them All
Imagine a workbook with 12 sheets containing sales tables: January through December.
Now imagine a workbook with 50 sheets containing product distribution tables: Alabama through Wyoming.
Brace yourself. Now imagine a workbook with 195 sheets containing gross domestic product tables: Afghanistan through Zimbabwe.
Which do you think would be easier to combine into a single table, the file with 12, 50, or 195 sheets?
It’s a trick question. They all take the same amount of effort provided you use the correct Excel feature.
Let’s examine how we can accomplish this using a single formula. Yes, you read that correctly, a single formula.
Combining multiple tables into a single table is a common task. Users commonly perform this act to facilitate another Excel feature, like creating a Pivot Table or a chart.
There are many ways to combine multiple tables into a single table. Three of the most common ways to do this are:
- Manually copy and paste the tables into a new combined table.
- Create a VBA macro to automate the copy/paste process.
- Use Power Query to combine all sheets in a file.
Many users are not comfortable enough to write VBA code to create a macro or are familiar enough with Power Query to produce such an action.
If you are running Office 365, you can now perform this activity using a single function: VSTACK (or HSTACK).
The VSTACK function combines several cells or ranges of cells into a single vertical stack of data. Its companion function HSTACK does the same thing but outputs to a horizontal stack.
Let’s look at an example.
NOTE: As of the writing of this blog (May 2022), the VSTACK function is in Beta format and only available to Office Insider users. The public release is due in the summer/fall timeframe of 2022.
Combining Tables from Multiple Sheets Using the VSTACK Function
We have a file with multiple sheets where each sheet contains a similarly structured table.
Our objective is to combine the tables from each sheet into a single table on a new sheet.
The VSTACK function will work nicely in this situation.
VSTACK has the following function syntax:
=VSTACK( array1, [array2], … )
- array – is the cell or range of cells to be stacked (i.e., combined).
Using our extreme example from earlier of 195 country tables, we don’t want to manually select the exact range of each country’s table, as each table will vary in the number of rows.
We will take the easy way out and select a range that would include the number of rows for our country with the most rows, plus 20% for future expansion.
In our current example, our largest table has just under 40 rows, so we will select 50 rows just to be safe.
The steps would proceed as follows:
- Start a new sheet and name it “Index” (or whatever you wish to name it.)
- Copy the header row from one of the sheets and paste it to the first row of the newly added “Index” sheet.
- In what will be the upper-left corner of the output table (in this case, cell A2), write the following formula…
=VSTACK('R10-1:R40-3'!A2:G50)
NOTE: The sheets are named “R10-1” for the first sheet and “R40-3” for the last sheet.
The reference in VSTACK’s array argument of ‘R10-1:R40-3’ selects all sheets between “R10-1” and “R40-3”. This is known as a 3D Range Reference.
Imagine “R10-1” and “R40-3” as bookend sheets. Any sheets placed between these two sheets will be included in the reference.
If we had a workbook with 12 sheets, each sheet representing a single month of the year, we could have written the below formula that uses “January” and “December” as range reference bookends.
=VSTACK(January:December!A2:G50)
Returning to our example; we get a single stack of data that consists of all tables in the A2:G50 range of every sheet between “R10-1” and “R40-3”.
There is one slight issue with the output. We scroll down to what would be the last of the first sheet’s records and we see zero-filled rows for any row that is empty up to the 50th row in the table.
The VSTACK function doesn’t differentiate between populated and unpopulated cells in the range(s).
Filtering Out the Blank Rows
To combat the above problem of zero-filled rows, we can use the FILTER function to eliminate non-empty rows provided by the VSTACK function. In other words, filter out the blank rows.
=FILTER(VSTACK('R10-1:R40-3'!A2:G50), VSTACK('R10-1:R40-3'!A2:A50) <> "")
Notice that the FILTER function’s include argument is examining all cells in column “A” to detect the presence of any data (not equal to empty text).
Even though the VSTACK returned zeroes in cells with no data, this is purely for presentation purposes. There is no data in those cells.
The FILTER/VSTACK function combination has eliminated the zero-based (i.e., empty) rows.
Sorting the Filtered, Stacked Results
Our boss was so impressed with the results that they requested a final tweak to the output: sorting.
No need to burn down the building. We’ll just wrap the FILTER/VSTACK formula within a SORT function.
=SORT(FILTER(VSTACK('R10-1:R40-3'!A2:G50), VSTACK('R10-1:R40-3'!A2:A50) <> ""), 4)
The “4” in the SORT function’s [sort_index] argument tells SORT to sort by the 4th column in the table. In this case, the “Product Name” column.
We don’t have to tell it to sort in ascending order as this is the SORT function’s default behavior.
Performing a Multi-Level Sort
We can take this to a whole other level of sorting by “Total Sales” within each “Product Name”.
If we wish to perform a secondary sort in descending order, we need to modify the formula to list 2 columns to sort and by which direction for each column.
The formula would be modified as follows:
=SORT(FILTER(VSTACK('R10-1:R40-3'!A2:G50), VSTACK('R10-1:R40-3'!A2:A50) <> ""), {4,7}, {1,-1} )
Understanding the Curly-Braces
If you are unfamiliar with the use of curly braces in formulas, this just provides the formula with a list of arguments.
As most arguments expect a single response, the curly braces allow you to provide multiple responses.
In our case, we want to perform a primary sort by the 4th column, then a secondary sort by the 7th column. Thus the {4,7} response to the [sort_index] argument.
As far as the sort directions for these two columns, we want to sort the 4th column in ascending order (1) and the 7th column in descending order (-1). Thus the {1,-1} response to the [sort_order] argument.
Number Formatting “Issue”
Be mindful that cell formatting is not carried over to the output. If you have date, number, or any other styles (built-in or custom), these will not be applied to the matching output columns.
These styles will need to be applied to the output just as they were with the source data if you wish to retain the data’s original appearance.
Dealing With Sheets That Fall Outside the Defined Range
Suppose you have 3 sheets (“Jan”, “Feb”, and “Mar”) with a 4th sheet that uses VSTACK to combine the other sheets into a single table.
If we add a sheet for the following month named “Apr”, we need to ensure that the sheet falls within the defined range of VSTACK‘s array argument (Jan:Mar).
=VSTACK(Jan:Mar!A2:B50)
Adding a new sheet like below will fail because the “Apr” sheet falls outside the “Jan:Mar” range.
We would need to constantly update the formula to redefine the range extents.
Here’s a thought: suppose we create permanent bookend sheets called “Start” and “End” that have no data. If the sheets exist between these bookend sheets, we’ll never need to update the formula’s range reference.
The formula can be rewritten as follows.
=VSTACK(Start:End!A2:B50)
Of course, we’ll need to use the FILTER function to remove the empty rows brought in from these bookend sheets, but we were doing that anyway for the extra rows on the real data sheets.
Sorting Sheets to Ensure Data Inclusion
This next trick is for those of you who like a little challenge in your life.
Because we want to ensure that the added sheets always fall between the bookend sheets named “Start” and “End”, we can use some VBA code to sort the sheets whenever new sheets are added.
NOTE: Adding the code below to your file will require a resaving of the file with a .XLSM file extension.
Create a new module sheet in the Visual Basic Editor and copy/paste the below code.
Sub Sort_Sheets() Dim CurrentSheetIndex As Integer Dim PrevSheetIndex As Integer For CurrentSheetIndex = 1 To Sheets.Count For PrevSheetIndex = 1 To CurrentSheetIndex - 1 If UCase(Sheets(PrevSheetIndex).Name) > _ UCase(Sheets(CurrentSheetIndex).Name) Then Sheets(CurrentSheetIndex).Move _ Before:=Sheets(PrevSheetIndex) End If Next PrevSheetIndex Next CurrentSheetIndex End Sub
NOTE: This code can also be used independently of any VSTACK scenario when you just want to sort your sheets.
The problem with this code is that it sorts alphabetically in ascending order. Since our sheet tabs have month name abbreviations as well as the bookend names, we will get an undesirable sort result.
We will rename the sheets using the following conventions:
- Rename the month name abbreviations to month number and year (ex: “Mar” would be “3-22”).
- Add two periods to the beginning of “2022 Sales”.
- Add one period to the beginning of “Start”.
- Add a tilde to the beginning of “End”.
“How do these new names help?”
- Any sheet where the name starts with a period will be placed before any sheet where the name begins with letters or numbers. A sheet that begins with 2 periods is placed before a sheet that begins with 1 period.
- Any sheet where the name starts with a tilde (~) will be placed after any sheet where the name begins with letters or numbers.
- The M-YY names for the months force the sheets into chronological order.
When you add a sheet to the workbook, it won’t matter where the sheet is created. Running the macro will place the sheets in the desired order.
The VSTACK function will need to be updated to reflect these newly added characters. Note the single quotes that are surrounding the sheet names.
=VSTACK('.Start:~End'!A2:G50)
NOTE: As Excel VBA does not possess a “worksheet rename” event trigger to force the macro to run automatically, the user will need to be mindful to run this macro after adding and renaming new sheets.
Bonus Step
Since the “Start” and “End” sheets are there to act as bookends and contain no real data, we can hide these sheets, so they are not visible to the user but still serve the purpose of bounding the range selection.
Practice Workbook
Feel free to Download the Workbook HERE.
ليست هناك تعليقات:
إرسال تعليق