In this lesson you will learn the essentials of VBA web scraping. By the end of the lesson, you will be able to write your first web scraping program that navigates to a webpage you like and collects its data. This website also has many other articles explaining various techniques you can use in VBA web scraping, so please check the link at the end of this lesson. You will find lots of helpful resources for developing efficient and robust VBA web scraping programs.
Today, websites play a major role in our day-to-day lives. Some websites are built to show particular information to visitors; these usually contain only static web pages. Others can interact with their visitors. The amount of data a website holds varies from one site to another: some contain only a little, while others contain a large amount. Sometimes business organizations and individuals want to extract that information for their own needs. This is where the technique called web scraping comes into action. VBA web scraping is a popular way to collect data automatically from websites into Excel.
Visual Basic for Applications (VBA) makes web scraping a powerful tool for extracting data from websites. VBA, integrated with Microsoft Excel, enables users to automate the retrieval of information, transforming tedious tasks into efficient processes. With a few lines of code, VBA can navigate web pages, interact with elements, and collect data systematically. This automation is particularly valuable for analysts, researchers, or businesses requiring up-to-date information for decision-making.
Here are some common uses of web scraping:
- To compare prices
- To monitor weather data
- To detect changes made to websites
- For research purposes
- To collect various other information
The complexity of a VBA web scraping program depends on the website's coding structure and the type of data we are going to collect.
So now let’s look at how to create a simple VBA web scraping program that opens an Internet Explorer window and navigates to a particular webpage.
Before writing the VBA web scraping program, we need to enable the Developer tab and set up the Visual Basic Editor for coding.
Follow these simple steps to enable the developer tab if you don’t have it in your Excel ribbon.
- Launch Microsoft Excel on your computer.
- Click on the "File" tab in the ribbon and select "Options" at the bottom.
- In the Excel Options window, choose "Customize Ribbon" from the left sidebar.
- On the right side, you will see the main tabs. Check the box next to "Developer" to enable it, and then click "OK."
- Go to the Developer tab on the ribbon, and click on "Visual Basic" to open the Visual Basic for Applications (VBA) Editor.
You can find a more detailed explanation in this post.
How to show the Developer tab in Excel
Now you have enabled the developer tab in Excel. The Developer tab provides access to various tools, including the Visual Basic Editor, allowing you to create powerful automation scripts and macros. So now we need to insert a module to write our VBA web scraping program.
To insert a module to our project in Excel, follow these simple steps:
- Go to the Developer tab on the ribbon.
- Click on "Visual Basic" to open the VBA Editor.
- In the Project Explorer window on the left, click on the workbook (to select it, if you have multiple workbooks open) for which you want to insert a module.
- From the menu, select "Insert" and then choose "Module."
- A new module will appear under the workbook in the Project Explorer.
- Now double-click on the module to open it in the central editing area.
VBA web scraping programs use Internet Explorer to collect the data, but you don’t need to configure anything in Internet Explorer manually. Internet Explorer will behave according to the code written in the VBA web scraping program. When developing VBA web scraping software, our first step is to create a subroutine to hold the code. Let’s name the subroutine MyFirstWebScrapingProgram.
Sub MyFirstWebScrapingProgram()
End Sub
First, we need to declare a variable of type Object and assign an Internet Explorer object to it. You can use the lines of code below to create that object.
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
Then set the following properties on the created object. You can change them according to your requirements. The first two properties (Top and Left) decide the position of the Internet Explorer window; the next two (Width and Height) decide the size of the window.
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
The following property controls the visibility of the Internet Explorer window. You can set it to False if you want to run the program in the background. If you set it to False, the user won’t see how the data is collected through Internet Explorer; they will only see the data being filled into the Excel file.
objIE.Visible = True
So now we have created the Internet Explorer object and set the relevant properties. Next, let's see how to navigate to a webpage. You can use the VBA statement below to navigate to the webpage you want.
objIE.Navigate "http://www.example.com"
When creating web scraping software, it's crucial to bear the following in mind. A complete VBA web scraping program will run through several VBA statements after navigating to a specific webpage. However, webpages can take some time to load fully for various reasons, so when you run the program there is a chance of executing subsequent VBA statements before the webpage is fully loaded. This can adversely affect the results you get from the software. Luckily, we can take preventive action to overcome this: use the VBA statements below to tell the program to wait until the webpage is fully loaded.
Do
    DoEvents
Loop Until objIE.readyState = 4
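As a side note, the readyState check alone cannot protect against a server that never finishes responding. The sketch below also checks the Busy property of the Internet Explorer object and gives up after a timeout; the ten-second limit is an arbitrary value chosen for illustration, not part of the original technique.

```vba
Dim tStart As Single
tStart = Timer   ' seconds elapsed since midnight

Do
    DoEvents
    ' Give up after roughly 10 seconds so the macro cannot hang forever
    If Timer - tStart > 10 Then Exit Do
Loop Until objIE.readyState = 4 And objIE.Busy = False
```

After the loop you can test readyState once more to decide whether the page actually loaded or the timeout fired.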
Below is the full code to create an Internet Explorer object and navigate to a webpage.
Sub MyFirstWebScrapingProgram()
    Dim objIE As Object
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Top = 0
    objIE.Left = 0
    objIE.Width = 800
    objIE.Height = 600
    objIE.Visible = True
    objIE.Navigate "http://www.example.com"
    Do
        DoEvents
    Loop Until objIE.readyState = 4
End Sub
Also, if the website is temporarily unavailable, a VBA web scraping program can be interrupted before collecting all the relevant data: if our program tries to collect data from a website while it is down, the program will halt. So we need to use error handling techniques to overcome temporary interruptions like this; then our program will wait until the website is up again. If you don’t use this technique, the user will have to check from time to time whether the program is still working or has halted.
Here is an example of using error handling in VBA web scraping.
Dim objIE As Object

the_start:
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
On Error Resume Next
objIE.Navigate "http://www.example.com/"
Do
    DoEvents
    If Err.Number <> 0 Then
        ' Clean up, clear the error, and try again from the start
        objIE.Quit
        Set objIE = Nothing
        Err.Clear
        GoTo the_start
    End If
Loop Until objIE.readyState = 4
MsgBox "Web page has loaded"
In the example above, we used the error handling technique to check whether the program has navigated to a webpage successfully. In practical situations, however, a program may need to navigate through hundreds or thousands of pages, so you will need to use a similar error handling technique inside your loop iterations.
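As a sketch of that idea, the loop below visits a list of pages and retries each one a fixed number of times before moving on. The URL array and the MaxRetries value are illustrative assumptions, not part of the original lesson.

```vba
Sub ScrapeSeveralPages()
    ' Illustrative URL list - replace with the pages you actually need
    Dim urls As Variant
    urls = Array("http://www.example.com/page1", _
                 "http://www.example.com/page2")

    Const MaxRetries As Long = 3
    Dim objIE As Object, i As Long, attempt As Long
    Set objIE = CreateObject("InternetExplorer.Application")
    objIE.Visible = True

    For i = LBound(urls) To UBound(urls)
        For attempt = 1 To MaxRetries
            On Error Resume Next
            objIE.Navigate urls(i)
            Do
                DoEvents
                If Err.Number <> 0 Then Exit Do   ' navigation failed
            Loop Until objIE.readyState = 4
            If Err.Number = 0 Then Exit For       ' page loaded, stop retrying
            Err.Clear                             ' otherwise clear and retry
        Next attempt
        On Error GoTo 0
        ' ... collect data from the loaded page here ...
    Next i

    objIE.Quit
End Sub
```

The inner For loop plays the role of the GoTo-based retry from the earlier example, but bounded, so one permanently broken page cannot stall the whole run.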
In the first steps of a web scraping program, we often need to assign the HTML code of a webpage to a variable of type String. The code below shows how to do that; here, HTML_Code is the variable name.
Dim HTML_Code As String
HTML_Code = objIE.document.body.innerHTML
This extracts the inner HTML content of the <body> element of the webpage loaded in the Internet Explorer object (objIE) and assigns it to the variable HTML_Code. You can even print the inner HTML content to the Immediate window as follows.
Dim HTML_Code As String
HTML_Code = objIE.document.body.innerHTML
Debug.Print HTML_Code
Sometimes it is much easier to get the required data from the inner text without dealing with HTML code at all. The example below shows how to assign the inner text to a variable of type String and print it in the Immediate window.
Dim InnerText As String
InnerText = objIE.document.body.innerText
Debug.Print InnerText
However, if you do this, in many cases you will find that the extracted content is not in a usable format; often you will get a stockpile of useless data. So in VBA web scraping we should always try to collect only the required data using id attributes and class attributes, because these attributes help us target exactly the data we need. Read these posts to learn how to collect data using id and class attributes.
Web Scraping - Collecting Data From a Webpage
When to use the getElementsByClassName method
getElementsByClassName Vs getElementById
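To give a flavour of what those posts cover, here is a minimal sketch of both approaches. The id "price" and the class name "product-name" are hypothetical; they must be replaced with the attribute values found on the actual page you scrape.

```vba
' Hypothetical example: the loaded page is assumed to contain an element
' with id="price" and several elements with class="product-name".
Dim priceText As String
priceText = objIE.document.getElementById("price").innerText
Debug.Print priceText

Dim products As Object, product As Object
Set products = objIE.document.getElementsByClassName("product-name")
For Each product In products
    Debug.Print product.innerText
Next product
```

getElementById returns a single element, while getElementsByClassName returns a collection you can loop over, which is why the two are handled differently above.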
But sometimes the required data might not be associated with any id or class name. In such cases we can use the DOM method "getElementsByTagName" to collect the required data. The following example shows how to get elements by <a> (anchor) tags.
Dim Alllinks As Object, Hyperlink As Object
Set Alllinks = objIE.document.getElementsByTagName("a")
For Each Hyperlink In Alllinks
    MsgBox Hyperlink.innerText & " - " & Hyperlink.href
Next Hyperlink
Sometimes we need our VBA web scraping programs to follow links and collect data. To do that, the program needs to click on <a> tags. To achieve this, the program loops through all the <a> tags of the webpage, finds the relevant link, and then clicks it to navigate to that page. Let’s try to understand this with an example: the program below clicks the link if it finds one whose text contains "New Arrivals".
Set Alllinks = objIE.document.getElementsByTagName("a")
For Each Hyperlink In Alllinks
    If InStr(Hyperlink.innerText, "New Arrivals") > 0 Then
        Hyperlink.Click
        Exit For
    End If
Next
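One thing to keep in mind: after Hyperlink.Click, the browser starts loading a new page, so the same waiting technique shown earlier should be applied before reading anything from the document. A sketch:

```vba
' After clicking a link, wait for the new page before scraping it
Hyperlink.Click

Do
    DoEvents
Loop Until objIE.readyState = 4 And objIE.Busy = False

' Now it is safe to read the freshly loaded page
Debug.Print objIE.document.body.innerText
```

Without this wait, the program may read the old page, or fail because the document is still being replaced.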
When creating VBA web scraping programs to collect data from websites, several built-in string functions become essential. Here are some key VBA functions commonly used in web scraping programs.
Commonly used VBA functions in web scraping
Mid - This VBA function extracts a segment of a string from its parent string.
InStr - This function checks whether a certain string is included in another string and returns the position of that substring if it exists.
Replace - This VBA function replaces a character or string segment with another character or string.
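The snippet below shows all three functions working together on a small HTML fragment. The sample string is made up for illustration; real scraped HTML will need position and length values matched to its own layout.

```vba
Sub StringFunctionDemo()
    ' A made-up fragment standing in for scraped HTML
    Dim html As String
    html = "<span>Price: $25.99</span>"

    ' InStr: find where the "$" sign appears (returns 0 if absent)
    Dim pos As Long
    pos = InStr(html, "$")                 ' 14

    ' Mid: extract the 5 characters that follow the "$"
    Dim price As String
    price = Mid(html, pos + 1, 5)          ' "25.99"

    ' Replace: strip the closing tag from the fragment
    Dim cleaned As String
    cleaned = Replace(html, "</span>", "")

    Debug.Print pos, price, cleaned
End Sub
```

In real programs, the result of InStr typically feeds the start position of Mid, exactly as above, so hard-coded offsets can be avoided.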
Above are the basics you need to know when developing VBA web scraping programs. Follow this complete guide if you want to become an expert in VBA web scraping.