Pages

Web Scraping - Complete Guide

In this lesson you will learn all the essentials of the VBA Web scraping. And at the end of the lesson, you will be able to write your first web scraping program to navigate to a webpage you like and collect the data. Also there are lots of other articles in this website which explain about various other techniques you can use in vba web scraping. So please check the link at the end of this lesson. You will find lots of helpful resources to develop efficient and robust VBA web scraping programs.

Today, websites play major role in our day today lives. Some websites are built to show particular information to the visitors. These websites usually contain only static web pages. But some websites can interact with their visitors. Amount of data included in the websites vary from one website to another. Some websites contain a little amount of data and some contain a large amount. Sometimes business organizations and individuals want to extract that information for their needs. This is where the technology called Web scraping comes into action. VBA Web scraping is a popular technique used to collect data automatically from websites to Excel.

Visual Basic for Applications (VBA) makes web scraping a powerful tool for extracting data from websites. VBA, integrated with Microsoft Excel, enables users to automate the retrieval of information, transforming tedious tasks into efficient processes. With a few lines of code, VBA can navigate web pages, interact with elements, and collect data systematically. This automation is particularly valuable for analysts, researchers, or businesses requiring up-to-date information for decision-making.

Here are some uses of web scraping

  • To compare prices
  • To monitor weather data
  • To detect changes made to websites
  • For research purposes
  • To collect various other information

Complexity of a VBA web scraping program depends on the website's coding structure and the type of data we are going to collect.

So now let’s look at how we can create a simple VBA web scraping program which can open an internet explorer window and navigate to a particular webpage.

Before writing the VBA Web scraping program, we need to enable the developer tab and set up the Visual basic editor for coding.

Follow these simple steps to enable the developer tab if you don’t have it in your Excel ribbon.

  1. Launch Microsoft Excel on your computer.
  2. Click on the "File" tab in the ribbon and select "Options" at the bottom.
  3. In the Excel Options window, choose "Customize Ribbon" from the left sidebar.
  4. On the right side, you will see the main tabs. Check the box next to "Developer" to enable it, and then click "OK."
  5. Go to the Developer tab on the ribbon, and click on "Visual Basic" to open the Visual Basic for Applications (VBA) Editor.

You can find a more detailed explanation from this post.
How to show the Developer tab in Excel

Now you have enabled the developer tab in Excel. The Developer tab provides access to various tools, including the Visual Basic Editor, allowing you to create powerful automation scripts and macros. So now we need to insert a module to write our VBA web scraping program.

To insert a module to our project in Excel, follow these simple steps:

  1. Go to the Developer tab on the ribbon.
  2. Click on "Visual Basic" to open the VBA Editor.
  3. In the Project Explorer window on the left, click on the workbook(to select it if you have multiple workbooks open) for which you want to insert a module.
  4. From the menu, select "Insert" and then choose "Module."
  5. A new module will appear under the workbook in the Project Explorer.
  6. Now double-click on the module to open it in the central editing area.

VBA web scraping programs use internet explorers to collect the data. But you don’t need to configure anything manually for the Internet explorer to do this. Internet explorer will behave according to the code written in the VBA web scraping program. When developing VBA web scraping software, our first step is to create a subroutine to write the code. Let’s name the subroutine as MyFirstWebScrapingProgram.

Sub MyFirstWebScrapingProgram()


End Sub

Then, first of all, we need to define a variable of type object and then assign an internet explorer object to it. You can use the below line of code to create that object.

Dim objIE As Object

Set objIE = CreateObject("InternetExplorer.Application")

Then set the following properties to the created object. You can change them according to your requirements. First two properties(Top and Left) decide the position of the internet explorer window. Next two properties(Width and Height) will decide the size of the window.

objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600

Following property enables the visibility of the internet explorer window. You can set it to false if you want to run the program in the background. If you set it to false, the user won’t see how the data is collected using the internet explorer. He/she will only see how the data is filled to the Excel file.

objIE.Visible = True

So now we have created the internet explorer object and set the relevant properties. Next let's see how to navigate to a webpage. You can use the below VBA statement to navigate to the webpage you want.

objIE.Navigate ("http://www.example.com")

When creating a web scraping software, it's crucial to bear following in mind. Once you develop a complete VBA web scraping program, it will have to run through several VBA statements after navigating to a specific webpage. However, webpages can take some time to be fully loaded due to various reasons. So when you run the program there is a chance of executing subsequent VBA statements before the webpage is fully loaded. This can adversely affect the results you get from the software. But luckily we can take preventive action to overcome this. You can use below VBA statements to tell the program to wait until the webpage is fully loaded.

Do
DoEvents
Loop Until objIE.readystate = 4

Below is the full code to create an Internet Explorer object and navigate to a webpage.

Sub MyFirstWebScrapingProgram()

     Dim objIE As Object

     Set objIE = CreateObject("InternetExplorer.Application")

     objIE.Top = 0
     objIE.Left = 0
     objIE.Width = 800
     objIE.Height = 600

     objIE.Visible = True

     objIE.Navigate ("http://www.example.com")

     Do
     DoEvents
     Loop Until objIE.readystate = 4

End Sub

Also if the website is temporarily unavailable, VBA web scraping programs can be suspended before collecting all the relevant data. So If our program tries to collect data from a website while it is down, it will halt our program. So we need to use error handling techniques to overcome temporary interruptions like this. Then our program will wait until the website is up again. If you don’t use this technique, then the user will have to check from time to time whether the program is working or halted.

Here is an example of using error handling in VBA web scraping.

Dim objIE As Object

the_start:

Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True

On Error Resume Next
objIE.navigate ("http://www.example.com/")

Do
   DoEvents
   If Err.Number <> 0 Then
        objIE.Quit
        Set objIE = Nothing
        GoTo the_start:
   End If
Loop Until objIE.readystate = 4

MsgBox "Web page has loaded"

In the above example we have used the error handling technique to check whether it has navigated to a webpage. But in practical situations a program will need to navigate through hundreds or thousands of pages. Then you will need to use a similar error handling technique inside loop iterations.

In the first steps of the web scraping programs, often we need to assign the HTML code to a variable of type string. So the below code shows how to assign HTML code of a webpage to a variable of type string. Here HTML_Code is the variable name.

Dim HTML_Code as String

HTML_Code = objIE.document.body.innerhtml

This will extract the inner HTML content of the <body> element of the web page loaded in the Internet Explorer (objIE) object and assign it to the variable called HTML_Code. You can even print the inner HTML content to the immediate window as follows.

Dim HTML_Code As String

HTML_Code = objIE.document.body.innerHTML

Debug.Print HTML_Code

Sometimes it is very easy to get required data from the inner-text without dealing with HTML codes. Below example shows how to assign inner text to a variable of type string and print it in the immediate window.

Dim InnerText as String

InnerText = objIE.document.body.innertext

Debug.Print InnerText

However if you do this, in lots of cases, you will realize that extracted content is not in a usable format. In lots of times you will get a stockpile of useless data. So in VBA web scraping we should always try to focus on getting only required data using id attributes and class attributes. Because these attributes help us to get only the targeted data. Read these posts to learn how to collect data using id and class attributes.
Web Scraping - Collecting Data From a Webpage
When to use the getElementsByClassName method
getElementsByClassName Vs getElementById

But sometimes required data might not be associated with any id or class name. In such cases we can use the javascript method “getElementsByTagName'' to collect the required data. Following example shows how to get elements by < a > tags. (anchor tags)

Set Alllinks = objIE.document.getelementsbytagname("A")
For Each Hyperlink In Alllinks
    MsgBox Hyperlink.innertext & "-" & Hyperlink.href
Next

Sometimes we need our VBA web scraping programs to follow the links and collect the data. To do that the program needs to click on < a > tags. Now let’s see how we can accomplish this. To achieve this the program needs to loop through all the < a > tags of the webpage and find the relevant link. Then the VBA program can click the link to navigate to that page. Let’s try to understand this using an example. In this example, the VBA web scraping program will click the link if it finds a link with the text "New Arrivals".

Set Alllinks = objIE.document.getelementsbytagname("A")
For Each Hyperlink In Alllinks
        If InStr(Hyperlink.innertext, "New Arrivals") > 0 Then
               Hyperlink.Click
               Exit For
        End If
Next

When creating VBA web scraping programs to collect data from websites, several built-in functions become imperative. Here are some key VBA functions essential for web scraping programs.

Commonly used VBA functions in web scraping

Mid - This VBA function can be used to extract a string segment from it's parent string.
InStr - This function can be used to check whether certain string is included in another string and get the position of that sub string if it exists.
Replace - This VBA function replaces a character or string segments with another character or string.

Above are the basics you need to know when developing VBA web scraping programs. Follow this complete guide if you want to become an expert in VBA web scraping.

Open and read text file

                         We need to handle text files in various situations. Sometimes we need to create, open, write  or delete text files while the program is running. So today I will explain you how to Open and read text file through VBA.
First of all we need to define the variables.

Dim myFile, text, textline as String

Variable "myFile" is used to store the file path of the text file. Each and every text lines will be stored temporarily to the variable call "textline" while program run throgh the Do loop until it reaches to the end of the file. All the text lines of the text file will be appended to variable "text".


There are two ways to assign the text file to "myFile" variable depending on your requirement.

Either you can use below line of code to assign the file path directly. This method can be used if you always use same folder path and file.

myFile = "D:\Fiverr\amjedamjed\Project 2\all - Copy (11).txt"

Or you can use below line of code to assign the file name to variable. Here user get opportunity to select the file through browsing window.

myFile = Application.GetOpenFilename()

So you should use one of the above two lines in your program according to end user requirement. However rest of the program will not affected by what you chooses from above two ways.  
Below line will open the file for reading

Open myFile For Input As #1

Below do loop will run until it reaches to the end of the text file. Inside the do loop, program read the text file line by line and assign that single line to variable "textline". Whole text of text file is appended to variable call "text" by the next line.
Do Until EOF(1)

    Line Input #1, textline
    text = text & textline
Loop

Below line of code will close the text file which was opened for reading.
Close #1


Now you have whole content of text file in variable "text". Now you can do any string manipulation to that text from here.
However I will just show the content in a message box.

MsgBox text


Here below is the full code for you to easily copy paste for your project



Dim myFile, text, textline as String

myFile = "D:\Fiverr\amjedamjed\Project 2\all - Copy (11).txt"
or
myFile = Application.GetOpenFilename()

Open myFile For Input As #1
Do Until EOF(1)
    Line Input #1, textline
    text = text & textline
Loop
Close #1
MsgBox text