Web pages have various HTML tags. So the data we want to collect can be contained in any of these tags. Because of this you will need to carefully examine the HTML structure of the webpages before develop codes.
Web browsers have special tool which helps us to look at the structure of the web pages very easily. For an example, think you need to collect names of the people which is shown in a webpages. In this example each name is shown in separate page. But structure of each page is identical. So what you need to do first is, take your cursor near the name. Then write click. You will see this kind of options.
Then click on the Inspect.
You will able to see the structure of the web-page. Assume the code relate to name is look like this.
<span itemprop="full name" class="full name block">James Smith</span>
So I will show you how to collect name from this kind of structure. Here we can extract name using class name as follows.
objIE.document.getElementsByClassName("name block")(0).innerText |
objIE should be defined before like
Set objIE = CreateObject("InternetExplorer.Application") |
If you are not familiar with those basic web scraping, you can learn them from my earlier post.
So let's now look at how to extract data from hyperlinks. Assume telephone numbers are stored in a hyperlink like below.
<a class="click-link" data-gaaction="ourbusiness" data-gaevent="pho_number" href="tel:9128602884" itemprop="phone">912-860-2884</a>
We can use below type of VBA code to get the telephone number.
Dim telephone_number As String
Set Alllinks = objIE.document.getElementsByTagName("A")
For Each Hyperlink In Alllinks
If InStr(1, Hyperlink.href, "tel:", vbTextCompare) > 0 Then
telephone_number = Hyperlink.innerText
Exit For
End If
Next
|
Also we can use following technique to get text inside div tags.
Dim AllString As String Set AllDiv = objIE.document.getElementsByTagName("div") For Each DivTag In AllDiv AllString = DivTag.innerText Next |
Then we can use functions like InStr, Split etc.. to extract relavent information from that "AllString" variable.