Family Magazine

Extracting URLs from Page Code in Excel

By Geoff Griffiths @mmatraining1980

Removing Code from the End of the URLs

If the links don’t have .html in the protocol (at the end of them), then first concatenate the URLs to add * at the end of them all.

If they do have .HTML, then find and replace .HTML with .html*

Next – use Text to Columns, use * as the “delimiter”

Removing code from the Start of the URLs

If the links are relative:

Use text to columns again and use / as the delimiter

If the links contain the full domain name, then concatenate * at the start of the domain name and use text to columns again to remove the code fro the start of the URLs

Alternative Method

Here is another method for doing it – https://businessdaduk.com/data/data-analysis-with-excel/getting-domains-from-a-list-of-urls/

Ensure all the domains have http:// prefix

Use the formula:

=IF(ISERROR(FIND("//www.",A2)), MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3), MID(A2,FIND(":",A2,4)+7,FIND("/",A2,9)-FIND(":",A2,4)-7))

Back to Featured Articles on Logo Paperblog