How to Find the XML Sitemap of a Website and Import it Into Google Sheets
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Zac Cannon
July 3, 2024
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Contents
What is a sitemap?
How to find the XML sitemap of a website
How to import an XML sitemap into Google Sheets
How to import a large XML sitemap into Google Sheets and fix error resource at url contents exceeded maximum size
What is a Sitemap?
A sitemap is a record of all of the pages on your website. It is a file of information that stores information about its pages, as well as showing the hierarchy of its pages, an RSS feed, and the page’s update frequencies.
A sitemap is needed in order to submit your website pages to search engines, to help them to understand the structure of your site, the relationship between categories and child pages, and ultimately to gain traffic from search engines.
Understanding where to find your website’s website (and your competitors) can be useful for an SEO audit, simply understanding the structure of your site, seeing when new pages or products have been added, or competitor research.
Where to Find the Sitemap of Any Website
Manually check some of the most common locations
/sitemap.xml
/sitemap_index.xml (The index location, If the site has multiple sitemaps)
/sitemap.php
/wp-sitemap.xml (If the site is on Wordpress and not using a plugin to generate)
2. Check the robots.txt file
Depending on how it has been generated, finding the sitemap can sometimes be tricky. It will most often be located at one of the following URLs, but this may depend on whether the webmaster has made efforts to hide it or not.
Beyond the most common locations, the second place to look is the robots.txt file, which may show the address of the website’s sitemaps, providing the robots.txt file is publicly accessible and not redirected.
[link] www.bbc.co.uk/robots.txt [link]
Take the bbc robots.txt file for example, which shows the addresses of the various sitemaps on the site. From here you can navigate into each individual sitemap to see the various pages on the site.
Found your Sitemap? Here’s How to Import it into Google Sheets
Finding your own sitemap is useful in order to see exactly which pages are on your site. In order to see landing pages that may have been created, or if you’re working on a site for a business or organisation, any new products or services that have been launched.
Even better, you can use a sitemap to spy on your competitors [link] how to extract brand from url [link].
How to Import Page URLs from your Sitemap into Google Sheets
To import a sitemap into Google Sheets you can use an ImportXML formula. For this you’ll need an xpath query along with the url of the sitemap.
IMPORTXML(url, xpath_query)
Xpath query: "//*[local-name() = 'url']”
=importxml ("https://www.versusking.com/sitemap.xml","//*[local-name() = 'url']")
This will import all of the text under the child of the <url> element
How to Import a Large XML Sitemap into Google Sheets and Fix Error Resource at URL Contents Exceeded Maximum Size
If you’re working with a large website with potentially thousands of URLs, then you may get hit with the error message “Resource at url contents exceeded maximum size”.
Sitemaps can contain up to a maximum of 50,000 URLs and can’t be more than 50MB in size. However IMPORTXML does seem to far break below this limit, and I’ve seen it breaking with XML sitemaps as low as 10,000. Fortunately I’ve managed to overcome this limitation. One way I’ve found is to parse the XML yourself using Google AppScript with XMLService. This will work providing there is nothing blocking you crawling the site
See below code snippet to parse an XML sitemap using AppScript and import the URLs into Google Sheets.
The script works by effectively taking the whole sitemap, fetching the text of the <loc> element which is always at position 0 in the <url> array. It could also be amended to fetch the <lastmod> element or any other elements you wanted as well. It then calls another function to push the urls to a Google Sheet of your choosing after overwriting any old data.
function requestXMLSitemap() {
const url = 'INSERT_SITEMAP_URL' // place your sitemap URL here
const options = {
header: {
'Content-Type': 'application/xml'
},
method: 'GET'
}
const response = UrlFetchApp.fetch(url, options).getContentText()
const xml = XmlService.parse(response)
const root = xml.getRootElement();
const children = root.getChildren();
const data = [];
for (let item in children){
let url = children[item].getChildren()[0].getText();
data.push([url]);
}
pushToSheet(data);
}
function pushToSheet(data) {
// settings
const settings = {
url: 'INSERT_SPREADSHEET_URL_HERE', // place your spreadsheet URL here
sheet: 'INSERT SHEET NAME HERE', // place the sheet name here
}
const ss = SpreadsheetApp.openByUrl(settings.url);
const sheet = ss.getSheetByName(settings.sheet);
sheet.getRange("A2:A").clearContent(); // place the range to push urls to here
sheet.getRange(2,1,data.length,data[0].length).setValues(data) // export data to sheet
}
Summary
Having a working sitemap is fundamental for SEO, so that search engines can read and understand your website, thereby driving organic traffic to your website. Being able to find the sitemap of any website is therefore incredibly useful for understanding which pages are on the site, having an up to date record of the pages, and even auditing your competitors websites.
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Zac Cannon
July 3, 2024
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Contents
What is a sitemap?
How to find the XML sitemap of a website
How to import an XML sitemap into Google Sheets
How to import a large XML sitemap into Google Sheets and fix error resource at url contents exceeded maximum size
What is a Sitemap?
A sitemap is a record of all of the pages on your website. It is a file of information that stores information about its pages, as well as showing the hierarchy of its pages, an RSS feed, and the page’s update frequencies.
A sitemap is needed in order to submit your website pages to search engines, to help them to understand the structure of your site, the relationship between categories and child pages, and ultimately to gain traffic from search engines.
Understanding where to find your website’s website (and your competitors) can be useful for an SEO audit, simply understanding the structure of your site, seeing when new pages or products have been added, or competitor research.
Where to Find the Sitemap of Any Website
Manually check some of the most common locations
/sitemap.xml
/sitemap_index.xml (The index location, If the site has multiple sitemaps)
/sitemap.php
/wp-sitemap.xml (If the site is on Wordpress and not using a plugin to generate)
2. Check the robots.txt file
Depending on how it has been generated, finding the sitemap can sometimes be tricky. It will most often be located at one of the following URLs, but this may depend on whether the webmaster has made efforts to hide it or not.
Beyond the most common locations, the second place to look is the robots.txt file, which may show the address of the website’s sitemaps, providing the robots.txt file is publicly accessible and not redirected.
[link] www.bbc.co.uk/robots.txt [link]
Take the bbc robots.txt file for example, which shows the addresses of the various sitemaps on the site. From here you can navigate into each individual sitemap to see the various pages on the site.
Found your Sitemap? Here’s How to Import it into Google Sheets
Finding your own sitemap is useful in order to see exactly which pages are on your site. In order to see landing pages that may have been created, or if you’re working on a site for a business or organisation, any new products or services that have been launched.
Even better, you can use a sitemap to spy on your competitors [link] how to extract brand from url [link].
How to Import Page URLs from your Sitemap into Google Sheets
To import a sitemap into Google Sheets you can use an ImportXML formula. For this you’ll need an xpath query along with the url of the sitemap.
IMPORTXML(url, xpath_query)
Xpath query: "//*[local-name() = 'url']”
=importxml ("https://www.versusking.com/sitemap.xml","//*[local-name() = 'url']")
This will import all of the text under the child of the <url> element
How to Import a Large XML Sitemap into Google Sheets and Fix Error Resource at URL Contents Exceeded Maximum Size
If you’re working with a large website with potentially thousands of URLs, then you may get hit with the error message “Resource at url contents exceeded maximum size”.
Sitemaps can contain up to a maximum of 50,000 URLs and can’t be more than 50MB in size. However IMPORTXML does seem to far break below this limit, and I’ve seen it breaking with XML sitemaps as low as 10,000. Fortunately I’ve managed to overcome this limitation. One way I’ve found is to parse the XML yourself using Google AppScript with XMLService. This will work providing there is nothing blocking you crawling the site
See below code snippet to parse an XML sitemap using AppScript and import the URLs into Google Sheets.
The script works by effectively taking the whole sitemap, fetching the text of the <loc> element which is always at position 0 in the <url> array. It could also be amended to fetch the <lastmod> element or any other elements you wanted as well. It then calls another function to push the urls to a Google Sheet of your choosing after overwriting any old data.
function requestXMLSitemap() {
const url = 'INSERT_SITEMAP_URL' // place your sitemap URL here
const options = {
header: {
'Content-Type': 'application/xml'
},
method: 'GET'
}
const response = UrlFetchApp.fetch(url, options).getContentText()
const xml = XmlService.parse(response)
const root = xml.getRootElement();
const children = root.getChildren();
const data = [];
for (let item in children){
let url = children[item].getChildren()[0].getText();
data.push([url]);
}
pushToSheet(data);
}
function pushToSheet(data) {
// settings
const settings = {
url: 'INSERT_SPREADSHEET_URL_HERE', // place your spreadsheet URL here
sheet: 'INSERT SHEET NAME HERE', // place the sheet name here
}
const ss = SpreadsheetApp.openByUrl(settings.url);
const sheet = ss.getSheetByName(settings.sheet);
sheet.getRange("A2:A").clearContent(); // place the range to push urls to here
sheet.getRange(2,1,data.length,data[0].length).setValues(data) // export data to sheet
}
Summary
Having a working sitemap is fundamental for SEO, so that search engines can read and understand your website, thereby driving organic traffic to your website. Being able to find the sitemap of any website is therefore incredibly useful for understanding which pages are on the site, having an up to date record of the pages, and even auditing your competitors websites.
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Zac Cannon
July 3, 2024
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Contents
What is a sitemap?
How to find the XML sitemap of a website
How to import an XML sitemap into Google Sheets
How to import a large XML sitemap into Google Sheets and fix error resource at url contents exceeded maximum size
What is a Sitemap?
A sitemap is a record of all of the pages on your website. It is a file of information that stores information about its pages, as well as showing the hierarchy of its pages, an RSS feed, and the page’s update frequencies.
A sitemap is needed in order to submit your website pages to search engines, to help them to understand the structure of your site, the relationship between categories and child pages, and ultimately to gain traffic from search engines.
Understanding where to find your website’s website (and your competitors) can be useful for an SEO audit, simply understanding the structure of your site, seeing when new pages or products have been added, or competitor research.
Where to Find the Sitemap of Any Website
Manually check some of the most common locations
/sitemap.xml
/sitemap_index.xml (The index location, If the site has multiple sitemaps)
/sitemap.php
/wp-sitemap.xml (If the site is on Wordpress and not using a plugin to generate)
2. Check the robots.txt file
Depending on how it has been generated, finding the sitemap can sometimes be tricky. It will most often be located at one of the following URLs, but this may depend on whether the webmaster has made efforts to hide it or not.
Beyond the most common locations, the second place to look is the robots.txt file, which may show the address of the website’s sitemaps, providing the robots.txt file is publicly accessible and not redirected.
[link] www.bbc.co.uk/robots.txt [link]
Take the bbc robots.txt file for example, which shows the addresses of the various sitemaps on the site. From here you can navigate into each individual sitemap to see the various pages on the site.
Found your Sitemap? Here’s How to Import it into Google Sheets
Finding your own sitemap is useful in order to see exactly which pages are on your site. In order to see landing pages that may have been created, or if you’re working on a site for a business or organisation, any new products or services that have been launched.
Even better, you can use a sitemap to spy on your competitors [link] how to extract brand from url [link].
How to Import Page URLs from your Sitemap into Google Sheets
To import a sitemap into Google Sheets you can use an ImportXML formula. For this you’ll need an xpath query along with the url of the sitemap.
IMPORTXML(url, xpath_query)
Xpath query: "//*[local-name() = 'url']”
=importxml ("https://www.versusking.com/sitemap.xml","//*[local-name() = 'url']")
This will import all of the text under the child of the <url> element
How to Import a Large XML Sitemap into Google Sheets and Fix Error Resource at URL Contents Exceeded Maximum Size
If you’re working with a large website with potentially thousands of URLs, then you may get hit with the error message “Resource at url contents exceeded maximum size”.
Sitemaps can contain up to a maximum of 50,000 URLs and can’t be more than 50MB in size. However IMPORTXML does seem to far break below this limit, and I’ve seen it breaking with XML sitemaps as low as 10,000. Fortunately I’ve managed to overcome this limitation. One way I’ve found is to parse the XML yourself using Google AppScript with XMLService. This will work providing there is nothing blocking you crawling the site
See below code snippet to parse an XML sitemap using AppScript and import the URLs into Google Sheets.
The script works by effectively taking the whole sitemap, fetching the text of the <loc> element which is always at position 0 in the <url> array. It could also be amended to fetch the <lastmod> element or any other elements you wanted as well. It then calls another function to push the urls to a Google Sheet of your choosing after overwriting any old data.
function requestXMLSitemap() {
const url = 'INSERT_SITEMAP_URL' // place your sitemap URL here
const options = {
header: {
'Content-Type': 'application/xml'
},
method: 'GET'
}
const response = UrlFetchApp.fetch(url, options).getContentText()
const xml = XmlService.parse(response)
const root = xml.getRootElement();
const children = root.getChildren();
const data = [];
for (let item in children){
let url = children[item].getChildren()[0].getText();
data.push([url]);
}
pushToSheet(data);
}
function pushToSheet(data) {
// settings
const settings = {
url: 'INSERT_SPREADSHEET_URL_HERE', // place your spreadsheet URL here
sheet: 'INSERT SHEET NAME HERE', // place the sheet name here
}
const ss = SpreadsheetApp.openByUrl(settings.url);
const sheet = ss.getSheetByName(settings.sheet);
sheet.getRange("A2:A").clearContent(); // place the range to push urls to here
sheet.getRange(2,1,data.length,data[0].length).setValues(data) // export data to sheet
}
Summary
Having a working sitemap is fundamental for SEO, so that search engines can read and understand your website, thereby driving organic traffic to your website. Being able to find the sitemap of any website is therefore incredibly useful for understanding which pages are on the site, having an up to date record of the pages, and even auditing your competitors websites.
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Zac Cannon
July 3, 2024
How to Find the XML Sitemap of a Website and Import it Into Google Sheets
Contents
What is a sitemap?
How to find the XML sitemap of a website
How to import an XML sitemap into Google Sheets
How to import a large XML sitemap into Google Sheets and fix error resource at url contents exceeded maximum size
What is a Sitemap?
A sitemap is a record of all of the pages on your website. It is a file of information that stores information about its pages, as well as showing the hierarchy of its pages, an RSS feed, and the page’s update frequencies.
A sitemap is needed in order to submit your website pages to search engines, to help them to understand the structure of your site, the relationship between categories and child pages, and ultimately to gain traffic from search engines.
Understanding where to find your website’s website (and your competitors) can be useful for an SEO audit, simply understanding the structure of your site, seeing when new pages or products have been added, or competitor research.
Where to Find the Sitemap of Any Website
Manually check some of the most common locations
/sitemap.xml
/sitemap_index.xml (The index location, If the site has multiple sitemaps)
/sitemap.php
/wp-sitemap.xml (If the site is on Wordpress and not using a plugin to generate)
2. Check the robots.txt file
Depending on how it has been generated, finding the sitemap can sometimes be tricky. It will most often be located at one of the following URLs, but this may depend on whether the webmaster has made efforts to hide it or not.
Beyond the most common locations, the second place to look is the robots.txt file, which may show the address of the website’s sitemaps, providing the robots.txt file is publicly accessible and not redirected.
[link] www.bbc.co.uk/robots.txt [link]
Take the bbc robots.txt file for example, which shows the addresses of the various sitemaps on the site. From here you can navigate into each individual sitemap to see the various pages on the site.
Found your Sitemap? Here’s How to Import it into Google Sheets
Finding your own sitemap is useful in order to see exactly which pages are on your site. In order to see landing pages that may have been created, or if you’re working on a site for a business or organisation, any new products or services that have been launched.
Even better, you can use a sitemap to spy on your competitors [link] how to extract brand from url [link].
How to Import Page URLs from your Sitemap into Google Sheets
To import a sitemap into Google Sheets you can use an ImportXML formula. For this you’ll need an xpath query along with the url of the sitemap.
IMPORTXML(url, xpath_query)
Xpath query: "//*[local-name() = 'url']”
=importxml ("https://www.versusking.com/sitemap.xml","//*[local-name() = 'url']")
This will import all of the text under the child of the <url> element
How to Import a Large XML Sitemap into Google Sheets and Fix Error Resource at URL Contents Exceeded Maximum Size
If you’re working with a large website with potentially thousands of URLs, then you may get hit with the error message “Resource at url contents exceeded maximum size”.
Sitemaps can contain up to a maximum of 50,000 URLs and can’t be more than 50MB in size. However IMPORTXML does seem to far break below this limit, and I’ve seen it breaking with XML sitemaps as low as 10,000. Fortunately I’ve managed to overcome this limitation. One way I’ve found is to parse the XML yourself using Google AppScript with XMLService. This will work providing there is nothing blocking you crawling the site
See below code snippet to parse an XML sitemap using AppScript and import the URLs into Google Sheets.
The script works by effectively taking the whole sitemap, fetching the text of the <loc> element which is always at position 0 in the <url> array. It could also be amended to fetch the <lastmod> element or any other elements you wanted as well. It then calls another function to push the urls to a Google Sheet of your choosing after overwriting any old data.
function requestXMLSitemap() {
const url = 'INSERT_SITEMAP_URL' // place your sitemap URL here
const options = {
header: {
'Content-Type': 'application/xml'
},
method: 'GET'
}
const response = UrlFetchApp.fetch(url, options).getContentText()
const xml = XmlService.parse(response)
const root = xml.getRootElement();
const children = root.getChildren();
const data = [];
for (let item in children){
let url = children[item].getChildren()[0].getText();
data.push([url]);
}
pushToSheet(data);
}
function pushToSheet(data) {
// settings
const settings = {
url: 'INSERT_SPREADSHEET_URL_HERE', // place your spreadsheet URL here
sheet: 'INSERT SHEET NAME HERE', // place the sheet name here
}
const ss = SpreadsheetApp.openByUrl(settings.url);
const sheet = ss.getSheetByName(settings.sheet);
sheet.getRange("A2:A").clearContent(); // place the range to push urls to here
sheet.getRange(2,1,data.length,data[0].length).setValues(data) // export data to sheet
}
Summary
Having a working sitemap is fundamental for SEO, so that search engines can read and understand your website, thereby driving organic traffic to your website. Being able to find the sitemap of any website is therefore incredibly useful for understanding which pages are on the site, having an up to date record of the pages, and even auditing your competitors websites.
We grow e-commerce brands.
Get in touch: sales@vida-digital.co.uk
Vida Digital Marketing Limited
Registered in England and Wales
Company number: 14162188
© Copyright 2024
Vida Digital Marketing Limited
Crafted by kreated
We grow e-commerce brands.
Get in touch: sales@vida-digital.co.uk
Vida Digital Marketing Limited
Registered in England and Wales
Company number: 14162188
© Copyright 2024
Vida Digital Marketing Limited
Crafted by kreated
We grow e-commerce brands.
Get in touch: sales@vida-digital.co.uk
Vida Digital Marketing Limited
Registered in England and Wales
Company number: 14162188
© Copyright 2024
Vida Digital Marketing Limited
Crafted by kreated
We grow e-commerce brands.
Get in touch: sales@vida-digital.co.uk
Vida Digital Marketing Limited
Registered in England and Wales
Company number: 14162188
© Copyright 2024
Vida Digital Marketing Limited
Crafted by kreated