You may have noticed the tagline on our home page that says “From inventory to insight.” In this series of articles, we’ll start with how you can begin the process of content auditing right in the CAT interface and build it out further when you export your results. This first article focuses on the basic information that every inventory starts with; in later articles we’ll add data for additional types of audits—what information to add and how to evaluate a site against various audit criteria.
When starting a content inventory and audit process, many people look for examples (templates) on which to base their own work. This isn’t a new process, after all; many others have done inventories and audits before and have shared their methodologies and templates. See, for example, the Templates section of the Content Inventory and Audit Resources site. You’ll notice that many of the examples contain essentially the same set of columns and advice.
If you begin your content inventory and audit using our content inventory tool, CAT, you don’t need to go out and find someone else’s spreadsheet template. The tool’s dashboard gives you a lot of rich information in an easily browsed view. And if you decide to export the results of a crawl, the basics are already there in your export. Open it up in Excel and voilà! You have your content inventory template, ready to be supplemented with the additional site- or project-specific information you wish to track.
When CAT crawls your site, it returns all of this data, which can be sorted and filtered as you need:
Plus, you can add:
With all that data already populated in the CAT interface, you have a head start on your audit.
Depending on the size, length, and complexity of your content project, you may be able to do your auditing right in the CAT interface—or at least identify the issues you want to delve into further.
The job details view, for example, summarizes the count of each content type, which, as Scott Pierce noted in his recent guest post on using inventories in project scoping, can trigger various questions or conclusions. The view also lets you filter to find specific content types, or filter on status codes to find broken (404) pages or redirects (ideally, for SEO benefit, your site uses 301 redirects rather than 302). In the detailed list, you can see and sort by the URL, the type (format), the site level of each page, and the title. Is the title field blank? If so, you’ve found a page that’s missing its meta title, and you know you need to add it.
Example of information available in the job details view
Click through to a resource details page to see page-level specifics: all the metadata (title, description, keywords); clickable lists of images, audio, video, and document files associated with the page; and clickable lists of links in and out. Quickly review the metadata against your site standards, find pages that have no images and consider whether you want to liven them up a little, and check the balance of links in and out. Is this an important content page that only a few pages link to? Consider whether you want to beef up your cross-linking strategy. Use the Notes field to add any comments you have about the page or to track content owners, content types, or review status. This field is part of your export, so you will have it in aggregate in your spreadsheet.
If you do need to add more information in order to complete your audit process, you can quickly and easily export your crawl’s results into Excel. Let’s walk through what’s available in the export and how it can be used to begin your audit process.
Snippet of a CAT inventory export showing available columns and data
Looking at the URL structure allows you to evaluate several things:
Length and clarity: For both human readability and search engine optimization (SEO), shorter URLs are better. Very long URLs may not be rendered by some browsers, and they certainly won’t be memorable to a person who later wants to type one in directly. It’s also best practice to use hyphens (rather than underscores or blank spaces) between words in URLs; a quick look at the URL list will help you identify whether your URLs follow this practice.
URLs that are composed of session IDs or other parameters give the user no information to help set expectations about the content likely to exist at that location. Multiple parameters may also affect whether a page is crawled by search engines like Google, so identifying and addressing poorly constructed URLs is not only a favor to your human users but also an opportunity to improve your site’s ranking.
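If you would rather script these URL checks than eyeball them, the following is a minimal sketch of one way to do it. The 100-character threshold, the issue labels, and the `audit_url` function are all assumptions made for illustration; they are not part of CAT or its export.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical threshold; choose a limit that matches your own site standards.
MAX_URL_LENGTH = 100


def audit_url(url, max_length=MAX_URL_LENGTH):
    """Return a list of issue labels for a single URL (empty if none found)."""
    parts = urlsplit(url)
    issues = []
    # Long URLs are harder to read, remember, and type.
    if len(url) > max_length:
        issues.append("long")
    # Underscores or spaces (raw or percent-encoded) instead of hyphens.
    if "_" in parts.path or " " in parts.path or "%20" in parts.path:
        issues.append("non-hyphen separator")
    # More than one query parameter, for example a session ID plus paging.
    if len(parse_qsl(parts.query, keep_blank_values=True)) > 1:
        issues.append("multiple parameters")
    return issues


sample_urls = [
    "https://example.com/products/blue-widgets",
    "https://example.com/products/blue_widgets",
    "https://example.com/view?sessionid=abc123&page=2",
]
for url in sample_urls:
    print(url, audit_url(url))
```

In practice you would feed the function the URL column from your export rather than a hard-coded list.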
Navigational structure: It is common to use a content inventory as the basis of a hierarchical site map. If the URLs represent a logical directory structure, you have a great start at creating that map.
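As a rough illustration of turning a URL list into a draft site map, the sketch below nests URL path segments into a tree. It assumes the URLs really do mirror a directory structure; `build_tree` and `print_tree` are hypothetical helper names, not CAT features.

```python
from urllib.parse import urlsplit


def build_tree(urls):
    """Nest URL path segments into a dictionary of dictionaries."""
    tree = {}
    for url in urls:
        node = tree
        for segment in urlsplit(url).path.strip("/").split("/"):
            if segment:  # skip the empty segment produced by the home page
                node = node.setdefault(segment, {})
    return tree


def print_tree(tree, depth=0):
    """Print the tree as an indented outline, one level per two spaces."""
    for name in sorted(tree):
        print("  " * depth + name)
        print_tree(tree[name], depth + 1)


site_map = build_tree([
    "https://example.com/",
    "https://example.com/about/team",
    "https://example.com/about/history",
    "https://example.com/products/blue-widgets",
])
print_tree(site_map)
```

The indented outline is only a starting point; pages whose URLs do not reflect their place in the navigation still need to be positioned by hand.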
The type, or format, of the content—for example, HTML, video, image—is another basic piece of information to identify the overall structure and content mix of your site. Does your site include a large number of PDFs? You may want to flag those for review and/or incorporation into the site in a more usable (and indexable!) way. Are there videos? Another content type to review for relevance and currency.
This data may interest your web management team, who care about the size of pages and their effect on load time and performance.
Although keywords have declined in importance for SEO, the title and description are still very important. The title appears in the browser as well as in search results, so it’s important that it be unique and descriptive (get those keywords in there!) without being too long. Best practice is 70 or fewer characters.
The description also appears in search engine results, so you will want to review it to see how well it actually represents the content on the page and is engaging or informative enough to entice readers to click through to the page.
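The title and description checks above can also be run over an export with a few lines of code. This is a sketch under stated assumptions: the column names `URL`, `Title`, and `Description` are hypothetical stand-ins (check the header row of your own export), and the inline sample data takes the place of a real file.

```python
import csv
import io
from collections import Counter

MAX_TITLE_LENGTH = 70  # the best-practice limit mentioned above

# Stand-in for an exported file; the column names are assumptions.
SAMPLE_EXPORT = """URL,Title,Description
https://example.com/,Home,Welcome to Example
https://example.com/about,,Who we are
https://example.com/contact,Home,
"""


def audit_metadata(rows, max_title_length=MAX_TITLE_LENGTH):
    """Map each URL with a metadata problem to a list of issue labels."""
    rows = list(rows)
    # Count non-empty titles so that duplicates can be flagged.
    title_counts = Counter(
        row["Title"].strip() for row in rows if row["Title"].strip()
    )
    findings = {}
    for row in rows:
        title = row["Title"].strip()
        issues = []
        if not title:
            issues.append("missing title")
        elif len(title) > max_title_length:
            issues.append("title too long")
        if title and title_counts[title] > 1:
            issues.append("duplicate title")
        if not row["Description"].strip():
            issues.append("missing description")
        if issues:
            findings[row["URL"]] = issues
    return findings


findings = audit_metadata(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
for url, issues in findings.items():
    print(url, issues)
```

To run this against a real export, replace the `io.StringIO` wrapper with an opened file and adjust the column names to match.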
Your CAT export doesn’t list the URLs of the pages linking to and from each page (you can imagine how listing all that data, per page, for a large site could quickly tax the limits of Excel’s row counts!). It does provide the counts, which let you assess how heavily cross-linked your pages are. If you want the detail, go back to the resource details view in CAT to see the actual URLs. You can also export at the resource details level, which does include all the URLs, so you know, for example, every place where you might need to update a link if a page moves or is deleted.
As with links in and out, your export lists the number of each of these file types associated with each page. Again, this gives you the chance to do a little number-crunching to see the ratio of files to pages, or to sort the list and identify pages that have no images at all.
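The number-crunching can be as simple as the sketch below, which lists pages with no images and computes the average number of images per page. The `Images` column name and the sample rows are assumptions for illustration; substitute the actual header from your export.

```python
# Stand-in rows; in practice these would come from csv.DictReader on an export.
rows = [
    {"URL": "https://example.com/", "Images": "4"},
    {"URL": "https://example.com/about", "Images": "0"},
    {"URL": "https://example.com/products", "Images": "2"},
]


def image_summary(rows, column="Images"):
    """Return (pages with no images, average images per page)."""
    # Treat a blank cell as zero images.
    counts = {row["URL"]: int(row[column] or 0) for row in rows}
    no_images = sorted(url for url, count in counts.items() if count == 0)
    average = sum(counts.values()) / len(counts) if counts else 0.0
    return no_images, average


no_images, average = image_summary(rows)
print("Pages with no images:", no_images)
print("Average images per page:", average)
```

The same function works for other count columns, such as documents or videos, by passing a different `column` value.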
If you configured your crawl to include your site analytics, the data for Pageviews, Bounce Rate and Exit Percentage are included in your export. Learn how to add Google Analytics to your crawl.
Any custom columns you created and the values you placed in them are included so you don't need to add them in Excel. Use these to track additional information you need about your content—business owner, status, page template, and so on. Learn how to set up custom columns.
The Notes column retains any of the specifics you added at the page level (in the resource details view). If you paged through the resource details and made comments about the content for later reference, you can view this column to remind yourself of your impressions as you reviewed it.
You’ve now seen how the information provided in the CAT interface and the .csv export forms a solid basis for an initial content audit. In the following articles in this series, we’ll cover how to expand on your basic inventory and audit. We’ll look at what information to add and how to evaluate it to assess your site’s content performance, do a competitive analysis, audit against personas, and more.