User Guide

Introduction to ContentWRX Audit
Job Setup
The Dashboard
Job Details View
Resource Detail View
Comparing Jobs
Exporting Job Data

Introduction to ContentWRX Audit

ContentWRX Audit crawls web sites and returns data for further analysis, supporting a wide range of activities, from content management and data mining to business intelligence and snapshot-in-time records. The content inventories created by ContentWRX Audit can be viewed from within the dashboard or exported as a .csv file for further analysis in tools such as Excel.

ContentWRX Audit is a web-based software-as-a-service solution, so there is nothing to download or install. Simply go to the Pricing Plans page, set up an account, select your subscription level, and get started.

ContentWRX Audit lets you set up jobs and fine-tune results by telling the crawler exactly which URL paths and patterns to follow and what data to return for each URL fetched.

The Dashboard view shows what's in your job queue and your list of completed jobs, and lets you take a number of actions, including viewing all job data, adding custom columns and tagging files, re-running a job, or deleting it.

Key features of ContentWRX Audit include:

  • Page-level details for each resource crawled, including associated images and media, metadata, H1 tag text, word count, and links in and links out
  • Detailed job comparisons
  • Screenshots of each page as it appeared at the time of the crawl
  • Ability to view the images associated with each page
  • Filtered exports of page and comparison detail
  • Integration with Google Analytics
  • Ability to add custom columns and tag files with controlled vocabularies

Setting Up a New Job

In ContentWRX Audit, a site crawl is referred to as a Job. To set up a new job, select the Job Setup tab.


The Job Setup tab


Setting up a Project lets you group multiple jobs, much like files in a folder. For example, you might have a project for each web site you inventory or for each client. You don't have to create a project for each job, but projects are useful for organizing multiple crawls.

Project names are kept in a project list. Once you have more than one project, a drop-down lets you choose the project to which new jobs are added.

Job Details

Each job is an individual crawl. To set up a job, give it a name, a description, and a base URL from which to start.

Setting the Base URL

The first step in setting up a job (or crawl) in ContentWRX Audit is setting the base URL from which ContentWRX Audit will start the crawl.

Before you enter the URL in ContentWRX Audit, enter it in a browser and make sure it's valid and does not redirect. If it immediately redirects to another URL, you'll need to select Follow redirects (see below).

ContentWRX Audit takes that URL pattern literally: unless you tell it otherwise in the advanced settings, it catalogs only URLs that share the same base pattern. If your site includes sub-domains with a different pattern, add them to the Include Links box to include them in your crawl.


Follow Redirects

If Follow redirects is selected, the crawler follows the redirect and returns data for the resource it ends at. If it is not selected, the crawler records that the link was redirected but does not follow it or return data.
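The difference between the two settings can be pictured with a small Python sketch. It is a hypothetical model, not ContentWRX Audit's implementation: the `resolve_link` function, the `redirects` mapping (standing in for what a server would return), and the ten-hop limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkResult:
    url: str                  # the link as found on the page
    redirected: bool          # True if the link pointed at a redirect
    final_url: Optional[str]  # where the redirect chain ends, if followed


def resolve_link(url, redirects, follow_redirects, max_hops=10):
    """Hypothetical model: `redirects` maps a URL to the URL it redirects to."""
    if url not in redirects:
        return LinkResult(url, False, url)  # no redirect involved
    if not follow_redirects:
        # Record that the link redirected, but do not traverse it.
        return LinkResult(url, True, None)
    seen = {url}
    current = url
    for _ in range(max_hops):
        current = redirects[current]
        if current not in redirects:
            return LinkResult(url, True, current)  # reached a real resource
        if current in seen:
            break  # redirect loop detected
        seen.add(current)
    # Loop or too many hops: no final resource to report.
    return LinkResult(url, True, None)
```

For example, with `redirects = {"https://www.example.com/old": "https://www.example.com/new"}` (placeholder URLs), following the link to `/old` reports `/new` as its final URL, while not following it only marks the link as redirected.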

Exclude External Links

When Exclude external links is selected, links that point outside the domain of the base URL and the Include Links you designate are ignored and never followed. If this box is unchecked, ContentWRX Audit records the information the server returns about the resource each such link points to, such as the server status (for example, 200 “OK”), the resource type (for example, “text/html” or “image/png”), and other data. Note: Checking this box can speed up your crawl.

Even when this box is unchecked, external resources are never crawled: their content is not downloaded or processed.

Include Links, Exclude Links

Include Links is a list of link patterns you wish to have crawled in addition to the Base URL. Enter link patterns or fragments here, separated by spaces.

In Include Links, shorter strings match more URLs and therefore return more results.

Exclude Links tells the crawler which paths to ignore, allowing you to fine-tune your results.

If your site includes sections on a different domain (so their URLs don't match the Base URL pattern), add those sub-domains to the Include Links box if you want them included in your crawl.

Your setup would look like this:

Base URL:
Include Links:

To exclude particular directories or sub-domains, list them in the Exclude Links box. For example, if you are crawling an e-commerce site and don't want hundreds or thousands of product pages returned, add the product pages' URL pattern to the Exclude Links box.

Limiting Crawls to a Specific Directory

Sometimes you may wish to crawl only a specific directory within your site. ContentWRX Audit makes that possible, but you do need to be careful in how you set up your job parameters. Set the directory as your Base URL, but also add it to the Include Links box and add an asterisk (*) to the Exclude Links box so no other sections are crawled. For example, if you wanted to crawl just the Resources section of, your setup would look like this:

Base URL:
Include Links:
Exclude Links: *

ContentWRX Audit does not support wildcard matching. The asterisk is supported only as shown above, on its own in Exclude Links, where it excludes everything outside the scope defined by the Base URL and Include Links.
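The scoping rules in this section and in Include Links, Exclude Links can be modeled with the Python sketch below. It is an illustration under stated assumptions, not the crawler's actual matching logic: `in_scope` is a hypothetical name, patterns are treated as plain substrings (consistent with wildcards being unsupported), a URL on the Base URL's host counts as in scope unless excluded, and a lone `*` keeps only URLs that start with the Base URL or contain an Include Links entry.

```python
from urllib.parse import urlparse


def in_scope(url, base_url, include=(), exclude=()):
    """Hypothetical model of Base URL + Include Links - Exclude Links.

    Assumptions (not confirmed by ContentWRX Audit):
    - Include/Exclude entries are plain substrings (no wildcards).
    - Without "*", any URL on the Base URL's host is in scope.
    - A lone "*" in Exclude Links keeps only URLs that start with the
      Base URL or contain an Include Links entry.
    """
    # An explicit exclude pattern always removes the URL from scope.
    if any(p in url for p in exclude if p != "*"):
        return False
    explicit = url.startswith(base_url) or any(p in url for p in include)
    if "*" in exclude:
        return explicit
    same_host = urlparse(url).netloc == urlparse(base_url).netloc
    return same_host or explicit
```

Under the substring assumption, an Include Links entry such as `example.com` matches every URL that `example.com/blog/` matches and more, which is one way to see why shorter strings return more results. The `www.example.com` URLs used with this sketch are placeholders.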

Include Screenshots

If Include Screenshots is selected, ContentWRX Audit generates and stores a snapshot-in-time of each HTML page. The images are viewable in the Resource Detail view and can be downloaded by opening them in a browser window and saving them.

Including screenshots may make the job take longer to complete. Screenshots are captured as soon as possible, but some may not be captured until after the crawl itself has completed.

Maximum Pages

Your subscription level limits the number of pages ContentWRX Audit will crawl within the subscription period. To set a maximum for a particular crawl, enter the page limit in the Maximum Pages field. The crawl begins at the top level of the base URL, and each link is followed only the first time it is detected (to avoid duplicates). When the limit is reached, the crawl stops, and the Job Queue indicates that the maximum number of pages was reached.
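The page limit and the first-detection rule can be sketched as a breadth-first traversal in Python. Breadth-first order, the `crawl_order` function, and the in-memory `links` graph are assumptions for illustration; the guide does not specify the crawler's traversal order. The level recorded for each page follows the definition of Level in the Job Detail list: links traversed from the start page, not URL path depth.

```python
from collections import deque


def crawl_order(start, links, max_pages=None):
    """Hypothetical sketch: breadth-first crawl over an in-memory link graph.

    `links` maps each URL to the URLs it links to. Returns (url, level)
    pairs in the order pages would be fetched.
    """
    seen = {start}
    queue = deque([(start, 0)])
    fetched = []
    while queue:
        if max_pages is not None and len(fetched) >= max_pages:
            break  # page limit reached: stop the crawl
        url, level = queue.popleft()
        fetched.append((url, level))
        for target in links.get(url, []):
            if target not in seen:  # follow each link only the first time
                seen.add(target)
                queue.append((target, level + 1))
    return fetched
```

For a hypothetical site where `/` links to `/a` and `/b`, and `/a` links to `/a/1`, a limit of 2 fetches only `/` (level 0) and `/a` (level 1).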

You can always purchase more pages and storage to supplement your subscription level. See the Pricing page for details and options.

Google Analytics

If there is a Google Analytics account associated with the site you are crawling, you can grant ContentWRX Audit access to that data so it can be gathered and displayed in the Job Detail and Resource Detail views. Including this data in your job is simple, but requires a few extra setup steps.

1. Add ContentWRX Audit as a user

In order for ContentWRX Audit to gather the analytics data, you need to set ContentWRX Audit up as a user in your account profile. Follow these steps:

  • Log in to your Google Analytics Account
  • Click on the Admin link in the bar at the top of the page
  • In the Account column, click User Management. 
  • In the "Add permissions for:" field, add this email as a new user:

    NOTE: If you have previously set up Google Analytics access, you do not need to change this email account. Use your existing account setup.


2. Get the View ID

  • From the Admin landing page, select View Settings from the View column
  • Find the View ID value under Basic Settings
  • Copy the value
  • Enter that value into the View ID field in Job Setup


Be sure that the Base URL of your job is exactly the same as the URL for the Google Analytics account.

The Dashboard

The ContentWRX Audit dashboard tab is your console for reviewing and managing your in-progress and completed inventory jobs. From this tab, you can view the job queue, access completed job information, select jobs for comparison and navigate to the results, modify and re-run jobs, archive jobs, and delete completed jobs.


The ContentWRX Audit dashboard

Job Queue

The Job Queue lists jobs that are scheduled or running, shows the status of each job in progress, and allows you to cancel jobs if they have not completed.

Canceling a job deletes any data gathered so far; that data is no longer accessible.

When a job has finished running, it appears in the Completed Jobs section, sorted by run date (most recent jobs at the top of the list) and then by project name.

Completed Jobs List

The Completed Jobs list shows the project each job is assigned to, the job name, the description, and the run date, and lets you select from a set of actions.



In the Completed Jobs List, you can view the results of a completed job by clicking the Open icon.

You can also select two jobs for comparison.

Clone

Cloning a job allows you to copy the job, modify parameters, and re-run the job. Selecting Clone will open the Job Setup view. Make necessary changes to the job parameters and click Submit.

Re-Run

Re-run is a quick way to start a new job with exactly the same settings as the existing one, without going through Job Setup.

Edit

Edit allows you to easily move a job to a different project, rename it, or add or modify the description. Click the Edit icon, make your changes, and click the Save icon to save your changes.

Delete

Deleting a job will remove it from the list and delete all data.

Job Details View

When a job has completed, you can view it by clicking the Open icon in the Actions column.


Job Summary and Details view

Job Summary

The Job Summary lists the total number of files found in the crawl, by type.


The type filters control which files are shown in the Job Detail list. If no filters are selected, all files are shown. Check or uncheck the boxes next to the types to limit the results below.


From the Completed Job view, a number of actions can be taken on the data:


Export

Selecting Export lets you download the crawl data as a comma-separated .csv file for import into another program, such as Excel, for further manipulation. See Exporting Job Data, below, for more detail.

View Job Parameters

View Job Parameters takes you back to the Job Setup view, in read-only mode, so you can review how the job was set up.


Re-Run

Re-run allows you to re-run the job exactly as configured.


Clone

Cloning a job allows you to copy the job, modify parameters, and re-run the job. Selecting Clone will open the Job Setup view allowing you to change any of the settings before re-running.


Delete

Deleting a job will remove it from the list and delete all data.

Edit View

To change the set of columns that appears in Job Detail view, click Edit View from the Actions menu. Checkboxes appear next to the columns that can be hidden; uncheck the ones you wish to hide and click Save View.

Custom Columns

Create up to three custom columns and fill them with your own tags. You can type values directly into the cells, or create a set of values (a controlled vocabulary) that appears in a drop-down selector in each cell.

To add custom columns and vocabularies:

  • Click Custom Columns from the Actions menu
  • In the module that opens, create up to three columns, give them labels, and add a list of values for the column
  • Click the green + button and continue to add rows as needed
  • Click Save and your new columns will appear in your Job Detail view
  • To add a value to a cell, click into it. A drop-down will appear with the values you made available for that column. 
  • Select a value and move on to the next

To view or edit custom column values in the Resource Detail view, use its Custom Tags and Notes section. There you can view or change the values set in the Job Detail view, or add values if you haven’t already.

You do not have to create a set of values. You can also edit directly within the cells of the Job Detail table.

Job Details

The Job Detail list includes the following data:

  • URL - The resource address
  • Type - The MIME type of the resource
  • Size - Resource file size
  • Level - Level relative to the Base URL, indicating crawl depth (links traversed, not URL path depth)
  • Title - Extracted from the HTML header
  • Custom Columns - If you've created your own additional columns they appear here
  • Analytics columns - The data for Google Analytics fields: Pageviews, % Exit, Bounce Rate, Unique Pageviews, Average Time on Page, Entrances
  • InScope - Whether a resource is in scope for the crawl (true) or not (false)

In-scope resources are those that fall within the parameters set by the combination of a base URL and any include patterns, minus exclude patterns. For these resources, we download and process the HTML for metadata, images and other media, and links in and out.

Links to resources outside this scope are recorded (if Exclude External Links is not checked), but their HTML is not downloaded or processed and screenshots are not captured. These resources are considered out-of-scope.

To view the details of a listed resource, click the green arrow at the end of the row. Resource Detail View opens.

Resource Detail View


Resource detail view

In the Resource Detail view, if you chose to include screenshots in Job Setup, you will see a snapshot-in-time of the page accompanied by all the details captured during the crawl.

Screenshots are captured as soon as possible, but some may not be captured until after the crawl itself has completed.

The following data is available in this view:

  • URL - The resource address
  • Date - Last updated date (extracted from the HTML header)
  • Size - Resource file size
  • Scan Status - Indicates whether the scan of the page completed successfully
  • Server Status Code - The code returned by the server for the resource; for example, 200 means that the request to return the page was successful.
  • Title - The page meta-title as extracted from the HTML metadata
  • Keywords - Extracted from HTML metadata
  • Description - Extracted from HTML metadata
  • H1 tag - Text of the page's H1 heading, extracted from the page HTML
  • Word count - Count of non-HTML words on the page
  • Analytics - If analytics data was enabled for the job, it appears here
  • Images - Lists the images found in the page (TIP: Click on an image file name to open the image in a new browser window)
  • Audio - Any audio files associated with the page
  • Videos - Lists any videos associated with the page
  • Custom column data - If set up in Job Detail view, columns and their values are visible here; values can be added or edited here as well
  • Notes field for adding your own notes
  • Links in - Lists in-bound links to the page
  • Links out - Lists outbound links from the page
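The Word count field counts the non-HTML words on a page. The Python sketch below shows one plausible way to compute such a count with the standard library's `html.parser`; ContentWRX Audit's exact rules are not documented here, so skipping `<script>` and `<style>` content and treating every tag boundary as a word break are assumptions, and `word_count` is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text outside tags, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def word_count(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces treats every tag boundary as a word break.
    return len(" ".join(parser.parts).split())
```

This sketch also counts text inside `<title>`; whether the product does so is not stated in this guide.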

Comparing Jobs

A key feature of ContentWRX Audit is the ability to compare one completed job to another and see what has been changed, added, or deleted. Select two jobs for comparison by clicking the checkboxes in the Compare column, then click 'Compare selected jobs.'


The Job Comparison screen will open.

The Job Summary indicates the two jobs being compared and a summary of the changed files.

The file list shows the original files along with files that were changed, added, or deleted. To see the comparison results for a file in detail, click the green arrow to the right of the original file.
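Conceptually, a comparison matches resources by URL across the two jobs. The Python sketch below assumes each job is reduced to a mapping from URL to a fingerprint (such as a content hash or file size); the `compare_jobs` function and the fingerprint idea are hypothetical, since the guide does not say how ContentWRX Audit detects changes.

```python
def compare_jobs(original, changed):
    """Hypothetical sketch: each job maps URL -> fingerprint of its content."""
    added = sorted(set(changed) - set(original))    # only in the newer job
    deleted = sorted(set(original) - set(changed))  # only in the older job
    modified = sorted(
        url
        for url in set(original) & set(changed)
        if original[url] != changed[url]  # same URL, different content
    )
    return {"added": added, "deleted": deleted, "changed": modified}
```

Matching on URL means a page that moved to a new address shows up as one deletion plus one addition rather than as a change.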

Exporting Job Data

If you wish to export job data from ContentWRX Audit for further manipulation in another program, such as Excel, select Export from the Job Details view. The .csv file that downloads contains the following data:

  • URL - The resource address
  • Type - The MIME type of the resource
  • Size - Resource file size
  • Date - Last updated date (extracted from the HTML header)
  • Title - Extracted from HTML metadata
  • Keywords - Extracted from HTML metadata
  • Description - Extracted from HTML metadata
  • H1 tag text - Extracted from the page's H1 heading
  • Word count - Extracted from the page HTML
  • Analytics - If included in job setup
  • Links In - Number only, see detail and export via Resource Details from within ContentWRX Audit
  • Links Out - Number only, see detail and export via Resource Details from within ContentWRX Audit
  • Images - Number only, see detail and export via Resource Details from within ContentWRX Audit
  • Videos - Number only, see detail and export via Resource Details from within ContentWRX Audit
  • Downloads - Number only, see detail and export via Resource Details from within ContentWRX Audit
  • Any custom columns and their values that were created in the dashboard
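For analysis outside ContentWRX Audit, the exported file can be processed with Python's standard `csv` module. This sketch counts resources by type (similar to the Job Summary) and flags HTML pages with low word counts. The column headers `URL`, `Type`, and `Word count` are assumed from the field list above and may differ in an actual export; `summarize_export` and the 100-word threshold are hypothetical.

```python
import csv
import io
from collections import Counter


def mime_type(row):
    # Drop parameters such as "; charset=utf-8" so types group cleanly.
    return row["Type"].split(";")[0].strip()


def summarize_export(csv_text, max_words=100):
    """Hypothetical helper; assumes "URL", "Type" and "Word count" headers."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_type = Counter(mime_type(row) for row in rows)
    thin_pages = [
        row["URL"]
        for row in rows
        if mime_type(row) == "text/html"
        and (row["Word count"] or "").strip().isdigit()
        and int(row["Word count"]) <= max_words
    ]
    return by_type, thin_pages
```

To run it on a downloaded file, read the file's text first, for example with `open(path, newline="", encoding="utf-8")`; the path and the UTF-8 encoding are assumptions about your export.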
