Center for Citizen Media

Citizen Media Business Issues: Traffic Rankings, Search Engines, and Search Engine Optimization

(This is the seventeenth in a series of postings about citizen media business issues. See the introduction here. All of these entries are considered to be in “beta” and will be revised and refined as they find a home on a more permanent area of the Center for Citizen Media web site. To that end, your comments, additional examples, and criticisms are welcome and will be invaluable contributions to this process.)

In the previous Citizen Media Business Issues post, we took a look at Web statistics as a means to learn more about your site and the people who visit it. Now that you know how many visitors you have, what they look at, what sites are linking to you, and so on, the question becomes: how can one increase the performance in one or more of those areas?

This post is not about how to falsely inflate one metric or another. Nor is it about how to get traffic unscrupulously. It’s about how to use the tools available on the Web effectively, to accurately reflect the hard work you put in. There is no substitute for high-quality, well-presented content, but people need to be able to find it.

Traffic Rankings

There are several measuring sticks you can place next to your website to compare it with the rest of the Internet. A lot of people choose not to worry about such rankings, and their importance does certainly depend on your own personal goals. However, on this business side of things, which this series concerns, they matter.

First of all, they can provide good tools for goal-setting and motivation, and can sometimes act as a reward for a job well done, thereby facilitating the continued production of good content.

Second, marketers will often use them to gauge how much your advertising real estate is worth. You can have thousands of hits a day, but if, for whatever reason, you’re not popping up in a Google search, the rate you get for advertising won’t reflect its true worth.

Finally, and perhaps most importantly, knowing the inner workings of the various rankings will allow you to utilize them to their full potential, thus attracting more traffic.

Social News Sites

Before we take a closer look at individual traffic rankings, it’s worth mentioning how much of an impact social news sites can have on them. Webmasters will often notice a seemingly random spike in traffic over the course of a couple of days that ends up being due to incoming links from a site like Digg, Reddit, Newsvine, Fark, Slashdot, or StumbleUpon. And a sharp increase in traffic will, of course, result in improved traffic rankings.

While each of these sites functions differently, they are all based on a community of users who share stories they find on the Web. The submitted links are then augmented with conversational tools and/or some form of voting system, allowing the most popular articles to rise in visibility and accessibility. A lot of people use these sites as news filters, relying on the wisdom of crowds to save browsing time by going first to those sites recommended by the rest of the community.

The way most people promote their content on these sites is by placing one or more of the site-specific icons near each of their headlines or blog posts. The best known example of this is probably the “Digg This” button, which registers a positive vote (known as a Digg) for the article when clicked and displays how many Diggs it has received to date. It’s tempting to use a lot of these so that readers can use their social news site of choice, but make sure to display them tastefully, avoiding messy icon spam. One way to do this is to use something like the Share This widget you’ll see at the bottom of this post. When clicked, it expands into a neat menu of possible services.

You may also consider getting involved in one or more of these communities. Signing up with one just because it’s the biggest or prettiest isn’t always the best idea. Take a survey first of what’s available (there’s a good list over at Dosh Dosh) to find where content like yours is featured prominently. For example, if you run a technology blog a la Engadget, you would probably fit in better at Slashdot than Newsvine.

As a new member, you may find that the things you submit don’t seem to get as much attention as something similar submitted by another user. This is because these communities often have reputational gauges of member contribution (“karma” on Reddit, for example), as well as personal relationships that develop among the most frequent users. As time goes on and you get more involved, you may find a greater number of your submissions doing well for these reasons.

Alexa

Alexa Internet’s rankings are very simple. Every time a page on your site is viewed by someone who has their browser equipped with the Alexa toolbar, a hit is registered. It assigns rankings based on those hits (pageviews) and on “reach” (the percentage of all Internet users who have visited the site in question), averaged over the previous three months.

The website says that there is additional “data obtained from other, diverse traffic data sources,” but consensus seems to be that the toolbar is far and away the primary factor. And although there is some amount of data correction for potential biases, this has led to some controversy over how well the self-selected group of people who use the toolbar represents the entirety of Web users. The company’s disclaimers even include a note about data being less accurate for low-traffic sites: “the size of the Web and concentration of users on the most popular sites make it difficult to accurately determine the ranking of sites with fewer than 1,000 monthly visitors. Generally, traffic rankings of 100,000 and above should be regarded as not reliable.”

Despite the controversy, Alexa rankings are important to many advertisers and are at least accurate enough to be meaningful.

In spite of the myriad “## ways to boost your Alexa rank” blog posts around the Web, there is no easy way to improve your standings. Alexa Product Manager Geoffrey Mack dispelled most of the rumors a while ago between a post on the Alexa Web Discovery Machine (an official company blog) and a comment on the Online Money Making blog. All you can really do to increase your performance here—other than attract more traffic via other means—is to encourage your loyal readers to install the toolbar, perhaps displaying a widget as a reminder.

If you’re not listed on Alexa, you can either simply visit your site with the Alexa toolbar installed or submit it here.

A note about privacy: Before you decide to install the Alexa toolbar (or any kind of toolbar like it, for that matter), understand that there is more information gathered about you than simply counting your visit to a site as a hit. Make sure to read the Alexa Internet Privacy Policy, which makes clear that while it will not try to determine your identity, it does keep track of purchases, browsing history, searches, forms you fill out, and so on.

Search Engines

If you’ve looked at the data for how visitors get to your site, you’ve probably noticed that a lot of people find you through search engines. The major engines, led by Google, drive a massive amount of Web traffic these days. Most people have a few favorite sites that they visit regularly by typing the familiar URL into the address bar or clicking a bookmark, but odds are good that search engines are your gateway to most of the rest of the Web.

Search engines work by crawling, indexing, and then using that index to serve Web data in response to users’ search queries.

A “crawler” is a program that periodically visits many millions of pages, gathering data about their text, text location, headings, and so on. It essentially tries to figure out what each page is all about. That data is then “indexed,” or stored in a searchable database. So the first thing to note here is that the search engines have to know about your site in order to crawl it. To inform them of your existence, follow these links: submit to Google, submit to Yahoo!, submit to MSN Live Search, and if you have a blog, submit to Technorati.

When you search for something, the engine will look at its huge index and return a list of results in order of relevance. From your own habits you probably know the importance of this search result ordering. In what percentage of searches do you actually get past the first few results? For most people, it’s rare, and it’s even rarer to look beyond the first couple of pages.

None of the big engines make public exactly how they organize and prioritize search results. Google, for example, uses over 200 factors to determine which sites in its index are most relevant to the search query, and according to the New York Times, “makes about a half-dozen major and minor changes a week to the vast nest of mathematical formulas that power the search engine.” Publicizing all of those factors would make the system too easy to manipulate and duplicate.

Not all of these signals are unknown, though. Some are disclosed by the company, many are common sense to the Web savvy, and a few have been determined via trial and error and data studies. Examples include the weighting of titles and headings over other content, location of search terms on the page, how close the words in the search string are to each other, and so on.

Website Importance

If you perform a Google search for “muffins” you’re going to get millions of hits. To some extent, the results will be based on relevance. But how does Google know if you are looking for recipes, a definition, the name of a restaurant, or a YouTube video with that name? Search engines then have to apply a sort of “importance” factor. The idea is that, with equal relevance, a very popular site will probably satisfy a user’s request more often than an unknown.

These importance rankings tend to be what advertisers care about most. The main reason for this is that performing searches to see where in the results pages a particular site appears for a variety of applicable keywords just isn’t practical. Rankings are quantifiable, more static, and simple values that have a significant enough relationship with traffic and search result priority to merit a direct relationship with advertising dollars.

The simplest algorithm is Technorati’s Authority system. For every blog that has linked to you (and also pings Technorati) in the last six months, your authority increases by one. It doesn’t matter if they link to you once or a hundred times, one blog equals one authority point. When searching with Technorati, users can choose to filter search results based on authority (for example, only including those with high scores). Authority is also displayed in the form of Technorati Rank, which is just a list of blogs arranged by authority (the number 1 ranked blog has the most authority).
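The counting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not Technorati’s actual code: the function name, the sample blog names, and the approximation of “six months” as 183 days are all assumptions.

```python
from datetime import date, timedelta


def authority(inbound_links, today, window_days=183):
    """Count distinct blogs that linked to us within the window.

    inbound_links: iterable of (blog_name, date_of_link) pairs.
    window_days: "six months" approximated as 183 days (an assumption).
    """
    cutoff = today - timedelta(days=window_days)
    # A set keeps each blog once, however many times it linked.
    recent_blogs = {blog for blog, linked_on in inbound_links if linked_on >= cutoff}
    return len(recent_blogs)


# Hypothetical sample data.
links = [
    ("blog-a.example", date(2008, 5, 1)),
    ("blog-a.example", date(2008, 5, 20)),  # second link from the same blog
    ("blog-b.example", date(2008, 4, 2)),
    ("blog-c.example", date(2007, 6, 1)),   # older than the window
]
print(authority(links, today=date(2008, 6, 1)))
```

Here the two links from blog-a count once, and the year-old link from blog-c does not count at all.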

PageRank is Google’s way of quantifying the importance of a site within the link structure of the Web. Generally speaking, the more a page is linked to, the higher its “importance,” and thus the higher its position in the search results. On top of that, incoming links from sites that are themselves considered important carry more weight than links from less important sites. A link to you from nytimes.com, for example, will count more than a link from somewhere on Geocities.

The raw PageRank value is equal to the likelihood that someone randomly clicking on links anywhere on the Web will end up on the page (with a 15% chance, at each step, of jumping to a completely random page instead of following a link). But you won’t see this raw value published anywhere; instead you can find a score between 0 and 10 for any site you visit with the Google Toolbar. The higher the number, the more important the site.
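To make the “random clicker” idea concrete, here is a minimal sketch of the textbook PageRank iteration in Python. It is not Google’s production algorithm; the tiny four-page link graph, the function name, and the iteration count are made up for illustration. The 15% random-jump chance appears as a damping factor of 0.85.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate raw PageRank for a small link graph.

    links: dict mapping each page to the list of pages it links to.
    damping: chance of following a link (1 - damping is the random jump).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal likelihood
    for _ in range(iterations):
        # Every page gets a baseline share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank everywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: three pages link to "home"; nothing links to "orphan".
graph = {
    "home": ["about", "recipes"],
    "about": ["home"],
    "recipes": ["home"],
    "orphan": ["home"],
}
ranks = pagerank(graph)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.4f}")
```

The page with the most inbound links ends up with the highest value, and the values always sum to 1 because they are probabilities.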

A common misconception is that PageRank determines search result order. Actually, it’s just a weight (albeit a seemingly heavy one), acting as just one of the aforementioned 200+ signals.

If you don’t want to install the Google Toolbar, a number of sites have programs that will find a site’s PageRank. One is SEOmoz’s rank checker, part of its SEO Toolbox.

[Note: Yahoo!, Ask, and MSN Live Search all function similarly to Google, with similar processes to PageRank, but don’t seem to come into play as much for marketers. Keep in mind that while this post may reference Google far more often than the others, most anything you do to increase your Google performance will likely yield better results in its competition, too.]

Search Engine Optimization (SEO)

SEO is the process of trying to improve the amount or quality of search engine traffic. It’s become a big part of Internet marketing as businesses have found that a place at the top of organic search results is far more effective than most any advertising they could buy—and unless you pay an individual or SEO firm to do it for you, it’s free. “Organic search results” means those that aren’t advertisements or otherwise sponsored results.

Some of the main areas of SEO concentration are keywords, code, and design/presentation. After a little explanation of keywords, we’ll give some examples of the other two.

Keywords

Let’s say you run a blog called The Bread Zone that’s all about sandwiches: where to get the best peanut butter and jelly, personal recipes, history, and so on. Now put yourself in the position of a potential reader/customer. What would you search for to find the information on your site? Certainly not “The Bread Zone.” It’s nice if you’re at the top of a search for your blog’s name, but if somebody knows the name of your site and is searching for it, it doesn’t matter all that much where in the listings you are. The idea here is to associate your site with certain words or phrases so that if someone does a search for, say, “best sandwiches Boston” or “Elvis’s sandwich”, you’ll be near the top.

The more your keywords are used, and the more prominent they are (i.e. in titles and headings), the more likely a search engine will be to associate them with your site.

Keyword associations evolve naturally, based on the sorts of things you write about. If you write a lot about the best sandwiches in Boston, Google will automatically come to consider you more relevant for searches containing related terms.

The first thing you can do to let search engines know what sorts of words will be relevant to your site is to put them in your meta keywords tag. Unfortunately, while meta keywords used to be vital, their importance has been severely weakened in response to rampant abuse by people using misleading information or entering a huge number of keywords (a tactic called “stuffing”).

“Keyword optimization” involves careful use and placement of the chosen keywords, aggressively pushing the associations. If you decide to attempt this, keep your writing organic! Be very careful not to get involved with practices that improve search results at the expense of content quality. It will backfire. Shoehorning “Elvis’s sandwich” into a bunch of titles, headings, text, links, templates, and captions without duplicating the rest of the content may be in line with what the search engines look for, so you might see a temporary boost in traffic, but then it will collapse. First, your audience will fade because nobody will want to look at or read such poorly composed text. Then, because all the search engines have some sort of measure to catch this manipulative practice (another form of “keyword stuffing”), they will devalue your URL, if not remove it from the index completely.

Even if you’ll be stopping at setting your meta tags, spend a little while really thinking about what words and phrases you anticipate using often anyway. Go as specific as possible, as long as it’s something you can see yourself writing a lot. Remember that crawlers don’t actually read English, so using the keyword “astrophysics,” even if it’s an accurate representation of the site’s content, is meaningless if the word itself doesn’t appear in the text. Also, “sandwich, bread, blog” isn’t going to help you a whole lot either. Even if they pop up often in your articles, those are such generic terms that you’ll be lucky to appear on page 10.

For an easy way to track how often you use potential keywords, use the Webconfs Keyword Density Checker. It will display a word cloud, count how many times you use each word, and calculate keyword density (keyword occurrences divided by total words, not including stop words like “the” or “and”). The ideal density for each keyword is somewhere between 1% and 6%.
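The density formula itself is simple enough to try yourself. The Python sketch below follows the definition given above (occurrences divided by total words, ignoring stop words). It is not the Webconfs tool; the function name, the short stop-word list, and the sample sentence are assumptions.

```python
import re
from collections import Counter

# A deliberately short, hypothetical stop-word list; real tools use longer ones.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "for", "on"}


def keyword_density(text, stop_words=STOP_WORDS):
    """Return {word: share of all non-stop words} for the given text."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop_words]
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


sample = "The best sandwich in Boston is the Elvis sandwich"
density = keyword_density(sample)
for word, share in sorted(density.items(), key=lambda item: -item[1]):
    print(f"{word}: {share:.0%}")
```

A nine-word sentence will always show inflated numbers; on a full article the shares fall into a more realistic range.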

Once you have some keyword ideas, you can throw them into Google’s Keyword Tool to analyze how often people search using those keywords, as well as some more specific alternatives.

SEOChat has one of the better articles out there about Choosing and Researching Keywords. It’s several years old, but still highly relevant.

Search-friendly coding issues

Title tags

Look at the top of your browser window. Right now it probably says either “Center for Citizen Media: Blog” or “Center for Citizen Media: Blog » Blog Archive » Citizen Media Business Issues: Traffic Rankings, Search Engines, and Optimization:” (perhaps abbreviated), followed by the name of your browser. This is the title, and every single page on the web has one. Titles, because they are by nature relevant to the page’s content, are a big deal to search engines. Try to use a different, appropriate title for each page you create. Many blog hosts will do this automatically. A good format is to use the name of your site for the main page, and then on other pages use something like “[name of your site] – [name of specific page or post]”.

Alt attributes and picture names

Search engines don’t see pictures. Not even Google Image Search sees them in the way humans do. It looks at the surrounding text and page content, the name of the picture file, and the “alt” attribute (explained below), but it has no way to see the graphic and understand its contents. Many blog and web hosts’ uploading programs will rename the files you upload to some random combination of letters and numbers, so you may not have total control over that, but it’s easy to use alt attributes effectively. “Alt” is what would be displayed in place of the image if you were using a text-only browser. It is also what users of screen readers (software for the blind that speaks aloud the words on a page) would hear. In some browsers, when the image’s title attribute is missing (not to be confused with the title of the page), it is also what pops up when you hover your cursor over an image.

Some publishing software gives you an “alt attribute” field to fill in when you insert an image; if not, it’s just a matter of editing the image’s code. Find the appropriate line with an “IMG SRC” HTML tag. You’ll see that IMG SRC is equal to the URL of an image file. All you have to do is add alt="the alt text goes here" after the URL and before the closing bracket. When you’re done it’ll look like:

<IMG SRC="http://www.yoursite.com/images/sample.jpg" alt="your alt text">

The alt text should act as a substitute for the image’s existence, which doesn’t always mean a description of the picture’s contents. A mailbox icon, for example, is probably better described as “e-mail [your name or site]” than “blue mailbox with a letter sticking out and the red flag raised.”

Also, don’t use alt text that is redundant to adjacent article text. And it’s probably best not to write alt text for graphics that are purely for decoration (frilly borders and such), as they don’t actually provide anything relevant to the primary content; for those, leave the attribute empty (alt="").
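If you have a lot of pages, checking every image by hand gets tedious. As a rough sketch, the Python below uses the standard library’s HTML parser to list images that have no alt attribute at all. The class and function names and the sample markup are hypothetical, and an empty alt="" is treated as intentional (a decorative image).

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so IMG SRC works too.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    audit = AltAudit()
    audit.feed(html)
    audit.close()
    return audit.missing


page = (
    '<IMG SRC="http://www.yoursite.com/images/sample.jpg" alt="your alt text">'
    '<img src="border.gif" alt="">'   # decorative: empty alt is fine
    '<img src="photo.jpg">'           # no alt attribute at all
)
print(images_missing_alt(page))
```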

Sitemaps

From the Sitemaps website:

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

The easiest way to create a sitemap is to use XML-Sitemaps.com. Just enter your URL, wait for it to scan, download the file, and place it in the root folder of your website (so that its address is www.yoursite.com/sitemap.xml). If you use a blog that doesn’t let you upload files like this, the next best option is to use your RSS feed as a sitemap (usually www.yoursite.com/rss.xml).
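If you would rather see what such a file contains, or generate one from a list of pages you already have, here is a minimal Python sketch. It assumes the element names defined by the Sitemaps protocol (urlset, url, loc, lastmod, changefreq, priority); the function name and the sample URLs and values are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from a list of dicts; only "loc" is required."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Write fields in a fixed order, skipping the optional ones not given.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages for an example site.
sitemap_xml = build_sitemap([
    {"loc": "http://www.yoursite.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.yoursite.com/about", "lastmod": "2008-05-01"},
])
print(sitemap_xml)
```

Saving the returned string as sitemap.xml in your root folder gives the same kind of file a generator site would produce, minus the crawling step that discovers your pages for you.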

Edit your robots.txt (a file on your website that contains instructions for crawlers and other robots) to include the line

Sitemap: [sitemap_location]

where [sitemap_location] is the complete URL of your sitemap. This is the only way certain search engines, like MSN, will find it. For more information about robots.txt, see the Web Robots Pages.
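For those who maintain robots.txt with a script, the edit above can be sketched as a small Python helper that appends the line only if it is not already there. The function name and the sample file contents are hypothetical.

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt text with a Sitemap line, added only if missing."""
    line = f"Sitemap: {sitemap_url}"
    if line in (existing.strip() for existing in robots_txt.splitlines()):
        return robots_txt  # already present, leave the file alone
    # Make sure the new line starts on its own line.
    separator = "" if robots_txt == "" or robots_txt.endswith("\n") else "\n"
    return f"{robots_txt}{separator}{line}\n"


# Hypothetical starting file that allows all crawlers everywhere.
robots = "User-agent: *\nDisallow:\n"
updated = add_sitemap_line(robots, "http://www.yoursite.com/sitemap.xml")
print(updated)
```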

To add your sitemap to Google, just sign up for Google Webmaster Tools and click “add” under the Sitemap column.

Adding your sitemap to Ask is particularly important, since the Ask crawler isn’t quite as active as the others and there is no way to simply submit a URL to be crawled. To do so, just copy and paste this link, replacing the end with the location of your sitemap: http://submissions.ask.com/ping?sitemap=http%3A//www.the URL of your sitemap here.xml
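The “%3A” in that link is simply a URL-encoded colon. If you want to build the link without typing the encoding by hand, a short Python sketch like the following does it; the function name and the sample sitemap address are assumptions, and the ping address is the one quoted above.

```python
from urllib.parse import quote

ASK_PING = "http://submissions.ask.com/ping?sitemap="


def build_ask_ping_url(sitemap_url):
    """Append the percent-encoded sitemap address to the Ask ping link."""
    # quote() leaves "/" alone by default and turns ":" into "%3A".
    return ASK_PING + quote(sitemap_url)


print(build_ask_ping_url("http://www.yoursite.com/sitemap.xml"))
```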

301 redirects

If your site is located at www.yourdomain.com, a redirect is required to make sure people can get to you when they type yourdomain.com (omitting the www) into their address bar. Your domain registrar has probably already set this up for you, but check with them to make sure this is a “301 redirect.” This tells search engines that the redirect is permanent, not temporary.

A fast way to find out which type of redirect you have is to use the SEOmoz HTTP status code checker.
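You can also check the status code yourself. The Python sketch below asks a server for its response to the bare domain and reports whether the redirect is permanent or temporary. The function names are hypothetical, the wording of the labels is ours, and check_redirect needs a live network connection, so treat it as a starting point rather than a finished tool.

```python
import http.client


def describe_redirect(status):
    """Classify an HTTP status code from a search engine's point of view."""
    if status in (301, 308):
        return "permanent"
    if status in (302, 303, 307):
        return "temporary"
    if 300 <= status < 400:
        return "other redirect"
    return "not a redirect"


def check_redirect(host, path="/"):
    """Request a page without following redirects; return (status, Location)."""
    connection = http.client.HTTPConnection(host, timeout=10)
    try:
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


# Example (requires network access), using a placeholder domain:
# status, location = check_redirect("yourdomain.com")
# print(status, describe_redirect(status), location)
```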

Search-friendly design and presentation issues

Frames

Search engines are not smart enough to be able to see HTML frames as part of a single page. So when analyzed separately, each is weaker. The main content frame has fewer links and the navigation frame (assuming a cliché frames layout) will more than likely be discounted altogether for consisting only of links.

Frame-like layouts made with CSS are better, because you can build them without creating separate files.

JavaScript and Flash

Search engines don’t see text embedded in JavaScript or Flash. If you make the design choice to use these to display content, know that it will not be searchable.

Duplicate content

Search engines have duplicate content filters so as not to serve the end user a bunch of identical results, and also to prevent someone from copying and pasting good text from a popular site like Wikipedia onto a dummy page where it is passed off as original content in order to bring in ad money. Therefore don’t repost articles, avoid overusing boilerplates, etc.

Persistent navigation

Avoiding duplicate content doesn’t apply to navigation. In fact, having a good navigation system is very helpful as it creates a dense web of internal links, which makes it easier for the crawler to find all of your files. A crawler might not find something that requires several clicks to access.

Linking issues

Search engines want to serve the highest quality, most relevant results to users. Since links are such a huge factor when determining search results, there are many ways in which yours can help or hurt you.

Fix broken links. If you use DreamWeaver or FrontPage, you can check for them from inside the program. If not, you can use a tool like Webmaster Toolkit’s Link Checker or iWebTool Broken Link Checker to scan for them.

Don’t link to bad neighborhoods (spammers, abusers of search engines, sites that install malware…use common sense). Inbound links from such sites won’t hurt you, though, since you are assumed not to have control over them and they won’t affect a user’s experience on your page.

Watch the number of outbound links on a given page and try to stay under 100. More than that and you may trigger a link spam filter (more on link spam in the Warnings section below).

Consistent and relevant anchor text

When you create a link on your page, you typically don’t display the full URL in your article. A link to the Wikipedia article on frogs, for example, will usually look more like “frog” (in HTML, <a href="http://en.wikipedia.org/wiki/Frog">frog</a>) than http://en.wikipedia.org/wiki/Frog. These words that you show the reader to describe the link, which at the same time hide the unattractive URL, are called anchor text.

Anchor text is very important to search engines because it’s often very relevant to the page it refers to. The more often Wikipedia’s frog page is referred to as “frog,” the more search engines associate that particular page with that word. Knowing that, what do you think happens when someone links to you using nondescript text like “this post,” “here,” “a section,” or “great article”? Unfortunately, it doesn’t mean that when somebody Googles “great article,” your page will come up. These terms are much too widely used, and aren’t informative so the anchor text ends up meaning nothing.

So use relevant anchor text whenever you can, especially for your own content. Be consistent, too. Use exact titles or the same keywords when linking to your other articles. However, going out of your way to shoehorn detailed anchor text for every outbound link isn’t always practical. It can cause some awkward sentences and take up more space than you want, but rewording whenever you can is good practice.
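To review your own pages for nondescript anchor text, something like the following Python sketch can help. It collects each link’s visible text and flags the ones on a short list of vague phrases. The class and function names, the phrase list, and the sample markup are hypothetical, and the phrase list is only a starting point.

```python
from html.parser import HTMLParser

# Hypothetical list of phrases that tell a search engine nothing.
VAGUE_PHRASES = {"here", "click here", "this post", "a section", "great article"}


class AnchorTextCollector(HTMLParser):
    """Collect (href, anchor text) pairs for every link on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def vague_links(html):
    """Return the hrefs of links whose anchor text is on the vague list."""
    collector = AnchorTextCollector()
    collector.feed(html)
    collector.close()
    return [href for href, text in collector.links if text.lower() in VAGUE_PHRASES]


page = (
    'Read about the <a href="http://en.wikipedia.org/wiki/Frog">frog</a> '
    'or see my earlier thoughts <a href="/post/12">here</a>.'
)
print(vague_links(page))
```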

Google Webmaster Tools will allow you to view the most common words in anchor text linking to you.

Directories

Submit your site to the Open Directory Project and the Yahoo! Directory. These are not search engines and optimization isn’t a factor here, but many people use these human-organized and edited databases. Also, the data is used to generate search engine results, especially the Yahoo! Directory, which is drawn from for every Yahoo! search. They don’t accept everybody, but if your content is unique or very good, and if you pick the right category to add it to, you have a good chance.

Warnings

Spamdexing means to manipulate search results in a way that’s not in line with the way the search engine is supposed to work. At the beginning of this post there was a disclaimer that the information contained herein would not be concerned with unscrupulous forms of search engine optimization. Here we’ll detail a few of the most common such tricks. These are not just dishonest; they’re unsound and will backfire.

Google et al. know about these scams and have been tweaking defenses against them for years. Most of these are now detectable automatically via the crawler, in which case the offending site will be significantly devalued. In more drastic instances, the site will be removed permanently from the index. The few of these that do work still violate the search engines’ terms of service and, if reported or manually caught, will result in similar disciplinary action.

In 2006, head of Google’s webspam group Matt Cutts confirmed on his blog, for the first time, a company’s removal from Google’s index: “Google has removed traffic-power.com and domains promoted by Traffic Power from our index because of search engine optimization techniques that violated our webmaster guidelines at http://www.google.com/webmasters/guidelines.html.” Companies had been removed before, but this was the first noted by name. The impetus for the disclosure was to support the defendants in a defamation lawsuit filed by Traffic Power against its critics. Banning not just Traffic Power, but also its clients makes it clear that Google has no qualms about squashing sites that use shady SEO practices, intentional or not. According to Google, you are held responsible for SEO violations relating to your site, whether implemented by you or by an SEO professional you’ve hired.

So these warnings are not only for those with flexible ethics. There are many well-meaning webmasters who can accidentally fall into these traps, unaware that they are doing something wrong or raising a red flag.

Note: If you have been wrongly removed from Google, there is a reinclusion process you can go through. However, it can take some time to get your page restored to the search results.

Link spam

As we know, links matter when determining a site’s importance and relevance. Link spamming is creating links without merit.

One form this takes is that of a link farm, a community of web pages that all link to each other in order to boost search engine presence.

Similar is a link train, which acts like a sort of web-based pyramid scheme in which you place your link at the bottom of a list, publish it, and get other people to do the same.

Some people will create multiple websites themselves just to link back and forth and boost the primary site.

Others will place links in the comments of dozens or hundreds of blogs, often on irrelevant posts and without reading the post itself.

A little less common, but still disallowed, is the practice of paying for links at a popular site to be displayed in such a way that they don’t look like advertisements.

Keyword stuffing

Mentioned briefly before, this is when you place a very large number of keywords:

  1. in the meta keywords tag
  2. together at the bottom of a page
  3. in the article text
  4. pretty much anywhere else

Only use a few in the meta tag, and for everything else allow only whatever comes from natural writing.

Invisible text, single pixel links, 0-width DIVs, and tiny fonts

These are all tactics people use to hide link spam or keyword stuffing, attempting to get the SEO benefit without looking sloppy.

Some SEO-related links

  1. HubSpot’s Website Grader is one of the most complete free SEO and marketing statistic tools on the Web. Just type in a URL, and after a few moments it’ll spit out a detailed report covering everything from PageRank and Technorati rank to recommendations for how to improve your use of metadata. If you’re looking for a simpler tool that will display the number of backlinks and pages indexed by the big three engines, use the Webconfs Domain Stats tool.
  2. If you want to learn more about the variables search engines look for, SEOmoz put out a really valuable article in which they compiled the responses from 37 SEO professionals about the importance of many factors.
  3. SEOTools has a couple of useful Firefox add-ons called SEO for Firefox and Rank Checker. SEO for Firefox, when enabled, supplements a normal Google search with a heap of data underneath each hit. The idea is that it allows you to figure out why a page is ranked where it is. There’s nothing here that you don’t get from HubSpot’s Website Grader, but it can be useful to have the information displayed right under search results. Rank Checker can automate periodic keyword analysis. For example, you can find out where a particular page ranks on any of the three major engines for one or a number of keywords, then store the information so you can see how it has changed over time.
  4. If all this is a bit much, but your website is important enough to you that you would like it to be optimized, Google put together a guide to hiring a professional SEO.
  5. Official information for developers about each site:
    1. Alexa Developer’s Corner
    2. Ask Help for Webmasters
    3. Google Webmaster Guidelines
    4. Live Search Technical Documentation
    5. Yahoo! Help for Webpublishers

(Ryan McGrady is a new media graduate student at Emerson College where he is studying knowledge, identity, and ideas in the information age.)

Citizen Media Business Issues: Web Statistics

(This is the sixteenth in a series of postings about citizen media business issues. See the introduction here. All of these entries are considered to be in “beta” and will be revised and refined as they find a home on a more permanent area of the Center for Citizen Media web site. To that end, your comments, additional examples, and criticisms are welcome and will be invaluable contributions to this process.)

How many people are reading what you’re writing? Who are they? How did they find you? If applicable, how likely are they to click an ad or buy a t-shirt? And without negatively affecting the users’ experience, how can you attract more visitors or increase the probability that they’ll click the ad or buy the t-shirt?

For the final three Citizen Media Business Issues posts, we’ll try to answer these questions by exploring Web statistics, traffic rankings, search engines, and optimization (for search engines as well as for other goals).

Web Statistics

Often referred to as analytics, Web statistics are various measures of website activity intended to help the webmaster or marketer. Webmasters use the information to attract more visitors and improve overall user experience. Marketers use it to maximize revenue and determine the value of ad space.

While companies have been providing and/or selling analytics software and services since the mid-90s, the average webmaster without a budget for such things had to, for a long time, settle for the once-ubiquitous odometer-styled hit counter. The simple, public measure of how many times files were accessed since the counter’s creation, aside from being a little tacky, was unreliable and subject to manipulation by those seeking to misrepresent their traffic data. Counters just didn’t do a good enough job of explaining what really happens on a site. Even if you found a free piece of real analytics software, you needed a good amount of technical savvy and access to your site’s server to get it going and maintain it.

Google Analytics has brought statistics to the Web mainstream, providing a free service that requires no downloads and very little effort. All you have to do is copy and paste a little snippet of code into each of your pages (or just the template, if you’re using one). Google Analytics offers more metrics, and more ways to analyze them, than you will likely need (like which ISP your visitors are using most), but let’s take a look at how to use it so you can figure out which stats are most important to you.

[Note: The majority of this article deals with Google’s product because it is well-known, comprehensive, easy, and free, but it’s far from the only game in town. Drawbacks and alternatives will be discussed at the end, but almost all of the features and metrics discussed in the context of Google Analytics are applicable to its competitors, as well.]

Using Google Analytics

The default tab, the Dashboard, contains the information you’ll most regularly want to check. It’s customizable: you can click the little X in the corner of each box to remove it, and the sections in other tabs all have an “Add to Dashboard” button at the top. The first two things to note, because they apply to every Analytics page, are the date range and the graph. The date range (top-right corner) is, of course, the period of time that the statistics reflect, and you can cover as much or as little time as you want by adjusting it. The graph displays a particular metric over the selected date range (number of visits, by default, on the Dashboard). The drop-down in the top-right corner of the graph allows you to change which metric is displayed.


Google Analytics Dashboard


The rest of the data can be roughly explained in four groups: how many visitors you have, who they are, what they do on your site, and where they came from.

How many, which includes stats like the numbers of visits and pageviews, is, of course, the most definitive measure of how popular your site is and often the primary motivator to use this software to begin with. A visit is logged any time someone starts a new session on your site, with pageviews counting all pages loaded within that session. To Google, a “session” times out after 30 minutes of inactivity. If you visit, leave, and come back within 30 minutes, it will likely be counted as a single visit. Likewise, if you idle for 30 minutes and then click a link to another page on the same site, you’ll probably be starting a new visit/session on the new page.
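The session rule described above can be sketched in a few lines of Python. This is only an illustration of the 30-minute idea, not Google's actual implementation: the function name, the sample timestamps, and the choice to start a new visit at exactly 30 minutes of inactivity are all assumptions.

```python
from datetime import datetime, timedelta

# The inactivity window described in the text; the exact boundary
# behavior (new visit at exactly 30 minutes) is an assumption.
SESSION_TIMEOUT = timedelta(minutes=30)

def group_into_visits(pageview_times):
    """Group one visitor's pageview timestamps into visits (sessions)."""
    visits = []
    for ts in sorted(pageview_times):
        # Start a new visit on the first pageview, or when the gap since
        # the previous pageview reaches the timeout.
        if not visits or ts - visits[-1][-1] >= SESSION_TIMEOUT:
            visits.append([ts])
        else:
            visits[-1].append(ts)
    return visits

# Hypothetical pageview times for a single visitor
views = [
    datetime(2009, 4, 1, 9, 0),
    datetime(2009, 4, 1, 9, 10),
    datetime(2009, 4, 1, 9, 50),  # 40 minutes idle, so a new visit begins
    datetime(2009, 4, 1, 9, 55),
]
visits = group_into_visits(views)
print(len(visits))               # 2 visits
print([len(v) for v in visits])  # [2, 2] pageviews per visit
```

Four pageviews become two visits here because of the 40-minute gap in the middle, which is why visits and pageviews can tell quite different stories about the same traffic.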

Many webmasters want to know who their visitors are in order to improve user experience. Google tracks geographic data, browser types, operating systems, ISPs, connection speeds, percentage of users with Flash or Java installed, and the number that have been to your site before. If, for example, you find that half of the people looking at your page use Internet Explorer on dial-up connections, you probably wouldn’t want to require use of a Firefox plug-in or use a lot of large files like videos or high-resolution images.


Google Analytics City Detail


Information relating to what people do on your site includes amount of time spent on site, average pageviews, and bounce rate (a term that refers to the percentage of visitors who left the page they arrived on without checking out any others).
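To make the definitions concrete, here is a minimal sketch of how average pageviews and bounce rate could be computed from a list of pageview counts, one per visit. The function name and the sample numbers are hypothetical, and a real analytics package may define a bounce slightly differently.

```python
def visit_summary(pageviews_per_visit):
    """Return (average pageviews per visit, bounce rate as a percentage)."""
    total_visits = len(pageviews_per_visit)
    if total_visits == 0:
        return 0.0, 0.0
    # A bounce is a visit that never went past the page it arrived on.
    bounces = sum(1 for n in pageviews_per_visit if n == 1)
    average = sum(pageviews_per_visit) / total_visits
    return average, 100.0 * bounces / total_visits

# Hypothetical data: five visits with 1, 3, 1, 5, and 2 pageviews
average, bounce_rate = visit_summary([1, 3, 1, 5, 2])
print(average, bounce_rate)  # 2.4 40.0
```

Two of the five visits stopped at a single page, so the bounce rate is 40 percent even though the average visit looked at more than two pages.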

Unfortunately, due to tabbed browsing, idle time, session timeouts, and wildly varying personal browsing habits, it’s hard to get a lot of meaning from the average logged amount of time on site/length of visit, but we’ll talk about some of its possible uses in comparisons later. Also, average pageviews, bounce rate, and depth of visit are really only useful to you if you’re not running a blog or news site. Blogs are almost invariably set up to display many articles on one page and the news sites that don’t use blogs still make use of scannable headlines, placing a good deal of content on the front page. What reason would someone have, then, to explore other pages?

A useful what people do stat is under the Content tab, where you can see a break-down of how popular each of your pages is in terms of pageviews. With this you can assess where your strengths and weaknesses are, perhaps spot a problem if something should be higher or lower (a mistyped link, for example), or figure out what sort of content the search engines most closely associate with your site (more on this in the next post).

If your goal is to get more traffic, then the most valuable data for you here probably has to do with where visitors come from, which can be found under the Traffic Sources tab. “Where” in this case doesn’t refer to geographical location, but how people find you on the Internet. Most people probably don’t type your URL into their address bar, but those that do are counted as “direct traffic,” as are those who have it bookmarked in their browser and access your pages that way. High direct traffic is usually the result of offline marketing (business cards, print ads, etc.), an extremely accessible/memorable URL, or high reader loyalty.

Traffic that isn’t “direct,” then, must have come from some other point on the Web. The Referring Sites section displays not only the names of pages that link to you and how many of your hits came from there, but also the trends of each group of referred users. So you can see how the average pageviews, time on site, or bounce rate differs between users who clicked in from stumbleupon.com and those who found you through your friend’s blogroll. The only search engine you will probably see on the list is Google Images. Search engines have their own section (not sure why Google Images isn’t included there).

By taking a close look at your referring sites, you can tell why people are linking to you (they’re probably linking to particular pages or topics), how relevant the referral was (a visitor from Site A may spend twice as long and look at twice as many pages as a visitor from Site B), learn your strengths and weaknesses, and gain feedback about what you’re writing (though rarely criticism; that usually just shows up as a lack of hits unless you’re sufficiently polemical).
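The split between direct traffic, referring sites, and search engines can be approximated from the referrer each visit arrives with. The sketch below is an assumption-laden illustration: the short list of search engine domains, the function names, and the sample referrers are all hypothetical, and it does not reproduce the way Analytics itself sorts sources (it would, for instance, count Google Images as a search engine).

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, deliberately short list; real tools keep far longer ones.
SEARCH_ENGINE_DOMAINS = ("google.com", "yahoo.com", "ask.com", "live.com")

def _host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)

def classify_source(referrer):
    """Label a visit as direct, search engine, or referring site."""
    if not referrer:
        # No referrer: a typed-in URL or a bookmark.
        return "direct"
    host = urlparse(referrer).netloc.lower()
    if any(_host_matches(host, d) for d in SEARCH_ENGINE_DOMAINS):
        return "search engine"
    return "referring site"

# Hypothetical referrers for five visits
referrers = [
    "",
    "http://www.google.com/search?q=citizen+media",
    "http://www.stumbleupon.com/",
    "",
    "http://friendsblog.example.com/blogroll",
]
counts = Counter(classify_source(r) for r in referrers)
print(counts["direct"], counts["search engine"], counts["referring site"])  # 2 1 2
```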

The last major part of where users come from is your search engine data, including keywords. The next post will cover all things search engine in detail.

FeedBurner for RSS

Google Analytics is a fantastic tool that can really help you improve your site (or at least provide you with some fun trivia about your readers), but the major thing it’s missing is data about RSS subscriptions.

Many if not most regularly-updated sites have RSS feeds these days. If you run a blog via any popular weblog software, in fact, you definitely have one. Unless you’re hiding it, odds are that a significant percentage of your visitors get your content that way. So if you have an RSS feed and want the same kinds of information about it that you now have for regular Web traffic, head over to FeedBurner.

FeedBurner was recently acquired by Google, so it may be integrated into Analytics soon, but for now it’s the best place to get statistics and add features to your feed. The service works by replacing your current RSS or Atom feed, directing visitors to subscribe to your FeedBurner feed instead. The end user’s experience doesn’t change unless FeedBurner’s compatibility tweaks make the content more readable or you add features.

The two main RSS stats of note are subscribers and reach. Subscribers are a measure of how many people used their RSS reader to check in to see if you had new content. Reach is the number of people who actually see content either through an RSS reader or otherwise—like on a news aggregating website or an RSS search engine.

Drawbacks to Google Analytics and Privacy Concerns

The major feature-based drawback to Google Analytics has to do with the availability of the data. Google decides when reports are generated, not you, so information you see is usually from at least a few hours ago. But features aside, perhaps the biggest concerns people have about using this software have to do with privacy.

While use of Google Analytics is monetarily free (unless you get more than 5 million pageviews per month), you are paying them in the form of information. All of the data you collect about your site, including the information visitors “give” to you, is collected by Google. Per the Google Privacy Policy and Analytics Terms of Use, the company can/will collect information you provide in user sign-up forms, search histories, emails, information about your browser and computer via cookies (which includes at least that which you can see about your own site’s visitors), what sites you’ve visited (lots of pages use Google AdSense and/or Analytics), and so on. And though they assure us it will not be shared with any third parties (it’s mainly for making Google AdWords/AdSense advertising more relevant), Google has a massive amount of data about the world’s Internet users, and that makes some people uneasy.

Alternative Statistics Programs

If it’s possible for you to do so, the best options will generally be programs that you host on your own Web server. They’re the most reliable, most customizable, give you total control over your data (and ownership thereof), and won’t limit how many pageviews you can analyze like most of the hosted services. The downside to these is that they require access to your server and the technical knowhow to install, configure, and access the software yourself. If you feel comfortable going down this route, Piwik’s website should be one of your first stops. It’s a very good, free, open-source program with a large base of developers behind it. It positions itself as the “open source alternative to Google Analytics” and it’s about as user friendly as this sort of thing can get. Other free server-based options include AWStats, SlimStat, and Webalizer, but while each of these has its own unique benefits, they are decidedly more difficult to use than Piwik.

Hosted (where a company has the software on their server so you don’t have to worry about installing it on yours) alternatives to Google are generally pay services or limited in the number of pageviews per day/month you can have analyzed. W3Counter, for example, has a good free service, but it’s limited to 5,000 pageviews/day and you’re required to display a small logo of theirs on each of your pages. The upgraded plan, which is currently $9.95/mo, removes the logo obligation and allows up to a million pageviews/month (among other features). As another option, StatCounter’s free plan offers almost all of the features of the premium plans, which range from $9-$29/month, except for the amount of analysis it will perform, which is broken down into two levels. At the no-cost level, basic statistics are viewable for 250,000 pageviews/month, but detailed information is limited to the last 500 visitors. Both W3Counter and StatCounter offer real-time reporting.
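Because one limit above is quoted per day and the others per month, it can help to put them on the same footing. The following rough sketch uses only the figures quoted in this post (which may well have changed since); the function name and the 30-day month are assumptions, and a daily average hides traffic spikes that could push a single day over a per-day cap.

```python
# Limits as quoted in the text above; check each provider's current terms.
W3COUNTER_FREE_PER_DAY = 5_000
W3COUNTER_UPGRADED_PER_MONTH = 1_000_000
STATCOUNTER_FREE_PER_MONTH = 250_000

def plans_that_fit(pageviews_per_month, days_in_month=30):
    """List the quoted plans whose pageview limits cover this much traffic."""
    average_per_day = pageviews_per_month / days_in_month
    fits = []
    if average_per_day <= W3COUNTER_FREE_PER_DAY:
        fits.append("W3Counter free")
    if pageviews_per_month <= STATCOUNTER_FREE_PER_MONTH:
        fits.append("StatCounter free")
    if pageviews_per_month <= W3COUNTER_UPGRADED_PER_MONTH:
        fits.append("W3Counter upgraded")
    return fits

print(plans_that_fit(120_000))  # ['W3Counter free', 'StatCounter free', 'W3Counter upgraded']
print(plans_that_fit(200_000))  # ['StatCounter free', 'W3Counter upgraded']
```

At 5,000 pageviews a day, the W3Counter free plan works out to roughly 150,000 a month, so a site with 200,000 monthly pageviews would outgrow it while still fitting StatCounter's free tier.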

Almost all of these services have demonstration pages on their websites where you can test drive the program before you sign up or install it, so check out a few before deciding.

(Ryan McGrady is a new media graduate student at Emerson College where he is studying knowledge, identity, and ideas in the information age.)

Investigative Blogger Raising Funds

Firedoglake is raising money to pay for investigative blogging.

Berkman Center Talk Next Tuesday

I’ll be speaking next Tuesday at lunchtime at the Berkman Center. Topic (and link for RSVP):

Mediactive: Why media consumers, not just creators, need to be active users.

Journalism and Location

I’ll be speaking at the Where 2.0 conference next month in San Jose, about how journalists are using, and can use, location-related products and services. The talk is called Where Does Journalism Go?

You can get a 25 percent discount by using this code — whr09rdr — when registering.

Summer New Media Program at Arizona State

The Cronkite School at Arizona State is offering a summer New Media Academy “for adults who want to understand how communication is changing and how to set up and maintain a fully functional, multimedia-rich Web site.”

ProPublica Invites the Public's Help

ProPublica has launched the citizen-journalism portion of its operation, or at least the first iteration. By posting The Obama Team’s Disclosure Documents and asking readers to help figure out any potential conflicts of interest or other facts that are worth knowing, the site is doing what newspapers could have been doing years ago but haven’t bothered to do. This crowdsourcing follows key early journalistic adopters, notably Josh Marshall and his team at Talking Points Memo.

Amanda Michel is leading ProPublica’s citizen component. This is a great start.

Location, Location

Combining mobility, time and location is becoming one of the most valuable techniques of media creation. Last week, some students and I did a small experiment that demonstrates how easy this is to do, and suggests all kinds of possibilities for journalistic follow-ups.

Phoenix First Friday Art Walk

This Flickr map has more than 120 photos, taken by me and some Arizona State University journalism students, at last week’s Phoenix “First Friday Art Walk” — a monthly, self-guided tour of a downtown-Phoenix district that contains a number of galleries and craft-oriented shops.

Putting this together was absurdly simple: We combined the capabilities of the Google/T-Mobile G1 smart-phones and services provided by the photo-sharing site Flickr. (Note: Google provided us with the phones and its carrier partner, T-Mobile, gave us airtime.)

The G1s are the first in a line of what Google hopes will be lots of devices using the Android operating system, which is considerably more open than Apple’s iPhone and has, in my view, roughly equal potential. The G1s contain, among many other capabilities, digital cameras and GPS receivers (Global Positioning System radios that determine location to within a few meters).

Each of us shot a dozen or so pictures at various places along the Art Walk streets. After snapping each picture, we sent it by email to a special address at Flickr, using the name of the gallery or other location as the subject line and adding some body text to describe what we were looking at.

Embedded in the JPEG photo files created by the G1s is a critically valuable bunch of zeroes and ones: the location as determined by the GPS. Flickr reads that location data as it imports the picture files, and then places the images automatically on a map.

In other words, the map was being created in real time, as we walked the streets and snapped the photos.
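For the curious, the location embedded in a photo file (its EXIF metadata) is typically stored as degrees, minutes, and seconds plus a compass reference such as N or W, while maps generally want signed decimal degrees. Here is a minimal sketch of that conversion; it is not Flickr's code, the function name is hypothetical, the sample coordinates are only roughly downtown Phoenix, and actually reading the values out of a JPEG would need an EXIF library that isn't shown.

```python
from fractions import Fraction

def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus an N/S/E/W reference
    into signed decimal degrees."""
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # South latitudes and west longitudes are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)

# Illustrative coordinates, roughly downtown Phoenix (an assumption)
latitude = dms_to_decimal(33, 27, 0, "N")
longitude = dms_to_decimal(112, 4, 30, "W")
print(latitude, longitude)  # 33.45 -112.075
```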

Now, this is not a new idea by any means. And we could have done a much better display of the pictures with a bit more time; Flickr’s mapping display to the general public is very crude compared with what it could do (the image above, much better than the one you’ll see if you click this public link, is available to the account holder of the map, but not to other people). Moreover, sending pictures via email was a crude way to handle the images; there are applications for the iPhone and Nokia’s GPS-equipped phones that upload to Flickr much more efficiently than anything written so far for the G1.

Still, it was trivially simple to set this up and make it work, using tools that already exist and are, for the most part, easy to use. We’ll be doing much more with the G1s over time (including, I hope, creating applications that more fully explore the devices’ potential).

The point is that some events take place over time and space, and are made to order for this kind of treatment. Journalists are actually quite late to the party. Flickr and other sites are already displaying such crowd-sourced events via user-created tags.

We’re planning to open up this page to others in the Phoenix community, so that over time people create a rich photo set of First Friday. We’ll help people sort by dates, not just location, so that we can see how the monthly event changes over time, too.

We are planning a series of other experiments with these phones (and others), and would be grateful for ideas on how we might take best advantage of these incredible devices. Our goal is simple: testing ideas that will help create valuable community information resources and services.

Pundit to Critic: Fuck You

UPDATED

It’s hardly surprising when someone fires back at a harsh critic of his or her employer’s competence and/or ethics. But when that someone is superstar New York Times columnist Thomas L. Friedman, and the return fire takes the form, in part, of “Fuck you,” it raises a few eyebrows — and makes you wonder about a broader hubris.

The exchange in question came yesterday at the Freedom to Connect conference, a gathering in suburban Washington where people discuss issues related to data networking and the information revolution. Friedman’s keynote talk was all about his latest book and touched on the conference theme only briefly during the Q&A.

He’d already dropped the F-bomb at the start of his talk (in a WTF mode) when he noticed the conference back-channel discussion scrolling by on a stage-monitor screen. Later, during the Q&A, he was asked to comment on a question posted there that challenged the Times’ credibility in a fairly general and nasty way.

He began, appropriately, by saying that yes, the paper makes mistakes. But then he offered what sounded like a more heart-felt response, the above-noted “fuck you,” winning applause from some but certainly not all or (by my estimate) even a majority of the audience.

Friedman had my sympathy in some ways. It’s hard to sit there and take abuse, even though pundits dish it out for a living to people who have thicker skins than all but a tiny minority of journalists. (I’ve fired back at some folks on my various blogs over the years, even ones written as part of newspaper gigs, but always remembered that there were lines I wouldn’t cross in that professional venue or, short of the most extreme provocation, in any situation.)

Yes, the question he’d been asked was shallow and accusatory — and yet absolutely reasonable in several key respects. The Times (I own stock in the company) is a great institution that does absolutely vital work. But it has had to answer, and not always persuasively, for its own grotesque lapses — not least, in recent history, the Jayson Blair and Judith Miller scandals — and Friedman himself has hardly been a pundit whose pronouncements are infallible or, on some issues, even mostly correct in retrospect. His self-involvement isn’t off the charts, meanwhile, but it’s plainly strong.

So while understandable, his arrogant retort reflected more than merely the self-assurance of a pundit who’s won multiple Pulitzer prizes, has penned best-selling books and gives speeches around the globe promoting his viewpoints. It was entirely illustrative of his newspaper’s famous confidence, which more often than it should bleeds into hubris and outright arrogance.

Saying “Fuck you” didn’t make him more authoritative. It diminished him.

UPDATE: Friedman sent the following (very slightly edited) to a Freedom-to-Connect mail list, and gave me permission to repost it here:

To those who understood where I was coming from, thanks. To those who didn’t, thanks also. We should all learn from our critics.

I believe passionately in the New York Times, a place I have worked at my whole adult life. Lord knows, it has made its mistakes. Which newspaper or blogger hasn’t? But I believe that when it is at its best it plays a vitally important role in our democracy, and flippant, denigrating remarks about it, at a time when it is in economic peril and our country desperately needs serious journalism to sort through this crisis, struck me as deeply unserious.

That said, when I’m trying to make a point, especially a heartfelt one, and my choice of words ends up getting in the way of that point — even if for just one person — then I chose the wrong words. So thanks to all for a great discussion and a learning afternoon.

Reporting on the Phantom Financial Economy

Accompanying Simon Johnson’s remarkable, must-read Atlantic article, “The Quiet Coup,” is this chart: [image: johnson-chart.gif]

What’s not noted here — or in most traditional media coverage of the meltdown — is something that everyone should understand. These profits were an illusion in the end. They existed just long enough, purely on paper but not in any long-term reality, to boost stock prices in these companies and to give executives the excuse to pay themselves extravagant salaries and bonuses and, if they were really smart, to unload their stock at high prices.