(This is the seventeenth in a series of postings about citizen media business issues. See the introduction here. All of these entries are considered to be in “beta” and will be revised and refined as they find a home on a more permanent area of the Center for Citizen Media web site. To that end, your comments, additional examples, and criticisms are welcome and will be invaluable contributions to this process.)
In the previous Citizen Media Business Issues post, we took a look at Web statistics as a means to learn more about your site and the people who visit it. Now that you know how many visitors you have, what they look at, what sites are linking to you, and so on, the question becomes: how can you improve your performance in one or more of those areas?
This post is not about how to falsely inflate one metric or another. Nor is it about how to get traffic unscrupulously. It’s about how to use the tools available on the Web effectively, to accurately reflect the hard work you put in. There is no substitute for high-quality, well-presented content, but people need to be able to find it.
There are several measuring sticks you can place next to your website to compare it with the rest of the Internet. A lot of people choose not to worry about such rankings, and their importance certainly depends on your own goals. On the business side of things, however, which is what this series is about, they matter.
First, they can be good tools for goal-setting and motivation, and can sometimes act as a reward for a job well done, encouraging the continued production of good content.
Second, marketers often use them to gauge how much your advertising real estate is worth. You can have thousands of hits a day, but if, for whatever reason, you’re not showing up in Google searches, the rate you get for advertising won’t reflect its true worth.
Finally, and perhaps most importantly, knowing how the various rankings work lets you use them to their full potential and so attract more traffic.
Social News Sites
Before we take a closer look at individual traffic rankings, it’s worth mentioning how much of an impact social news sites can have on them. Webmasters often notice a seemingly random spike in traffic over the course of a couple of days that turns out to be due to incoming links from a site like Digg, Reddit, Newsvine, Fark, Slashdot, or StumbleUpon. A sharp increase in traffic will, of course, improve your traffic rankings.
While each of these sites functions differently, they are all based on a community of users who share stories they find on the Web. The submitted links are then augmented with conversational tools and/or some form of voting system, allowing the most popular articles to rise in visibility and accessibility. A lot of people use these sites as news filters, relying on the wisdom of crowds to save browsing time by going first to those sites recommended by the rest of the community.
The way most people promote their content on these sites is by placing one or more site-specific icons near each of their headlines or blog posts. The best-known example is probably the “Digg This” button, which registers a positive vote (known as a Digg) for the article when clicked and displays how many Diggs it has received to date. It’s tempting to add many of these so readers can use their social news site of choice, but display them tastefully and avoid messy icon spam. One way to do this is to use something like the Share This widget you’ll see at the bottom of this post, which expands into a neat menu of services when clicked.
You may also consider getting involved in one or more of these communities. Signing up with one just because it’s the biggest or prettiest isn’t always the best idea. First survey what’s available (there’s a good list over at Dosh Dosh) to find where content like yours is featured prominently. For example, if you run a technology blog à la Engadget, you would probably fit in better at Slashdot than at Newsvine.
As a new member, you may find that the things you submit don’t get as much attention as something similar submitted by another user. This is because these communities often have reputational gauges of member contribution (“karma” on Reddit, for example), and personal relationships develop among the most frequent users. As you become more involved over time, you may find more of your submissions doing well for these reasons.
Alexa
Alexa Internet’s rankings are very simple. Every time a page on your site is viewed by someone whose browser has the Alexa toolbar installed, a hit is registered. Alexa assigns rankings based on those hits (pageviews) and on “reach” (the percentage of all Internet users who have visited the site in question), averaged over the previous three months.
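To make the two inputs concrete, here is a toy Python sketch of how reach and pageviews could be computed from a panel of toolbar users over a three-month window. Alexa has not published its exact formula; combining the two with a geometric mean is an assumption for illustration, as are the `Hit` record and the `rank_sites` function. Totals over a fixed 90-day window order sites the same way daily averages would.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import sqrt


@dataclass
class Hit:
    """One pageview reported by a (hypothetical) toolbar user."""
    user_id: str
    site: str
    day: date


def rank_sites(hits, panel_size, today, window_days=90):
    """Order sites from rank 1 downward using reach and pageviews.

    Assumption: the score is the geometric mean of reach (share of the
    panel that visited) and pageviews per million panel users, computed
    over the last `window_days` days. Alexa's real formula is not public.
    """
    start = today - timedelta(days=window_days)
    visitors = {}  # site -> set of distinct users seen in the window
    views = {}     # site -> pageview count in the window
    for hit in hits:
        if start <= hit.day <= today:
            visitors.setdefault(hit.site, set()).add(hit.user_id)
            views[hit.site] = views.get(hit.site, 0) + 1

    scores = {}
    for site, count in views.items():
        reach = len(visitors[site]) / panel_size
        views_per_million = count / panel_size * 1_000_000
        scores[site] = sqrt(reach * views_per_million)
    # Highest score first, so rank 1 is at index 0.
    return sorted(scores, key=scores.get, reverse=True)
```

With a panel this small, a single extra visitor reshuffles the ranking, which illustrates why rankings for low-traffic sites are unreliable.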
The website says that it also uses “data obtained from other, diverse traffic data sources,” but the consensus seems to be that the toolbar is by far the primary factor. Although Alexa applies some correction for potential biases, this reliance on the toolbar has led to controversy over how well the self-selected group of toolbar users represents Web users as a whole. The company’s own disclaimers note that data is less accurate for low-traffic sites: “the size of the Web and concentration of users on the most popular sites make it difficult to accurately determine the ranking of sites with fewer than 1,000 monthly visitors. Generally, traffic rankings of 100,000 and above should be regarded as not reliable.”
Despite the controversy, Alexa rankings are important to many advertisers and are at least accurate enough to be meaningful.
In spite of the myriad “## ways to boost your Alexa rank” blog posts around the Web, there is no easy way to improve your standing. Alexa Product Manager Geoffrey Mack dispelled most of the rumors a while ago in a post on the Alexa Web Discovery Machine (an official company blog) and in a comment on the Online Money Making blog. Other than attracting more traffic by other means, all you can really do here is encourage your loyal readers to install the toolbar, perhaps displaying a widget as a reminder.
If you’re not listed on Alexa, you can either simply visit your site with the Alexa toolbar installed or submit it here.
Search Engines
If you’ve looked at the data on how visitors get to your site, you’ve probably noticed that a lot of people find you through search engines. The major engines, led by Google, drive a massive amount of Web traffic these days. Most people have a few favorite sites that they visit regularly by typing the familiar URL into the address bar or clicking a bookmark, but odds are good that search engines are your gateway to most of the rest of the Web.
Search engines work by crawling, indexing, and then using that index to serve Web data in response to users’ search queries.
A “crawler” is a program that periodically visits many millions of pages, gathering data about their text, text location, headings, and so on. It essentially tries to figure out what each page is all about. That data is then “indexed,” or stored in a searchable database. So the first thing to note here is that the search engines have to know about your site in order to crawl it. To inform them of your existence, follow these links: submit to Google, submit to Yahoo!, submit to MSN Live Search, and if you have a blog, submit to Technorati.
When you search for something, the engine will look at its huge index and return a list of results in order of relevance. From your own experience you probably know the importance of this ordering. In what percentage of searches do you actually get past the first few results? For most people, it’s rare, and even rarer to look beyond the first couple of pages.
None of the big engines makes public exactly how it organizes and prioritizes search results. Google, for example, uses over 200 factors to determine which sites in its index are most relevant to a search query, and according to the New York Times, “makes about a half-dozen major and minor changes a week to the vast nest of mathematical formulas that power the search engine.” Publicizing all of those factors would make the system too easy to manipulate and duplicate.
Not all of these signals are unknown, though. Some are disclosed by the company, many are common sense to the Web savvy, and a few have been determined via trial and error and data studies. Examples include the weighting of titles and headings over other content, location of search terms on the page, how close the words in the search string are to each other, and so on.
If you perform a Google search for “muffins,” you’re going to get millions of hits. To some extent, the results will be ordered by relevance. But how does Google know whether you are looking for recipes, a definition, the name of a restaurant, or a YouTube video with that name? So search engines also apply a sort of “importance” factor. The idea is that, given equal relevance, a very popular site will satisfy a user’s request more often than an unknown one.
These importance rankings tend to be what advertisers care about most. The main reason is that checking where a particular site appears in the results for a variety of applicable keywords just isn’t practical. Rankings, by contrast, are simple, quantifiable, relatively stable values that correlate closely enough with traffic and search result position to justify tying advertising dollars directly to them.
The simplest algorithm is Technorati’s Authority system. For every blog that has linked to you (and also pings Technorati) in the last six months, your authority increases by one. It doesn’t matter if they link to you once or a hundred times, one blog equals one authority point. When searching with Technorati, users can choose to filter search results based on authority (for example, only including those with high scores). Authority is also displayed in the form of Technorati Rank, which is just a list of blogs arranged by authority (the number 1 ranked blog has the most authority).
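Because Authority is just a count of distinct linking blogs, it is easy to sketch. The Python below is a minimal illustration, not Technorati’s code: it assumes you already have a list of (linking blog, link date) pairs from blogs that ping Technorati, and it approximates six months as 183 days.

```python
from datetime import date, timedelta


def technorati_authority(inbound_links, today, window_days=183):
    """Count distinct blogs that linked to you within the window.

    inbound_links: iterable of (linking_blog, link_date) pairs.
    A blog counts once no matter how many times it linked.
    """
    cutoff = today - timedelta(days=window_days)
    return len({blog for blog, when in inbound_links if when >= cutoff})


def technorati_rank(authorities):
    """Map each blog to its rank; rank 1 has the most authority."""
    ordered = sorted(authorities.items(), key=lambda kv: kv[1], reverse=True)
    return {blog: position for position, (blog, _) in enumerate(ordered, start=1)}


if __name__ == "__main__":
    today = date(2008, 4, 1)
    links = [("alice.blog", today), ("alice.blog", today - timedelta(days=3)),
             ("bob.blog", today - timedelta(days=400))]
    print(technorati_authority(links, today))  # 1
```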
PageRank is Google’s way of quantifying the importance of a site within the link structure of the Web. Generally speaking, the more a page is linked to, the higher its “importance,” and thus the higher its position in the search results. On top of that, incoming links from sites that are themselves considered important carry more weight than links from less important sites. A link to you from nytimes.com, for example, will count more than a link from somewhere on Geocities.
The raw PageRank value is the probability that someone randomly clicking on links anywhere on the Web will end up on the page (with a 15% chance at each step that the surfer jumps to a completely random page instead of following a link). But you won’t see this raw value published anywhere; instead, the Google Toolbar shows a score between 0 and 10 for any site you visit. The higher the number, the more important the site.
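The random-surfer idea can be sketched with a short power iteration in Python. This is the textbook formulation with a damping factor of 0.85 (the 85% chance of following a link, leaving the 15% random jump); Google’s production system is not public, and the tiny link graph used to exercise it is made up.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Random-surfer PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    With probability `damping` the surfer follows a random outbound link;
    otherwise it jumps to a page chosen uniformly at random.
    Pages with no outbound links spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dead-end pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank  # values sum to 1.0


web = {"home": ["about", "blog"], "about": ["home"], "blog": ["home"], "orphan": ["home"]}
scores = pagerank(web)
```

In this toy graph, "home" ends up with the highest score because every other page links to it, while "orphan", which nothing links to, keeps only the random-jump share.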
A common misconception is that PageRank determines search result order. Actually, it’s just a weight (albeit a seemingly heavy one), acting as just one of the aforementioned 200+ signals.
[Note: Yahoo!, Ask, and MSN Live Search all function similarly to Google, with processes similar to PageRank, but they don’t seem to matter as much to marketers. Keep in mind that while this post references Google far more often than the others, almost anything you do to improve your Google performance will likely improve your results with its competitors, too.]
Search Engine Optimization (SEO)
SEO is the process of trying to improve the amount or quality of search engine traffic. It’s become a big part of Internet marketing as businesses have found that a place at the top of organic search results is far more effective than most any advertising they could buy—and unless you pay an individual or SEO firm to do it for you, it’s free. “Organic search results” means those that aren’t advertisements or otherwise sponsored results.
Some of the main areas of SEO concentration are keywords, code, and design/presentation. After a little explanation of keywords, we’ll give some examples of the other two.
Let’s say you run a blog called The Bread Zone that’s all about sandwiches: where to get the best peanut butter and jelly, personal recipes, history, and so on. Now put yourself in the position of a potential reader/customer. What would you search for to find the information on your site? Certainly not “The Bread Zone.” It’s nice if you’re at the top of a search for your blog’s name, but if somebody knows the name of your site and is searching for it, it doesn’t matter all that much where in the listings you are. The idea here is to associate your site with certain words or phrases so that if someone does a search for, say, “best sandwiches Boston” or “Elvis’s sandwich”, you’ll be near the top.
The more your keywords are used, and the more prominent they are (i.e. in titles and headings), the more likely a search engine will be to associate them with your site.
Keyword associations evolve naturally, based on the sorts of things you write about. If you write a lot about the best sandwiches in Boston, Google will automatically consider you more relevant for searches containing related terms, whether you try to optimize for them or not.
The first thing you can do to tell search engines which words are relevant to your site is to put them in your meta keywords tag. Unfortunately, while meta keywords used to be vital, their importance has been severely weakened in response to rampant abuse by people entering misleading information or a huge number of keywords (a tactic called “stuffing”).
“Keyword optimization” involves careful use and placement of the chosen keywords to push those associations aggressively. If you decide to attempt this, keep your writing organic! Be very careful not to get involved with practices that improve search results at the expense of content quality. It will backfire. Shoehorning “Elvis’s sandwich” into a bunch of titles, headings, text, links, templates, and captions, regardless of whether the rest of the content supports it, may match what the search engines look for, so you might see a temporary boost in traffic, but then it will collapse. First your audience will fade, because nobody wants to read such poorly composed text. Then, because all the search engines have some measure to catch this manipulative practice (another form of “keyword stuffing”), they will devalue your URL, if not remove it from the index completely.
Even if you’ll stop at setting your meta tags, spend a little while thinking about which words and phrases you anticipate using often anyway. Be as specific as possible, as long as it’s something you can see yourself writing about a lot. Remember that crawlers don’t actually read English, so using the keyword “astrophysics,” even if it accurately represents the site’s content, is meaningless if the word itself doesn’t appear in the text. Also, “sandwich, bread, blog” isn’t going to help you a whole lot either. Even if they pop up often in your articles, those terms are so generic that you’ll be lucky to appear on page 10.
For an easy way to track how often you use potential keywords, use the Webconfs Keyword Density Checker. It will display a word cloud, count how many times you use each word, and calculate keyword density (keyword occurrences divided by total words, not counting stop words like “the” or “and”). The ideal density for each keyword is generally said to be somewhere between 1% and 6%.
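The density calculation itself is simple. Here is a minimal Python sketch, assuming a small hand-picked stop-word list (the `STOP_WORDS` set is illustrative; real tools use much longer lists) and counting single words only. The Webconfs tool may also count multi-word phrases; this sketch does not.

```python
import re
from collections import Counter

# Illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "for", "on", "at", "with"}


def keyword_density(text):
    """Return {word: (count, density)}, most frequent first.

    density = occurrences of the word / total non-stop-words in the text.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    total = len(words)
    counts = Counter(words)
    return {word: (n, n / total) for word, n in counts.most_common()}


report = keyword_density("The best sandwich in Boston is the Elvis sandwich.")
```

Here “sandwich” appears twice among five non-stop-words, for a density of 40%, far above any sensible range, which is what you would expect from a one-sentence sample.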
Once you have some keyword ideas, you can throw them into Google’s Keyword Tool to analyze how often people search using those keywords, as well as some more specific alternatives.
SEOChat has one of the better articles out there about Choosing and Researching Keywords. It’s several years old, but still highly relevant.
Search-friendly coding issues
Look at the top of your browser window. Right now it probably says either “Center for Citizen Media: Blog” or “Center for Citizen Media: Blog » Blog Archive » Citizen Media Business Issues: Traffic Rankings, Search Engines, and Optimization:” (perhaps abbreviated), followed by the name of your browser. This is the title, and every single page on the web has one. Titles, because they are by nature relevant to the page’s content, are a big deal to search engines. Try to use a different, appropriate title for each page you create. Many blog hosts will do this automatically. A good format is to use the name of your site for the main page, and then on other pages use something like “[name of your site] – [name of specific page or post]”.
Alt attributes and picture names
Search engines don’t see pictures. Not even Google Image Search sees them in the way humans do. It looks at the surrounding text and page content, the name of the picture file, and the “alt” attribute (see below for an explanation of this), but it has no way to see the graphic and understand its contents. Many blog and web hosts’ uploading programs will rename the files you upload to some random combination of letters and numbers, so you may not have total control over that, but it’s easy to use alt attributes effectively. “Alt” is what would be displayed in place of the image if you were using a text-only browser. It is also what users of screen readers would hear (software for the blind that speaks aloud the words on a page). In some browsers, when the title attribute is missing (not to be confused with the title of the page), it is also what pops up when you hover your cursor over an image.
Some publishing software gives you an “alt attribute” field to fill in when you insert an image; if not, it’s just a matter of editing the image’s code. Find the IMG tag; its SRC attribute is the URL of the image file. All you have to do is add alt="the alt text goes here" after the URL and before the closing bracket. When you’re done, it’ll look like:
<IMG SRC="http://www.yoursite.com/images/sample.jpg" alt="your alt text">
The alt text should act as a substitute for the image’s existence, which doesn’t always mean a description of the picture’s contents. A mailbox icon, for example, is probably better described as “e-mail [your name or site]” than “blue mailbox with a letter sticking out and the red flag raised.”
Also, don’t use alt text that merely repeats adjacent article text. And for graphics that are purely decorative (frilly borders and such), it’s probably best to leave the alt text empty (alt=""), since they don’t provide anything relevant to the primary content.
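If you have a lot of pages, you can scan your HTML for images with no alt attribute at all. Here is a minimal sketch using Python’s standard `html.parser` module; the class and function names are illustrative. An empty alt (alt="") counts as present, since that is the usual way to mark an image as purely decorative.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> works too.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))


def images_without_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```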
Sitemaps
From the Sitemaps website:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
The easiest way to create a sitemap is to use XML-Sitemaps.com. Just enter your URL, wait for it to scan, download the file, and place it in the root folder of your website (so that its address is www.yoursite.com/sitemap.xml). If you use a blog that doesn’t let you upload files like this, the next best option is to use your RSS feed as a sitemap (usually www.yoursite.com/rss.xml).
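If you’d rather generate the file yourself, a sitemap is plain XML in the format described on the Sitemaps website. Here is a minimal Python sketch using only the standard library; only `loc` is required for each URL, and the `build_sitemap` helper and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a minimal sitemap.xml document as a string.

    entries: list of dicts with a required "loc" and optional
    "lastmod" (YYYY-MM-DD), "changefreq", and "priority" keys.
    ElementTree escapes characters such as & in URLs automatically.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap = build_sitemap([
    {"loc": "http://www.yoursite.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.yoursite.com/about", "lastmod": "2008-03-01"},
])
```

Save the result as sitemap.xml in your site’s root folder, as described above.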
Edit your robots.txt (a file on your website that contains instructions for crawlers and other robots) to include the line
Sitemap: [sitemap_location]
where [sitemap_location] is the complete URL of your sitemap. This is the only way certain search engines, like MSN, will find it. For more information about robots.txt, see the Web Robots Pages.
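Python’s standard library includes a robots.txt parser, which is a handy way to double-check that your Sitemap line is read the way a crawler would read it. Here is a minimal sketch; the sample robots.txt and its /drafts/ rule are made up for illustration, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Sitemap: http://www.yoursite.com/sitemap.xml
"""


def parse_robots(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def sitemaps_declared(robots_txt):
    """Return the sitemap URLs listed in a robots.txt (an empty list if none)."""
    return parse_robots(robots_txt).site_maps() or []
```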
To add your sitemap to Google, just sign up for Google Webmaster Tools and click “add” under the Sitemap column.
Adding your sitemap to Ask is particularly important, since the Ask crawler isn’t quite as active as the others and there is no way to simply submit a URL to be crawled. To do so, paste this link into your browser’s address bar, replacing the end with the location of your sitemap: http://submissions.ask.com/ping?sitemap=http%3A//www.yoursite.com/sitemap.xml
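The only tricky part of that link is the percent-encoding of your sitemap’s address. Here is a small Python sketch that builds the ping URL in the same form as the example above, leaving slashes unescaped. It only constructs the URL and does not send it, and the endpoint is simply the one quoted in this post.

```python
from urllib.parse import quote

ASK_PING = "http://submissions.ask.com/ping?sitemap="


def ask_ping_url(sitemap_url):
    """Percent-encode the sitemap URL (keeping '/' as-is) and append it to the ping address."""
    return ASK_PING + quote(sitemap_url, safe="/")
```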
If your site is located at www.yourdomain.com, a redirect is required to make sure people can get to you when they type yourdomain.com (omitting the www) into their address bar. Your domain registrar has probably already set this up for you, but check with them to make sure this is a “301 redirect.” This tells search engines that the redirect is permanent, not temporary.
A fast way to find out which type of redirect you have is to use the SEOmoz HTTP status code checker.
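You can also check the redirect yourself with one HTTP request that does not follow redirects. Here is a minimal Python sketch using the standard `http.client` module; 301 and 308 are the permanent redirect codes, while 302, 303, and 307 are temporary. The function names are illustrative.

```python
import http.client
from urllib.parse import urlsplit

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def classify_redirect(status):
    """Label an HTTP status code as a permanent or temporary redirect."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "not a redirect"


def check_redirect(url, timeout=10):
    """Send a single HEAD request (redirects are NOT followed).

    Returns (status code, Location header or None, classification).
    """
    parts = urlsplit(url)
    connection_class = (http.client.HTTPSConnection if parts.scheme == "https"
                        else http.client.HTTPConnection)
    conn = connection_class(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location"), classify_redirect(response.status)
    finally:
        conn.close()
```

For a correctly configured site, check_redirect("http://yourdomain.com/") should report a 301 with a Location header pointing at the www address.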
Search-friendly design and presentation issues
Frames
Search engines don’t treat the frames of an HTML frameset as parts of a single page. Each frame is analyzed separately, and each is weaker on its own: the main content frame has fewer links, and the navigation frame (assuming a cliché frames layout) will more than likely be discounted altogether for consisting only of links.
Frame-like layouts built with CSS are better, because they don’t require splitting the page into separate files.
Java and Flash
Duplicate content
Search engines have duplicate content filters so as not to serve the end user a bunch of identical results, and also to prevent someone from copying good text from a popular site like Wikipedia onto a dummy page, where it is passed off as original content to bring in ad money. So don’t repost articles, avoid overusing boilerplate text, and so on.
The duplicate-content concern doesn’t apply to navigation. In fact, a good navigation system is very helpful, because it creates a dense web of internal links that makes it easier for the crawler to find all of your files. A crawler might not find something that requires several clicks to reach.
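The search engines don’t publish how their duplicate filters work. One well-known technique for spotting near-duplicate text is w-shingling: break each text into overlapping runs of words and measure how much the two sets overlap (Jaccard similarity). Here is a minimal Python sketch of that technique, offered as an illustration of the idea rather than any engine’s actual filter; the four-word shingle size is an arbitrary choice.

```python
import re


def shingles(text, size=4):
    """Set of overlapping `size`-word sequences ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(a, b, size=4):
    """Jaccard similarity of two texts' shingle sets (1.0 identical, 0.0 nothing shared)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

A copied page scores at or near 1.0 against its source, while unrelated pages score near 0.0, even if they share common individual words.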
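One way to see your site the way a link-following crawler does is to compute each page’s click depth: the fewest clicks needed to reach it from the home page. A breadth-first search over your internal links does this. Here is a minimal Python sketch, assuming you already have a map of which page links to which (the page paths are hypothetical).

```python
from collections import deque


def click_depths(site_links, home):
    """Breadth-first search from the home page over internal links.

    Returns {page: minimum number of clicks from home}. Pages missing
    from the result cannot be reached by following links at all.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in site_links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


site = {"/": ["/archive"], "/archive": ["/2007"], "/2007": ["/2007/old-post"], "/orphan": []}
depths = click_depths(site, "/")
```

Pages that come back with a large depth, or not at all (like "/orphan" here), are candidates for links from your main navigation.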
Search engines want to serve the highest quality, most relevant results to users. Since links are such a huge factor when determining search results, there are many ways in which yours can help or hurt you.
Fix broken links. If you use Dreamweaver or FrontPage, you can check for them from inside the program. If not, you can use a tool like Webmaster Toolkit’s Link Checker or iWebTool Broken Link Checker to scan for them.
Don’t link to bad neighborhoods (spammers, abusers of search engines, sites that install malware; use common sense). Inbound links from such sites won’t hurt you, though, since you are assumed to have no control over them, and they don’t affect a user’s experience on your page.
Watch the number of outbound links on a given page and try to stay under 100. More than that and you may trigger a link spam filter (more on link spam in the Warnings section below).
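To check a page against that rough 100-link guideline, you can count the links that point to other hosts. Here is a minimal Python sketch using the standard `html.parser` and `urllib.parse` modules; it treats www.yoursite.com and yoursite.com as different hosts and ignores mailto: and other non-web links, both simplifying assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_links(html, page_url):
    """Absolute URLs of links on the page that point to a different host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    own_host = urlsplit(page_url).netloc.lower()
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like "/about"
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() != own_host:
            result.append(absolute)
    return result
```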
Consistent and relevant anchor text
When you create a link on your page, you typically don’t display the full URL in your article. A link to the Wikipedia article on frogs, for example, will usually appear as frog, coded as <a href="http://en.wikipedia.org/wiki/Frog">frog</a>, rather than as http://en.wikipedia.org/wiki/Frog. These words that you show the reader to describe the link, which at the same time hide the unattractive URL, are called anchor text.
Anchor text is very important to search engines because it’s often very relevant to the page it refers to. The more often Wikipedia’s frog page is referred to as “frog,” the more search engines associate that page with that word. Knowing that, what do you think happens when someone links to you using nondescript text like “this post,” “here,” “a section,” or “great article”? Unfortunately, it doesn’t mean your page will come up when somebody Googles “great article.” These terms are much too widely used and aren’t informative, so the anchor text ends up meaning nothing.
So use relevant anchor text whenever you can, especially for your own content. Be consistent, too. Use exact titles or the same keywords when linking to your other articles. However, going out of your way to shoehorn detailed anchor text for every outbound link isn’t always practical. It can cause some awkward sentences and take up more space than you want, but rewording whenever you can is good practice.
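To audit your own pages for nondescript anchor text, you can pair each link with its visible text and flag generic phrases. Here is a minimal Python sketch; the `VAGUE_ANCHORS` set is hypothetical and simply reuses the examples from the paragraph on anchor text.

```python
from html.parser import HTMLParser

# Hypothetical list of nondescript phrases, reusing the examples in the text.
VAGUE_ANCHORS = {"here", "this post", "a section", "great article"}


class AnchorTextCollector(HTMLParser):
    """Pair each link's href with the visible text inside its <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # list of (href, text)
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.anchors.append((self._href, text))
            self._href = None


def vague_anchors(html):
    collector = AnchorTextCollector()
    collector.feed(html)
    collector.close()
    return [(href, text) for href, text in collector.anchors if text.lower() in VAGUE_ANCHORS]
```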
Google Webmaster Tools will allow you to view the most common words in anchor text linking to you.
Submit your site to the Open Directory Project and the Yahoo! Directory. These are not search engines, and optimization isn’t a factor there, but many people use these human-organized and edited databases. Their data is also used to generate search engine results; the Yahoo! Directory in particular is drawn on for every Yahoo! search. They don’t accept everybody, but if your content is unique or very good, and you pick the right category, you have a good chance.
Warnings
Spamdexing means manipulating search results in a way that’s not in line with how the search engine is supposed to work. At the beginning of this post there was a disclaimer that it would not be concerned with unscrupulous forms of search engine optimization. Here we’ll detail a few of the most common such tricks. They are not just dishonest; they’re unsound and will backfire.
Google et al. know about these scams and have been tweaking defenses against them for years. Most are now detected automatically by the crawler, in which case the offending site is significantly devalued. In more drastic instances, the site is removed permanently from the index. The few tricks that still work violate the search engines’ terms of service and, if reported or caught manually, will result in similar disciplinary action.
In 2006, head of Google’s webspam group Matt Cutts confirmed on his blog, for the first time, a company’s removal from Google’s index: “Google has removed traffic-power.com and domains promoted by Traffic Power from our index because of search engine optimization techniques that violated our webmaster guidelines at http://www.google.com/webmasters/guidelines.html.” Companies had been removed before, but this was the first noted by name. The impetus for the disclosure was to support the defendants in a defamation lawsuit filed by Traffic Power against its critics. Banning not just Traffic Power, but also its clients makes it clear that Google has no qualms about squashing sites that use shady SEO practices, intentional or not. According to Google, you are held responsible for SEO violations relating to your site, whether implemented by you or by an SEO professional you’ve hired.
So these warnings are not only for those with flexible ethics. Many well-meaning webmasters accidentally fall into these traps, unaware that they are doing something wrong or raising a red flag.
Note: If you have been wrongly removed from Google, there is a reinclusion process you can go through. However, it can take some time to get your page restored to the search results.
Link spam
As we know, links matter when determining a site’s importance and relevance. Link spamming is creating links without merit.
One form this takes is that of a link farm, a community of web pages that all link to each other in order to boost search engine presence.
Similar is a link train, which acts like a sort of web-based pyramid scheme in which you place your link at the bottom of a list, publish it, and get other people to do the same.
Some people will create multiple websites themselves just to link back and forth and boost the primary site.
Others will place links in the comments of dozens or hundreds of blogs, often irrelevant or without reading the post itself.
A little less common, but still disallowed, is the practice of paying for links at a popular site to be displayed in such a way that they don’t look like advertisements.
Keyword stuffing
Mentioned briefly above, keyword stuffing is placing a very large number of keywords:
- in the meta keywords tag
- together at the bottom of a page
- in the article text
- pretty much anywhere else
Only use a few in the meta tag, and for everything else allow only the keywords that come from natural writing.
Invisible text, single pixel links, 0-width DIVs, and tiny fonts
These are all tactics people use to hide link spam or keyword stuffing, attempting to get the SEO benefit without looking sloppy.
Some SEO-related links
- HubSpot’s Website Grader is one of the most complete free SEO and marketing statistics tools on the Web. Just type in a URL, and after a few moments it’ll produce a detailed report covering everything from PageRank and Technorati rank to recommendations for how to improve your use of metadata. If you’re looking for a simpler tool that displays the number of backlinks and pages indexed by the big three engines, use the Webconfs Domain Stats tool.
- If you want to learn more about the variables search engines look for, SEOmoz put out a really valuable article compiling the responses of 37 SEO professionals about the importance of many factors.
- SEOTools has a couple of useful Firefox add-ons called SEO for Firefox and Rank Checker. SEO for Firefox, when enabled, supplements a normal Google search with a heap of data underneath each hit, which helps you figure out why a page is ranked where it is. There’s nothing here that you don’t get from HubSpot’s Website Grader, but it can be useful to have the information displayed right under the search results. Rank Checker can automate periodic keyword analysis: for example, it can find out where a particular page ranks on any of the three major engines for one or more keywords, then store the results so you can see how they change over time.
- If all this is a bit much, but your website is important enough to you that you would like it optimized, Google has put together a guide to hiring a professional SEO.
- Official information for developers about each site:
(Ryan McGrady is a new media graduate student at Emerson College where he is studying knowledge, identity, and ideas in the information age.)