SEO Tutorial
Introduction to seo
1. General seo information
1.1 History of search engines
1.2 Common search engine principles
2. Internal ranking factors
2.1 Web page layout factors relevant to seo
2.1.1 Amount of text on a page
2.1.2 Number of keywords on a page
2.1.3 Keyword density and seo
2.1.4 Location of keywords on a page
2.1.5 Text format and seo
2.1.6 «TITLE» tag
2.1.7 Keywords in links
2.1.8 «ALT» attributes in images
2.1.9 Description Meta tag
2.1.10 Keywords Meta tag
2.2 Site structure
2.2.1 Number of pages
2.2.2 Navigation menu
2.2.3 Keywords in page names
2.2.4 Avoid subdirectories
2.2.5 One page – one keyword phrase
2.2.6 Seo and the Main page
2.3 Common seo mistakes
2.3.1 Graphic header
2.3.2 Graphic navigation menu
2.3.3 Script navigation
2.3.4 Session identifier
2.3.5 Redirects
2.3.6 Hidden text, a deceptive seo method
2.3.7 One-pixel links, seo deception
3 External ranking factors
3.1 Why inbound links to sites are taken into account
3.2 Link importance (citation index)
3.3 Link text (anchor text)
3.4 Relevance of referring pages
3.5 Google PageRank – theoretical basics
3.6 Google PageRank – practical use
3.7 Increasing link popularity
3.7.1 Submitting to general purpose directories
3.7.2 DMOZ directory
3.7.3 Link exchange
3.7.4 Press releases, news feeds, thematic resources
4 Indexing a site
5 Choosing keywords
5.1 Initially choosing keywords
5.2 Frequent and rare keywords
5.3 Evaluating the competition rates of search queries
5.4 Refining your keyword phrases
6 Miscellaneous information on search engines
6.1 Google SandBox
6.2 Google LocalRank
6.3 Tips, assumptions, observations
6.4 Creating correct content
6.5 Selecting a domain and hosting
6.6 Changing the site address
Introduction to seo
This document is intended for webmasters and site owners who want to investigate
the issues of seo (search engine optimization) and promotion of their resources.
It is mainly aimed at beginners, although I hope that experienced webmasters will
also find something new and interesting here. There are many articles on seo on
the Internet and this text is an attempt to gather some of this information into
a single consistent document.
Information presented in this text can be divided into several parts:
- Clear-cut seo recommendations, practical guidelines.
- Theoretical information that we think any seo specialist should know.
- Seo tips, observations, recommendations from experience, other seo sources, etc.
1. General seo information
1.1 History of search engines
In the early days of Internet development, its users were a privileged minority
and the amount of available information was relatively small. Access was mainly
restricted to employees of various universities and laboratories who used it to
access scientific information. In those days, the problem of finding information
on the Internet was not nearly as critical as it is now.
Site directories were one of the first methods used to facilitate access to information
resources on the network. Links to these resources were grouped by topic. Yahoo,
opened in April 1994, was the first project of this kind. As the number of sites
in the Yahoo directory inexorably increased, the developers of Yahoo made the directory
searchable. Of course, it was not a search engine in its true form because searching
was limited to those resources whose listings were put into the directory. It did
not actively seek out resources and the concept of seo was yet to arrive.
Such link directories have been used extensively in the past, but nowadays they
have lost much of their popularity. The reason is simple – even modern directories
with lots of resources only provide information on a tiny fraction of the Internet.
For example, the largest directory on the network is currently DMOZ (or Open Directory
Project). It contains information on about five million resources. Compare this
with the Google search engine database containing more than eight billion documents.
The WebCrawler project started in 1994 and was the first full-featured search engine.
The Lycos and AltaVista search engines appeared in 1995 and for many years AltaVista
was the major player in this field.
In 1997 Sergey Brin and Larry Page created Google as a research project at Stanford
University. Google is now the most popular search engine in the world.
Currently, there are three leading international search engines – Google, Yahoo
and MSN Search. They each have their own databases and search algorithms. Many other
search engines use results originating from these three major search engines and
the same seo expertise can be applied to all of them. For example, the AOL search
engine (search.aol.com) uses the Google database while AltaVista, Lycos and AllTheWeb
all use the Yahoo database.
1.2 Common search engine principles
To understand seo you need to be aware of the architecture of search engines. They
all contain the following main components:
- Spider: a browser-like program that downloads web pages.
- Crawler: a program that automatically follows all of the links on each web page.
- Indexer: a program that analyzes web pages downloaded by the spider and the crawler.
- Database: storage for downloaded and processed pages.
- Results engine: extracts search results from the database.
- Web server: a server that is responsible for interaction between the user and other
search engine components.
Specific implementations of search mechanisms may differ. For example, the Spider+Crawler+Indexer
component group might be implemented as a single program that downloads web pages,
analyzes them and then uses their links to find new resources. However, the components
listed are inherent to all search engines and the seo principles are the same.
Spider. This program downloads web pages just like a web browser. The difference
is that a browser displays the information presented on each page (text, graphics,
etc.) while a spider does not have any visual components and works directly with
the underlying HTML code of the page. You may already know that there is an option
in standard web browsers to view source HTML code.
Crawler. This program finds all links on each page. Its task is to determine
where the spider should go either by evaluating the links or according to a predefined
list of addresses. The crawler follows these links and tries to find documents not
already known to the search engine.
Indexer. This component parses each page and analyzes the various elements,
such as text, headers, structural or stylistic features, special HTML tags, etc.
Database. This is the storage area for the data that the search engine downloads
and analyzes. Sometimes it is called the index of the search engine.
Results Engine. The results engine ranks pages. It determines which
pages best match a user's query and in what order the pages should be listed. This
is done according to the ranking algorithms of the search engine. It follows that
page rank is a valuable and interesting property and any seo specialist is most
interested in it when trying to improve his site search results. In this article,
we will discuss the seo factors that influence page rank in some detail.
Web server. The search engine web server usually contains an HTML page with
an input field where the user can specify the search query he or she is interested
in. The web server is also responsible for displaying search results to the user
in the form of an HTML page.
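The way the indexer, database and results engine fit together can be sketched in a few lines. This is a toy illustration only, not how any real search engine is built: the function names build_index and search are made up for this example, the "database" is an in-memory dictionary, and no ranking is applied.

```python
from collections import defaultdict


def build_index(pages):
    """Indexer: map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Results engine: return pages containing every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)  # alphabetical order stands in for real ranking


# Hypothetical pages that a spider might have downloaded.
pages = {
    "a.html": "seo software guide",
    "b.html": "gardening guide",
}
index = build_index(pages)
print(search(index, "seo guide"))
```

A real results engine would sort the matching pages by the ranking factors discussed in the rest of this article rather than alphabetically.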
2. Internal ranking factors
Several factors influence the position of a site in the search results. They can
be divided into external and internal ranking factors. Internal ranking factors
are those that are controlled by seo aware website owners (text, layout, etc.) and
will be described next.
2.1 Web page layout factors relevant to seo
2.1.1 Amount of text on a page
A page consisting of just a few sentences is less likely to get to the top of a
search engine list. Search engines favor sites that have a high information content.
Generally, you should try to increase the text content of your site in the interest
of seo. The optimum page size is 500-3000 words (or 2000 to 20,000 characters).
Search engine visibility is increased as the amount of page text increases due to
the increased likelihood of occasional and accidental search queries causing it
to be listed. This factor sometimes results in a large number of visitors.
2.1.2 Number of keywords on a page
Keywords must be used at least three to four times in the page text. The upper limit
depends on the overall page size – the larger the page, the more keyword repetitions
can be made. Keyword phrases (word combinations consisting of several keywords)
are worth a separate mention. The best seo results are observed when a keyword phrase
is used several times in the text with all keywords in the phrase arranged in exactly
the same order. In addition, all of the words from the phrase should be used separately
several times in the remaining text. There should also be some difference (dispersion)
in the number of entries for each of these repeated words.
Let us take an example. Suppose we optimize a page for the phrase “seo software”
(one of our seo keywords for this site). It would be good to use the phrase “seo
software” in the text 10 times, the word “seo” 7 times elsewhere in the text and
the word “software” 5 times. The numbers here are for illustration only, but they
show the general seo idea quite well.
2.1.3 Keyword density and seo
Keyword page density is a measure of the relative frequency of the word in the text
expressed as a percentage. For example, if a specific word is used 5 times on a
page containing 100 words, the keyword density is 5%. If the density of a keyword
is too low, the search engine will not pay much attention to it. If the density
is too high, the search engine may activate its spam filter. If this happens, the
page will be penalized and its position in search listings will be deliberately
lowered.
The optimum value for keyword density is 5-7%. In the case of keyword phrases, you
should calculate the total density of each of the individual keywords comprising
the phrases to make sure it is within the specified limits. In practice, a keyword
density of more than 7-8% does not seem to have any negative seo consequences. However,
it is not necessary and can reduce the legibility of the content from a user’s viewpoint.
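The density calculation described above can be sketched as follows. This is a minimal illustration; the function name keyword_density and the simple word-splitting rule are assumptions made for this example, and real search engines may count words differently.

```python
import re


def keyword_density(text, keyword):
    """Return how often `keyword` occurs in `text`, as a percentage of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


# The example from the text: a word used 5 times on a page of 100 words.
sample = "seo " * 5 + "filler " * 95
print(keyword_density(sample, "seo"))  # 5.0
```

For a keyword phrase, the same function can be called once for each word of the phrase and the results added up to get the total density.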
2.1.4 Location of keywords on a page
A very short rule for seo experts – the closer a keyword or keyword phrase is to
the beginning of a document, the more significant it becomes for the search engine.
2.1.5 Text format and seo
Search engines pay special attention to page text that is highlighted or given special
formatting. We recommend:
- Use keywords in headings. Headings are text highlighted with the «H» HTML tags.
The «h1» and «h2» tags are most effective. Currently, the use of CSS allows you
to redefine the appearance of text highlighted with these tags. This means that
«H» tags are used less often than before, but they are still very important in seo work.
- Highlight keywords with bold fonts. Do not highlight the entire text! Just highlight
each keyword two or three times on the page. Use the «strong» tag for highlighting
instead of the more traditional «B» bold tag.
2.1.6 «TITLE» tag
This is one of the most important tags for search engines. Make use of this fact
in your seo work. Keywords must be used in the TITLE tag. The link to your site
that is normally displayed in search results will contain text derived from the
TITLE tag. It functions as a sort of virtual business card for your pages. Often,
the TITLE tag text is the first information about your website that the user sees.
This is why it should not only contain keywords, but also be informative and attractive.
You want the searcher to be tempted to click on your listed link and navigate to
your website. As a rule, 50-80 characters from the TITLE tag are displayed in search
results and so you should limit the size of the title to this length.
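A quick check of these two recommendations (keyword present, title short enough to be displayed in full) might look like the sketch below. It uses the standard Python html.parser module; the names TitleParser and check_title and the 80-character default are assumptions for this example, taken from the upper end of the range given above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the TITLE tag of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_title(html, keyword, max_len=80):
    """Report whether the title contains the keyword and fits the display limit."""
    parser = TitleParser()
    parser.feed(html)
    title = parser.title.strip()
    return {
        "title": title,
        "has_keyword": keyword.lower() in title.lower(),
        "fits_display": len(title) <= max_len,
    }


page = "<html><head><title>SEO software for webmasters</title></head></html>"
result = check_title(page, "seo software")
print(result)
```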
2.1.7 Keywords in links
A simple seo rule – use keywords in the text of page links that refer to other pages
on your site and to any external Internet resources. Keywords in such links can
slightly enhance page rank.
2.1.8 «ALT» attributes in images
Any page image has a special optional attribute known as “alternative text.” It
is specified using the HTML «ALT» attribute. This text will be displayed if the browser
fails to download the image or if the browser image display is disabled. Search
engines save the value of image ALT attributes when they parse (index) pages, but
do not use it to rank search results.
Currently, the Google search engine takes into account text in the ALT attributes
of those images that are links to other pages. The ALT attributes of other images
are ignored. There is no information regarding other search engines, but we can
assume that the situation is similar. We consider that keywords can and should be
used in ALT attributes, but this practice is not vital for seo purposes.
2.1.9 Description Meta tag
This is used to specify page descriptions. It does not influence the seo ranking
process but it is very important. A lot of search engines (including the largest
one – Google) display information from this tag in their search results if this
tag is present on a page and if its content matches the content of the page and
the search query.
Experience has shown that a high position in search results does not always guarantee
large numbers of visitors. For example, if your competitors' search result description
is more attractive than the one for your site then search engine users may choose
their resource instead of yours. That is why it is important that your Description
Meta tag text be brief, but informative and attractive. It must also contain keywords
appropriate to the page.
2.1.10 Keywords Meta tag
This Meta tag was initially used to specify keywords for pages but it is hardly
ever used by search engines now. It is often ignored in seo projects. However, it
would be advisable to specify this tag just in case there is a revival in its use.
The following rule must be observed for this tag: only keywords actually used in
the page text must be added to it.
2.2 Site structure
2.2.1 Number of pages
The general seo rule is: the more, the better. Increasing the number of pages on
your website increases the visibility of the site to search engines. Also, if new
information is being constantly added to the site, search engines consider this
as development and expansion of the site. This may give additional advantages in
ranking. You should periodically publish more information on your site – news, press
releases, articles, useful tips, etc.
2.2.2 Navigation menu
As a rule, any site has a navigation menu. Use keywords in menu links; this will give
additional seo significance to the pages to which the links refer.
2.2.3 Keywords in page names
Some seo experts consider that using keywords in the name of an HTML page file may
have a positive effect on its search result position.
2.2.4 Avoid subdirectories
If there are not too many pages on your site (up to a couple of dozen), it is best
to place them all in the root directory of your site. Search engines consider such
pages to be more important than ones in subdirectories.
2.2.5 One page – one keyword phrase
For maximum seo try to optimize each page for its own keyword phrase. Sometimes
you can choose two or three related phrases, but you should certainly not try to
optimize a page for 5-10 phrases at once. Such phrases would probably produce no
effect on page rank.
2.2.6 Seo and the Main page
Optimize the main page of your site (domain name, index.html) for word combinations
that are most important. This page is most likely to get to the top of search engine
lists. My seo observations suggest that the main page may account for up to 30-40%
of the total search traffic for some sites.
2.3 Common seo mistakes
2.3.1 Graphic header
Very often sites are designed with a graphic header. Often, we see an image of the
company logo occupying the full-page width. Do not do it! The upper part of a page
is a very valuable place where you should insert your most important keywords for
best seo. In the case of a graphic image, that prime position is wasted since search
engines cannot make use of images. Sometimes you may come across completely absurd
situations: the header contains text information, but to make its appearance more
attractive, it is created in the form of an image. The text in it cannot be indexed
by search engines and so it will not contribute toward the page rank. If you must
present a logo, the best way is to use a hybrid approach – place the graphic logo
at the top of each page and size it so that it does not occupy its entire width.
Use a text header to make up the rest of the width.
2.3.2 Graphic navigation menu
The situation is similar to the previous one – internal links on your site should
contain keywords, which will give an additional advantage in seo ranking. If your
navigation menu consists of graphic elements to make it more attractive, search
engines will not be able to index the text of its links. If it is not possible to
avoid using a graphic menu, at least remember to specify correct ALT attributes
for all images.
2.3.3 Script navigation
Sometimes scripts are used for site navigation. As an seo worker, you should understand
that search engines cannot read or execute scripts. Thus, a link specified with
the help of a script will not be available to the search engine, the search robot
will not follow it and so parts of your site will not be indexed. If you use site
navigation scripts then you must provide regular HTML duplicates to make them visible
to everyone – your human visitors and the search robots.
2.3.4 Session identifier
Some sites use session identifiers. This means that each visitor gets a unique parameter
(&session_id=) when he or she arrives at the site. This ID is added to the address
of each page visited on the site. Session IDs help site owners to collect useful
statistics, including information about visitors' behavior. However, from the point
of view of a search robot, a page with a new address is a brand new page. This means
that each time the search robot comes to such a site, it will get a new session
identifier and will consider the pages as new ones whenever it visits them.
Search engines do have algorithms for consolidating mirrors and pages with the same
content. Sites with session IDs should, therefore, be recognized and indexed correctly.
However, it is difficult to index such sites and sometimes they may be indexed incorrectly,
which has an adverse effect on seo page ranking. If you are interested in seo for
your site, I recommend that you avoid session identifiers if possible.
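To see why session IDs cause trouble, consider two addresses that differ only in the session parameter. The sketch below strips that parameter so that both addresses collapse to the same one. It is only an illustration of the idea using the standard urllib.parse module; the function name strip_session_id and the example.com addresses are hypothetical, and this is not a description of how any search engine consolidates pages.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="session_id"):
    """Return `url` with the session parameter removed from its query string."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Two visits by the same robot produce two different addresses for one page.
first_visit = "http://example.com/page.html?id=7&session_id=abc123"
second_visit = "http://example.com/page.html?id=7&session_id=xyz789"
print(strip_session_id(first_visit) == strip_session_id(second_visit))  # True
```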
2.3.5 Redirects
Redirects make site analysis more difficult for search robots, with resulting adverse
effects on seo. Do not use redirects unless there is a clear reason for doing so.
2.3.6 Hidden text, a deceptive seo method
The last two issues are not really mistakes but deliberate attempts to deceive search
engines using illicit seo methods. Hidden text (when the text color coincides with
the background color, for example) allows site owners to cram a page with their
desired keywords without affecting page logic or visual layout. Such text is invisible
to human visitors but will be seen by search robots. The use of such deceptive optimization
methods may result in banning of the site. It could be excluded from the index (database)
of the search engine.
2.3.7 One-pixel links, seo deception
This is another deceptive seo technique. Search engines consider the use of tiny,
almost invisible, graphic image links just one pixel wide and high as an attempt
at deception, which may lead to a site ban.
3 External ranking factors
3.1 Why inbound links to sites are taken into account
As you can see from the previous section, many factors influencing the ranking process
are under the control of webmasters. If these were the only factors then it would
be impossible for search engines to distinguish between a genuine high-quality document
and a page created specifically to achieve high search ranking but containing no
useful information. For this reason, an analysis of inbound links to the page being
evaluated is one of the key factors in page ranking. This is the only factor that
is not controlled by the site owner.
It makes sense to assume that interesting sites will have more inbound links. This
is because owners of other sites on the Internet will tend to have published links
to a site if they think it is a worthwhile resource. The search engine will use
this inbound link criterion in its evaluation of document significance.
Therefore, two main factors influence how pages are stored by the search engine
and sorted for display in search results:
- Relevance, as described in the previous section on internal ranking factors.
- Number and quality of inbound links, also known as link citation, link popularity
or citation index. This will be described in the next section.
3.2 Link importance (citation index, link popularity)
You can easily see that simply counting the number of inbound links does not give
us enough information to evaluate a site. It is obvious that a link from www.microsoft.com
should mean much more than a link from some homepage like www.hostingcompany.com/~myhomepage.html.
You have to take into account link importance as well as number of links.
Search engines use the notion of citation index to evaluate the number and quality
of inbound links to a site. Citation index is a numeric estimate of the popularity
of a resource expressed as an absolute value representing page importance. Each
search engine uses its own algorithms to estimate a page citation index. As a rule,
these values are not published.
As well as the absolute citation index value, a scaled citation index is sometimes
used. This relative value indicates the popularity of a page relative to the popularity
of other pages on the Internet. You will find a detailed description of citation
indexes and the algorithms used for their estimation in the next sections.
3.3 Link text (anchor text)
The link text of any inbound site link is vitally important in search result ranking.
The anchor (or link) text is the text between the HTML tags «A» and «/A» and is
displayed as the text that you click in a browser to go to a new page. If the link
text contains appropriate keywords, the search engine regards it as an additional
and highly significant recommendation that the site actually contains valuable information
relevant to the search query.
3.4 Relevance of referring pages
As well as link text, search engines also take into account the overall information
content of each referring page.
Example: Suppose we are using seo to promote a car sales resource. In this case a
link from a site about car repairs will have much more importance than a similar
link from a site about gardening. The first link is published on a resource having
a similar topic so it will be more important for search engines.
3.5 Google PageRank – theoretical basics
The Google company was the first company to patent the system of taking into account
inbound links. The algorithm was named PageRank. In this section, we will describe
this algorithm and how it can influence search result ranking.
PageRank is estimated separately for each web page and is determined by the PageRank
(citation) of other pages referring to it. It is a kind of “virtuous circle.” The
main task is to find the criterion that determines page importance. In the case
of PageRank, it is the possible frequency of visits to a page.
I shall now describe how a user’s behavior when following links to surf the network
is modeled. It is assumed that the user starts viewing sites from some random page.
Then he or she follows links to other web resources. There is always a possibility
that the user may leave a site without following any outbound link and start viewing
documents from a random page. The PageRank algorithm estimates the probability of
this event as 0.15 at each step. The probability that our user continues surfing
by following one of the links available on the current page is therefore 0.85, assuming
that all links are equal in this case. If he or she continues surfing indefinitely,
popular pages will be visited many more times than the less popular pages.
The PageRank of a specified web page is thus defined as the probability that a user
may visit the web page. It follows that the sum of probabilities for all
existing web pages is exactly one because the user is assumed to be visiting at
least one Internet page at any given moment.
Since it is not always convenient to work with these probabilities the PageRank
can be mathematically transformed into a more easily understood number for viewing.
For instance, we are used to seeing a PageRank number between zero and ten on the
Google Toolbar.
According to the ranking model described above:
- Each page on the Net (even if there are no inbound links to it) initially has
a PageRank greater than zero, although it will be very small. There is a tiny chance
that a user may accidentally navigate to it.
- Each page that has outbound links distributes part of its PageRank to the referenced
page. The PageRank contributed to these linked-to pages is inversely proportional
to the total number of links on the linked-from page – the more links it has, the
lower the PageRank allocated to each linked-to page.
- A “damping factor” is applied to this process so that the total distributed
page rank is reduced by 15%. This is equivalent to the probability, described above,
that the user will not visit any of the linked-to pages but will navigate to an
unrelated website.
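The ranking model above can be sketched as a simple iterative calculation. This is a minimal illustration of the random-surfer model under stated assumptions, not Google's actual implementation: the function name pagerank, the four-page link graph and the number of iterations are made up for this example, and a page with no outbound links is assumed to send the surfer to a random page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate the visit probability of each page in the `links` graph.

    `links` maps every page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # With probability 0.15 the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outbound in links.items():
            # Assumption: a page without links passes its rank to all pages.
            targets = outbound if outbound else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: page D has no inbound links at all.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page D still receives a small non-zero value, and the values of all pages add up to one, as the model requires.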
Let us now see how this PageRank process might influence the process of ranking
search results. We say “might” because the pure PageRank algorithm just described
has not been used in the Google algorithm for quite a while now. We will discuss
a more current and sophisticated version shortly. There is nothing difficult about
the PageRank influence – after the search engine finds a number of relevant documents
(using internal text criteria), they can be sorted according to the PageRank since
it would be logical to suppose that a document having a larger number of high-quality
inbound links contains the most valuable information.
Thus, the PageRank algorithm "pushes up" those documents that are most popular outside
the search engine as well.
3.6 Google PageRank – practical use
Currently, PageRank is not used directly in the Google algorithm. This is to be
expected since pure PageRank characterizes only the number and the quality of inbound
links to a site, but it completely ignores the text of links and the information
content of referring pages. These factors are important in page ranking and they
are taken into account in later versions of the algorithm. It is thought that the
current Google ranking algorithm ranks pages according to thematic PageRank. In
other words, it emphasizes the importance of links from pages with content related
by similar topics or themes. The exact details of this algorithm are known only
to Google developers.
You can determine the PageRank value for any web page with the help of the Google
ToolBar that shows a PageRank value within the range from 0 to 10. It should be
noted that the Google ToolBar does not show the exact PageRank probability value,
but the PageRank range a particular site is in. Each range (from 0 to 10) is defined
according to a logarithmic scale.
Here is an example: each page has a real PageRank value known only to Google. To
derive a displayed PageRank range for their ToolBar, they use a logarithmic scale
as shown in this table:
Real PR        ToolBar PR
1-10           1
10-100         2
100-1000       3
1000-10,000    4
Etc.
This shows that the PageRank ranges displayed on the Google ToolBar are not all
equal. It is easy, for example, to increase PageRank from one to two, while it is
much more difficult to increase it from six to seven.
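The mapping in the table can be written as a short function. This only reproduces the illustrative table: the real values, the base of the logarithm and the exact boundaries are known only to Google, so the base of 10, the treatment of boundary values and the name toolbar_pr are all assumptions.

```python
import math


def toolbar_pr(real_pr):
    """Map an illustrative "real" PageRank value to a 0-10 ToolBar value."""
    if real_pr < 1:
        return 0
    # Each ToolBar step covers ten times the range of the previous one.
    return min(10, math.floor(math.log10(real_pr)) + 1)


for value in (5, 50, 500, 5000):
    print(value, toolbar_pr(value))
```

Under these assumptions, moving from ToolBar PR 1 to 2 requires a real value of about 10, while moving from 6 to 7 requires about 1,000,000.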
In practice, PageRank is mainly used for two purposes:
1. Quick check of the site's popularity. PageRank does not give exact information
about referring pages, but it allows you to quickly and easily get a feel for the
site's popularity level and to follow trends that may result from your seo work.
You can use the following “Rule of thumb” measures for English language sites: PR
4-5 is typical for most sites with average popularity. PR 6 indicates a very popular
site while PR 7 is almost unreachable for a regular webmaster. You should congratulate
yourself if you manage to achieve it. PR 8, 9, 10 can only be achieved by the sites
of large companies such as Microsoft, Google, etc. PageRank is also useful when
exchanging links and in similar situations. You can compare the quality of the pages
offered in the exchange with pages from your own site to decide if the exchange
should be accepted.
2. Evaluation of the competitiveness level for a search query is a vital part of
seo work. Although PageRank is not used directly in the ranking algorithms, it allows
you to indirectly evaluate relative site competitiveness for a particular query.
For example, if the search engine displays sites with PageRank 6-7 in the top search
results, a site with PageRank 4 is not likely to get to the top of the results list
using the same search query.
It is important to recognize that the PageRank values displayed on the Google ToolBar
are recalculated only occasionally (every few months) so the Google ToolBar displays
somewhat outdated information. This means that the Google search engine tracks changes
in inbound links much faster than these changes are reflected on the Google ToolBar.
3.7 Increasing link popularity
3.7.1 Submitting to general purpose directories
On the Internet, many directories contain links to other network resources grouped
by topics. The process of adding your site information to them is called submission.
Such directories can be paid or free of charge, and they may or may not require a
backlink from your site. The number of visitors to these
directories is not large so they will not send a significant number to your site.
However, search engines count links from these directories and this may enhance
your site's search result placement.
Important! Only those directories that publish a direct link to your site
are worthwhile from an seo point of view. Script-driven directories are almost useless.
This point deserves a more detailed explanation. There are two methods for publishing
a link. A direct link is published as a standard HTML construction («A href=...»,
etc.). Alternatively, links can be published with the help of various scripts, redirects
and so on. Search engines understand only those links that are specified directly
in HTML code. That is why the seo value of a directory that does not publish a direct
link to your site is close to zero.
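One way to check whether a directory page publishes a direct link is to look for a plain «A href» that points straight at your address. The sketch below does this with the standard html.parser module; the names DirectLinkParser and has_direct_link and the example addresses are hypothetical, and the check is deliberately simple (it compares whole addresses only).

```python
from html.parser import HTMLParser


class DirectLinkParser(HTMLParser):
    """Collect the href values of ordinary HTML links on a page."""

    def __init__(self):
        super().__init__()
        self.direct_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Script calls and relative redirect addresses are not direct links.
        if href.lower().startswith(("http://", "https://")):
            self.direct_links.append(href)


def has_direct_link(directory_html, site_url):
    """Return True if the page links straight to `site_url`."""
    parser = DirectLinkParser()
    parser.feed(directory_html)
    wanted = site_url.rstrip("/")
    return any(link.rstrip("/") == wanted for link in parser.direct_links)


direct = '<a href="http://www.example.com/">Example site</a>'
redirect = '<a href="/redirect.cgi?id=42">Example site</a>'
script = "<a href=\"javascript:go('http://www.example.com/')\">Example site</a>"
print(has_direct_link(direct, "http://www.example.com"))
```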
You should not submit your site to FFA (free-for-all) directories. Such directories
automatically publish links related to any search topic and are ignored by search
engines. The only thing an FFA directory entry will give you is an increase in spam
sent to your published e-mail address. Actually, this is the main purpose of FFA
directories.
Be wary of promises from various programs and seo services that submit your resource
to hundreds of thousands of search engines and directories. There are no more than
a hundred or so genuinely useful directories on the Net – this is the number to
take seriously and professional seo submission services work with this number of
directories. If a seo service promises submissions to enormous numbers of resources,
it simply means that the submission database mainly consists of FFA archives and
other useless resources.
Give preference to manual or semiautomatic seo submission; do not rely completely
on automatic processes. Submitting sites under human control is generally much more
efficient than fully automatic submission. The value of submitting a site to paid
directories or publishing a backlink should be considered individually for each
directory. In most cases, it does not make much sense, but there may be exceptions.
Submitting sites to directories does not often result in a dramatic effect on site
traffic, but it slightly increases the visibility of your site for search engines.
This useful seo option is available to everyone and does not require a lot of time
and expense, so do not overlook it when promoting your project.
3.7.2 DMOZ directory
The DMOZ directory (www.dmoz.org), or the Open Directory Project, is the largest
directory on the Internet. There are
many copies of the main DMOZ site and so, if you submit your site to the DMOZ directory,
you will get a valuable link from the directory itself as well as dozens of additional
links from related resources. This means that the DMOZ directory is of great value
to a seo aware webmaster.
It is not easy to get your site into the DMOZ directory; there is an element of
luck involved. Your site may appear in the directory a few minutes after it has
been submitted or it may take months to appear.
If you submitted your site details correctly and in the appropriate category then
it should eventually appear. If it does not appear after a reasonable time then
you can try contacting the editor of your category with a question about your request
(the DMOZ site gives you such opportunity). Of course, there are no guarantees,
but it may help. DMOZ directory submissions are free of charge for all sites, including
commercial ones.
Here are my final recommendations regarding site submissions to DMOZ. Read all site
requirements, descriptions, etc. to avoid violating the submission rules. Such a
violation will most likely result in a refusal to consider your request. Please
remember, presence in the DMOZ directory is desirable, but not obligatory. Do not
despair if you fail to get into this directory. It is possible to reach top positions
in search results without this directory – many sites do.
3.7.3 Link exchange
The essence of link exchanges is that you use a special page to publish links to
other sites and get similar backlinks from them. Search engines do not like link
exchanges because, in many cases, they distort search results and do not provide
anything useful to Internet users. However, it is still an effective way to increase
link popularity if you observe several simple rules.
- Exchange links with sites that are related by topic. Exchanging links with unrelated
sites is ineffective and unpopular.
- Before exchanging, make sure that your link will be published on a “good” page.
This means that the page must have a reasonable PageRank (3-4 or higher is recommended),
it must be available for indexing by search engines, the link must be direct, the
total number of links on the page must not exceed 50, and so on.
- Do not create large link directories on your site. The idea of such a directory
seems attractive because it gives you an opportunity to exchange links with many
sites on various topics. You will have a topic category for each listed site. However,
when trying to optimize your site you are looking for link quality rather than quantity
and there are some potential pitfalls. No seo aware webmaster will publish a quality
link to you if he receives a worthless link from your directory “link farm” in return.
Generally, the PageRank of pages from such directories leaves a lot to be desired.
In addition, search engines do not like these directories at all. There have even
been cases where sites were banned for using such directories.
- Use a separate page on the site for link exchanges. It must have a reasonable
PageRank and it must be indexed by search engines, etc. Do not publish more than
50 links on one page (otherwise search engines may fail to take some of the links
into account). This will help you to find other seo aware partners for link exchanges.
- Search engines try to track mutual links. That is why you should, if possible,
publish backlinks on a domain/site other than the one you are trying to promote.
The best variant is when you promote the resource site1.com and publish backlinks
on the resource site2.com.
- Exchange links with caution. Webmasters who are not quite honest will often remove
your links from their resources after a while. Check your backlinks from time to
time.
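Some of the rules above can be checked automatically before you agree to an exchange, and again later to confirm that your link is still there. This is a rough sketch only: check_partner_page is a hypothetical helper, the 50-link limit is the figure suggested in the text, and PageRank is not checked because it cannot be read from the page itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

MAX_LINKS = 50  # limit suggested in the text above


class PageScanner(HTMLParser):
    """Collects link targets and notices a robots 'noindex' meta tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])
        # <meta name="robots" content="noindex"> keeps a page out of indexes.
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True


def check_partner_page(page_html, my_host):
    """Report whether a partner's links page meets the basic rules."""
    scanner = PageScanner()
    scanner.feed(page_html)
    hosts = [urlparse(href).hostname or "" for href in scanner.hrefs]
    return {
        "direct_link": my_host in hosts,
        "link_count_ok": len(scanner.hrefs) <= MAX_LINKS,
        "indexable": not scanner.noindex,
    }
```

Running the same check from time to time against each partner page is one simple way to notice when a link has been removed.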
3.7.4 Press releases, news feeds, thematic resources
This section is about site marketing rather than pure seo. There are many information
resources and news feeds that publish press releases and news on various topics.
Such sites can supply you with direct visitors and also increase your site's popularity.
If you do not find it easy to create a press release or a piece of news, hire copywriters
– they will help you find or create something newsworthy.
Look for resources that deal with similar topics to your own site. You may find
many Internet projects that are not in direct competition with you, but which share
the same topic as your site. Try to approach the site owners. It is quite possible
that they will be glad to publish information about your project.
One final tip for obtaining inbound links – try to create slight variations in the
inbound link text. If all inbound links to your site have exactly the same link
text and there are many of them, the search engines may flag it as a spam attempt
and penalize your site.
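If you keep a list of the link texts used by sites that link to you, a quick count shows whether one phrase dominates. The sketch below is illustrative: the function names are invented, and the 50% threshold is an arbitrary assumption, since search engines do not publish any such figure.

```python
from collections import Counter


def anchor_text_shares(anchor_texts):
    """Share of inbound links using each link text, most common first."""
    counts = Counter(text.strip().lower() for text in anchor_texts)
    total = sum(counts.values())
    return {text: n / total for text, n in counts.most_common()}


def dominant_anchor(anchor_texts, max_share=0.5):
    """Return the link text used by more than max_share of links, if any."""
    shares = anchor_text_shares(anchor_texts)
    if not shares:
        return None
    text, share = next(iter(shares.items()))
    return text if share > max_share else None


# Hypothetical link texts collected from referring pages.
texts = ["cheap monitors"] * 8 + ["Samsung monitors", "monitor shop"]
print(anchor_text_shares(texts))
print(dominant_anchor(texts))
```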
4 Indexing a site
Before a site appears in search results, a search engine must index it. An indexed
site will have been visited and analyzed by a search robot with relevant information
saved in the search engine database. If a page is present in the search engine index,
it can be displayed in search results; otherwise, the search engine knows nothing
about it and cannot display information from the page.
Most average sized sites (with dozens to hundreds of pages) are usually indexed
correctly by search engines. However, you should remember the following points when
constructing your site. There are two ways to allow a search engine to learn about
a new site:
- Submit the address of the site manually using a form associated with the search
engine, if available. In this case, you are the one who informs the search engine
about the new site and its address goes into the queue for indexing. Only the main
page of the site needs to be added; the search robot will find the rest of the pages
by following links.
- Let the search robot find the site on its own. If there is at least one inbound
link to your resource from other indexed resources, the search robot will soon visit
and index your site. In most cases, this method is recommended. Get some inbound
links to your site and just wait until the robot visits it. This may actually be
quicker than manually adding it to the submission queue. Indexing a site typically
takes from a few days to two weeks depending on the search engine. The Google search
engine is the quickest of the bunch.
Try to make your site friendly to search robots by following these rules:
- Try to make any page of your site reachable from the main page in not more than
three mouse clicks. If the structure of the site does not allow you to do this,
create a so-called site map that will allow this rule to be observed.
- Do not make common mistakes. Session identifiers make indexing more difficult.
If you use script navigation, make sure you duplicate these links with regular ones
because search engines cannot read scripts (see more details about these and other
mistakes in section 2.3).
- Remember that search engines index no more than the first 100-200 KB of text on
a page. Hence, the following rule – do not use pages with text larger than 100 KB
if you want them to be indexed completely.
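The three-click rule can be verified with a breadth-first walk over the internal links of the site. This is a minimal sketch: the link map is a hand-written, hypothetical example, whereas in practice it would come from crawling your own pages.

```python
from collections import deque


def click_depths(links, start):
    """Minimum number of clicks needed to reach each page from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(links, start, max_clicks=3):
    """Pages that are unreachable or need more than max_clicks clicks."""
    depths = click_depths(links, start)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in all_pages if depths.get(p, max_clicks + 1) > max_clicks)


# Hypothetical site structure: page -> pages it links to.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/monitors"],
    "/products/monitors": ["/products/monitors/samsung"],
    "/products/monitors/samsung": ["/products/monitors/samsung/710n"],
}
print(pages_too_deep(site, "/"))

# A site map linked from the main page brings every page within two clicks.
site["/"].append("/sitemap")
site["/sitemap"] = ["/products/monitors/samsung", "/products/monitors/samsung/710n"]
print(pages_too_deep(site, "/"))
```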
You can manage the behavior of search robots using the file robots.txt. This file
allows you to explicitly permit or forbid them to index particular pages on your
site.
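As an illustration, the sketch below feeds a small robots.txt file to the parser from the Python standard library and asks whether a robot may fetch particular addresses. The file contents and the paths (/cgi-bin/, /private/) are made-up examples, not recommendations for your site.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: all robots are kept out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "http://www.site.com/index.html", "Googlebot"))
print(is_allowed(ROBOTS_TXT, "http://www.site.com/private/report.html", "Googlebot"))
```

The first address is permitted and the second is forbidden, because it falls under one of the Disallow lines.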
The databases of search engines are constantly being updated; records in them may
change, disappear and reappear. That is why the number of indexed pages on your
site may sometimes vary. One of the most common reasons for a page to disappear
from indexes is server unavailability. This means that the search robot could not
access it at the time it was attempting to index the site. After the server is restarted,
the site should eventually reappear in the index.
You should note that the more inbound links your site has, the more quickly it gets
re-indexed. You can track the process of indexing your site by analyzing server
log files where all visits of search robots are logged. We will give details of
seo software that allows you to track such visits in a later section.
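If you have access to raw server logs, a short script is enough to see which pages the robots have requested. The sketch assumes the widely used "combined" log format and recognizes robots by three user-agent markers (Googlebot, Slurp, msnbot); both the format and the marker list are assumptions that you may need to adapt to your server.

```python
import re
from collections import Counter

# One line of the "combined" access log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings that identify search robots in the user-agent field (assumed list).
ROBOT_MARKERS = ("Googlebot", "Slurp", "msnbot")


def robot_visits(log_lines):
    """Count robot requests per (robot, path); other lines are ignored."""
    visits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        for marker in ROBOT_MARKERS:
            if marker.lower() in agent:
                visits[(marker, match.group("path"))] += 1
    return visits


# Two invented log lines: one robot visit and one ordinary browser visit.
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2005:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '192.0.2.44 - - [10/Oct/2005:14:01:02 +0000] "GET /index.html HTTP/1.1" '
    '200 2326 "http://www.site.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]
print(robot_visits(SAMPLE))
```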
5 Choosing keywords
5.1 Initially choosing keywords
Choosing keywords should be your first step when constructing a site. You should
have the keyword list available to incorporate into your site text before you start
composing it. To define your site keywords, you should use seo services offered
by search engines in the first instance. Sites such as www.wordtracker.com and inventory.overture.com are good starting places for
English language sites. Note that the data they provide may sometimes differ significantly
from what keywords are actually the best for your site. You should also note that
the Google search engine does not give information about frequency of search queries.
After you have defined your approximate list of initial keywords, you can analyze
your competitors’ sites and try to find out what keywords they are using. You may
discover some further relevant keywords that are suitable for your own site.
5.2 Frequent and rare keywords
There are two distinct strategies – optimize for a small number of highly popular
keywords or optimize for a large number of less popular words. In practice, both
strategies are often combined.
The disadvantage of keywords that attract frequent queries is that the competition
rate is high for them. It is often not possible for a new site to get anywhere near
the top of search result listings for these queries.
For keywords associated with rare queries, it is often sufficient just to mention
the necessary word combination on a web page or to perform minimum text optimization.
Under certain circumstances, rare queries can supply quite a large amount of search
traffic.
The aim of most commercial sites is to sell some product or service or to make money
in some way from their visitors. This should be kept in mind during your seo (search
engine optimization) work and keyword selection. If you are optimizing a commercial
site then you should try to attract targeted visitors (those who are ready to pay
for the offered product or service) to your site rather than concentrating on sheer
numbers of visitors.
Example. The query “monitor” is much more popular and competitive than the query
“monitor Samsung 710N” (the exact name of the model). However, the second query
is much more valuable for a seller of monitors. It is also easier to get traffic
from it because its competition rate is low; there are not many other sites owned
by sellers of Samsung 710N monitors. This example highlights another possible difference
between frequent and rare search queries that should be taken into account – rare
search queries may provide you with fewer visitors overall, but more targeted visitors.
5.3 Evaluating the competition rates of search queries
When you have finalized your keywords list, you should identify the core keywords
for which you will optimize your pages. A suggested technique for this follows.
Rare queries are discarded at once (for the time being). In the previous section,
we described the usefulness of such rare queries but they do not require special
optimization. They are likely to occur naturally in your website text.
As a rule, the competition rate is very high for the most popular phrases. This
is why you need to get a realistic idea of the competitiveness of your site. To
evaluate the competition rate you should estimate a number of parameters for the
first 10 sites displayed in search results:
- The average PageRank of the pages in the search results.
- The average number of links to these sites. Check this using a variety of search
engines.
Additional parameters:
- The number of pages on the Internet that contain the particular search term, that
is, the total number of search results for that search term.
- The number of pages on the Internet that contain exact matches to the keyword
phrase. Enclose the phrase in quotation marks when searching to obtain this
number.
These additional parameters allow you to indirectly evaluate how difficult it will
be to get your site near the top of the list for this particular phrase. In addition
to the parameters described, you can also check how many of the sites in your
search results are listed in the main directories, such as DMOZ and Yahoo.
The analysis of the parameters mentioned above and their comparison with those of
your own site will allow you to predict with reasonable certainty the chances of
getting your site to the top of the list for a particular phrase.
Having evaluated the competition rate for all of your keyword phrases, you can now
select a number of moderately popular key phrases with an acceptable competition
rate, which you can use to promote and optimize your site.
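Once the figures for the first 10 results have been collected by hand, the comparison with your own site is simple arithmetic. The sketch below uses invented numbers and hypothetical field names (pagerank, inbound_links); it only averages the two main parameters and reports how far your site is from them.

```python
from statistics import mean


def competition_summary(top_results):
    """Average PageRank and inbound-link count over the top search results."""
    return {
        "avg_pagerank": mean(r["pagerank"] for r in top_results),
        "avg_inbound_links": mean(r["inbound_links"] for r in top_results),
    }


def gap_to_competitors(own_site, top_results):
    """Positive numbers mean the competitors are ahead of your site."""
    summary = competition_summary(top_results)
    return {
        "pagerank_gap": summary["avg_pagerank"] - own_site["pagerank"],
        "inbound_links_gap": summary["avg_inbound_links"] - own_site["inbound_links"],
    }


# Invented figures for three of the top results and for your own site.
top = [
    {"pagerank": 6, "inbound_links": 300},
    {"pagerank": 5, "inbound_links": 200},
    {"pagerank": 4, "inbound_links": 100},
]
own = {"pagerank": 3, "inbound_links": 50}
print(gap_to_competitors(own, top))
```

A large gap for a phrase suggests leaving it for later and starting with phrases where the gap is small.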
5.4 Refining your keyword phrases
As mentioned above, search engine services often give inaccurate keyword information.
This means that it is unusual to obtain an optimum set of site keywords at your
first attempt. After your site is up and running and you have carried out some initial
promotion, you can obtain additional keyword statistics, which will facilitate some
fine-tuning. For example, you will be able to obtain the search results rating of
your site for particular phrases and you will also have the number of visits to
your site for these phrases.
With this information, you can clearly define the good and bad keyword phrases.
Often there is no need to wait until your site gets near the top of all search engines
for the phrases you are evaluating – one or two search engines are enough.
Example. Suppose your site occupies first place in the Yahoo search engine for a
particular phrase. At the same time, this site is not yet listed in MSN or Google
search results for this phrase. However, if you know the percentage of visits to
your site from various search engines (for instance, Google – 70%, Yahoo – 20%,
MSN search – 10%), you can predict the approximate amount of traffic for this phrase
from these other search engines and decide whether it is suitable.
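The prediction in this example is a simple proportion. The sketch below assumes that the phrase would split across search engines in the same way as your overall traffic and that the site would reach a comparable position everywhere; the figure of 40 visits is invented.

```python
def estimate_phrase_traffic(known_engine, known_visits, engine_shares):
    """Scale visits seen from one engine to the others by traffic share."""
    visits_per_share_point = known_visits / engine_shares[known_engine]
    return {
        engine: visits_per_share_point * share
        for engine, share in engine_shares.items()
    }


# Traffic shares from the example above, in percent.
shares = {"Google": 70, "Yahoo": 20, "MSN": 10}

# Suppose the phrase currently brings 40 visits a month from Yahoo alone.
estimate = estimate_phrase_traffic("Yahoo", 40, shares)
print(estimate)
print(sum(estimate.values()))
```

With these numbers the phrase could bring about 140 visits from Google and 20 from MSN, or roughly 200 visits a month in total.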
As well as detecting bad phrases, you may find some new good ones. For example,
you may see that a keyword phrase you did not optimize your site for brings useful
traffic despite the fact that your site is on the second or third page in search
results for this phrase.
Using these methods, you will arrive at a new refined set of keyword phrases. You
should now start reconstructing your site: Change the text to include more of the
good phrases, create new pages for new phrases, etc.
You can repeat this seo exercise several times and, after a while, you will have
an optimum set of key phrases for your site and considerably increased search traffic.
Here are some more tips. According to statistics, the main page takes up to 30%-50%
of all search traffic. It has the highest visibility in search engines and it has
the largest number of inbound links. That is why you should optimize the main page
of your site to match the most popular and competitive queries. Each site page should
be optimized for one or two main word combinations and, possibly for a number of
rare queries. This will increase the chances for the page to get to the top of search
engine lists for particular phrases.
6 Miscellaneous information on search engines
6.1 Google SandBox
At the beginning of 2004, a new and mysterious term appeared among seo specialists
– Google SandBox. This is the name of a new Google spam filter that excludes
new sites from search results. The work of the SandBox filter results in new sites
being absent from search results for virtually any phrase. This even happens with
sites that have high-quality unique content and which are promoted using legitimate
techniques.
The SandBox is currently applied only to the English segment of the Internet; sites
in other languages are not yet affected by this filter. However, this filter may
expand its influence. It is assumed that the aim of the SandBox filter is to exclude
spam sites – indeed, no search spammer will be able to wait for months until he
gets the necessary results. However, many perfectly valid new sites suffer the consequences.
So far, there is no precise information as to what the SandBox filter actually is.
Here are some assumptions based on practical seo experience:
- SandBox is a filter that is applied to new sites. A new site is put in the sandbox
and is kept there for some time until the search engine starts treating it as a
normal site.
- SandBox is a filter applied to new inbound links to new sites. There is a fundamental
difference between this and the previous assumption: the filter is not based on
the age of the site, but on the age of inbound links to the site. In other words,
Google treats the site normally but it refuses to acknowledge any inbound links
to it unless they have existed for several months. Since such inbound links are
one of the main ranking factors, ignoring inbound links is equivalent to the site
being absent from search results. It is difficult to say which of these assumptions
is true; it is quite possible that they are both true.
- The site may be kept in the sandbox from 3 months to a year or more. It has also
been noticed that sites are released from the sandbox in batches. This means that
the time sites are kept in the sandbox is not calculated individually for each site,
but for groups of sites. All sites created within a certain time period are put
into the same group and they are eventually all released at the same time. Thus,
individual sites in a group can spend different times in the sandbox depending where
they were in the group capture-release cycle.
Typical indications that your site is in the sandbox include:
- Your site is normally indexed by Google and the search robot regularly visits
it.
- Your site has a PageRank; the search engine knows about and correctly displays
inbound links to your site.
- A search by site address (www.site.com) displays correct results, with the correct
title, snippet (resource description), etc.
- Your site is found by rare and unique word combinations present in the text of
its pages.
- Your site is not displayed in the first thousand results for any other queries,
even for those for which it was initially created. Sometimes, there are exceptions
and the site appears among 500-600 positions for some queries. This does not change
the sandbox situation, of course.
There are no practical ways to bypass the Sandbox filter. There have been some suggestions
about how it may be done, but they are no more than suggestions and are of little
use to a regular webmaster. The best course of action is to continue seo work on
the site content and structure and wait patiently until the sandbox is lifted,
after which you can expect a dramatic jump in rankings of up to 400-500 positions.
6.2 Google LocalRank
On February 25, 2003, the Google Company patented a new algorithm for ranking
pages called LocalRank. It is based on the idea that pages should be ranked
not by their global link citations, but by how they are cited among pages that deal
with topics related to the particular query. The LocalRank algorithm is not used
in practice (at least, not in the form it is described in the patent). However,
the patent contains several interesting innovations we think any seo specialist
should know about. Nearly all search engines already take into account the topics
to which referring pages are devoted. It seems that the algorithms used in practice
differ somewhat from LocalRank as patented, but studying the patent gives us a
general idea of how such topic-sensitive ranking may be implemented.
While reading this section, please bear in mind that it contains theoretical information
rather than practical guidelines.
The following three items comprise the main idea of the LocalRank algorithm:
1. An algorithm is used to select a certain number of documents relevant to the
search query (let it be N). These documents are initially sorted by some
criteria (this may be PageRank, relevance or a group of other criteria). Let us
call the numeric value of this criterion OldScore.
2. Each of the N selected pages goes through a new ranking procedure and
it gets a new rank. Let us call it LocalScore.
3. The OldScore and LocalScore values for each page
are multiplied, to yield a new value – NewScore. The pages are finally
ranked based on NewScore.
The key procedure in this algorithm is the new ranking procedure, which gives each
page a new LocalScore rank. Let us examine this new procedure in more detail:
0. An initial ranking algorithm is used to select N pages relevant to the
search query. Each of the N pages is allocated an OldScore value by
this algorithm. The new ranking algorithm only needs to work on these N selected
pages.
1. While calculating LocalScore for each page, the system selects those pages
from N that have inbound links to this page. Let this set be M.
At the same time, any other pages from the same host (as determined by IP address)
and pages that are mirrors of the given page will be excluded from M.
2. The set M is divided into subsets Li. These subsets contain pages
grouped according to the following criteria:
- Belonging to one (or similar) hosts. Thus, pages whose first three octets in their
IP addresses are the same will get into one group. This means that pages whose IP
addresses belong to the range xxx.xxx.xxx.0 to xxx.xxx.xxx.255 will
be considered as belonging to one group.
- Pages that have the same or similar content (mirrors)
- Pages on the same site (domain).
3. Each page in each Li subset has rank OldScore. One page with the
largest OldScore rank is taken from each subset, the rest of pages are excluded
from the analysis. Thus, we get some subset of pages K referring to this
page.
4. Pages in the subset K are sorted by the OldScore parameter, then
only the first k pages (k is some predefined number) are left in the
subset K. The rest of the pages are excluded from the analysis.
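5. LocalScore is calculated in this step. According to the patent, the OldScore
values of the remaining k pages are each raised to the power m and summed:
LocalScore(i) = OldScore(1)^m + OldScore(2)^m + ... + OldScore(k)^m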
Here m is some predefined parameter that may vary from one to three. Unfortunately,
the patent for the algorithm in question does not describe this parameter in detail.
After LocalScore is calculated for each page from the set N, NewScore
values are calculated and pages are re-sorted according to the new criteria. The
following formula is used to calculate NewScore:
NewScore(i)= (a+LocalScore(i)/MaxLS)*(b+OldScore(i)/MaxOS)
i is the page for which the new rank is calculated.
a and b – are numeric constants (there is no more detailed information
in the patent about these parameters).
MaxLS – is the maximum LocalScore among those calculated.
MaxOS – is the maximum value among OldScore values.
Now let us put the math aside and explain these steps in plain words.
In step 0) pages relevant to the query are selected. Algorithms that do not take
into account the link text are used for this. For example, relevance and overall
link popularity are used. We now have a set of OldScore values. OldScore
is the rating of each page based on relevance, overall link popularity and other
factors.
In step 1) pages with inbound links to the page of interest are selected from the
group obtained in step 0). The group is whittled down by removing mirror and other
sites in steps 2), 3) and 4) so that we are left with a set of genuinely unique
sites that all share a common theme with the page that is under analysis. By analyzing
inbound links from pages in this group (ignoring all other pages on the Internet),
we get the local (thematic) link popularity.
LocalScore values are then calculated in step 5). LocalScore is the rating
of a page among the set of pages that are related by topic. Finally, pages are rated
and ranked using a combination of LocalScore and OldScore.
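For readers who prefer code to formulas, the whole procedure can be sketched in a few lines. This is an illustration under stated assumptions, not Google's implementation: LocalScore is taken as the sum of OldScore values raised to the power m, as in the patent; pages are grouped only by the first three octets of their IP address (mirror and same-domain grouping is left out); and the values k=20, m=1, a=1 and b=1 are arbitrary choices.

```python
from collections import defaultdict


def subnet(ip):
    """First three octets of an IPv4 address, e.g. '10.0.2.7' -> '10.0.2'."""
    return ip.rsplit(".", 1)[0]


def local_score(target, pages, links, k=20, m=1):
    """Steps 1-5 for one page; pages maps url -> {'old_score', 'ip'}."""
    # Step 1: pages from N that link to the target, skipping its own host.
    backset = [
        url for url in pages
        if url != target
        and target in links.get(url, ())
        and pages[url]["ip"] != pages[target]["ip"]
    ]
    # Step 2: group the referring pages by similar host (same IP range).
    groups = defaultdict(list)
    for url in backset:
        groups[subnet(pages[url]["ip"])].append(pages[url]["old_score"])
    # Step 3: keep only the best OldScore from each group.
    best = [max(scores) for scores in groups.values()]
    # Step 4: keep the k highest of those.
    top_k = sorted(best, reverse=True)[:k]
    # Step 5: combine them as a sum of powers.
    return sum(score ** m for score in top_k)


def rerank(pages, links, k=20, m=1, a=1.0, b=1.0):
    """Return (urls sorted by NewScore, {url: NewScore})."""
    ls = {url: local_score(url, pages, links, k, m) for url in pages}
    max_ls = max(ls.values()) or 1.0  # avoid dividing by zero
    max_os = max(p["old_score"] for p in pages.values()) or 1.0
    new = {
        url: (a + ls[url] / max_ls) * (b + pages[url]["old_score"] / max_os)
        for url in pages
    }
    return sorted(pages, key=new.get, reverse=True), new


# Invented result set N: A has the best OldScore, but B is cited by the others.
pages = {
    "A": {"old_score": 0.9, "ip": "10.0.0.1"},
    "B": {"old_score": 0.8, "ip": "10.0.1.1"},
    "C": {"old_score": 0.5, "ip": "10.0.2.1"},
    "D": {"old_score": 0.4, "ip": "10.0.2.2"},
}
links = {"A": {"B"}, "C": {"B"}, "D": {"B"}}

order, scores = rerank(pages, links)
print(order)
```

In this example page B overtakes page A after re-ranking, because the other pages in the result set link to it. Pages C and D sit in the same IP range, so only the stronger of the two counts towards B's LocalScore.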
6.3 Seo tips, assumptions, observations
This section provides information based on an analysis of various seo articles,
communication between optimization specialists, practical experience and so on.
It is a collection of interesting and useful tips, ideas and suppositions. Do not
regard this section as written in stone, but rather as a collection of information
and suggestions for your consideration.
- Outbound links. Publish links to authoritative resources in your subject field
using the necessary keywords. Search engines place a high value on links to other
resources based on the same topic.
- Outbound links. Do not publish links to FFA sites and other sites excluded from
the indexes of search engines. Doing so may lower the rating of your own site.
- Outbound links. A page should not contain more than 50-100 outbound links. More
links will not harm your site rating but links beyond that number will not be recognized
by search engines.
- Inbound site-wide links. These are links published on every page of the site.
It is believed that search engines do not approve of such links and do not consider
them while ranking pages. Another opinion is that this is true only for large sites
with thousands of pages.
- The ideal keyword density is a frequent seo discussion topic. The real answer
is that there is no ideal keyword density. It is different for each query and search
engines calculate it dynamically for each search query. Our advice is to analyze
the first few sites in search results for a particular query. This will allow you
to evaluate the approximate optimum density for specific queries.
- Site age. Search engines prefer old sites because they are more stable.
- Site updates. Search engines prefer sites that are constantly developing. Developing
sites are those in which new information and new pages periodically appear.
- Domain zone. Search engines prefer sites that are located in the zones .edu, .mil,
.gov, etc. Only the corresponding organizations can register such domains so these
domains are more trustworthy.
- Search engines track the percent of visitors that immediately return to searching
after they visit a site via a search result link. A large number of immediate returns
means that the content is probably not related to the corresponding topic and the
ranking of such a page gets lower.
- Search engines track how often a link is selected in search results. If some link
is only occasionally selected, it means that the page is of little interest and
the rating of such a page gets lower.
- Use synonyms and derived word forms of keywords; search engines will appreciate
that (keyword stemming).
- Search engines consider a very rapid increase in inbound links as artificial
promotion and this results in lowering of the rating. This is a controversial topic
because this method could be used to lower the rating of one's competitors.
- Google does not take into account inbound links if they are on the same (or similar)
hosts. This is detected using host IP addresses. Pages whose IP addresses are within
the range of xxx.xxx.xxx.0 to xxx.xxx.xxx.255 are regarded as being
on the same host. This opinion is most likely to be rooted in the fact that Google
have expressed this idea in their patents. However, Google employees claim that
no limitations of IP addresses are imposed on inbound links and there are no reasons
not to believe them.
- Search engines check information about the owners of domains. Inbound links originating
from a variety of sites all belonging to one owner are regarded as less important
than normal links. This information is presented in a patent.
- Search engines prefer sites with longer term domain registrations.
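The advice on keyword density in the list above can be put into practice with a small function that measures the density of a phrase in the text of each top-ranked page. Definitions of keyword density vary; this sketch assumes one common definition, the percentage of all words on the page that belong to occurrences of the phrase.

```python
import re


def keyword_density(text, phrase):
    """Percentage of the words in text that belong to occurrences of phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    if not words or n == 0:
        return 0.0
    hits = sum(
        1 for i in range(len(words) - n + 1)
        if words[i:i + n] == phrase_words
    )
    return 100.0 * hits * n / len(words)


sample_text = "Samsung monitor sale. Buy a Samsung monitor today."
print(keyword_density(sample_text, "Samsung monitor"))
print(keyword_density(sample_text, "monitor"))
```

Comparing the figures for the first few sites in the search results with those of your own page gives a rough target for that particular query.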
6.4 Creating correct content
The content of a site plays an important role in site promotion for many reasons.
We will describe some of them in this section. We will also give you some advice
on how to populate your site with good content.
- Content uniqueness. Search engines value new information that has not been published
before. That is why you should compose your own site text and not plagiarize excessively.
A site based on materials taken from other sites is much less likely to get to the
top in search engines. As a rule, original source material is always higher in search
results.
- While creating a site, remember that it is primarily created for human visitors,
not search engines. Getting visitors to visit your site is only the first step and
it is the easiest one. The truly difficult task is to make them stay on the site
and convert them into purchasers. You can only do this by using good content that
is interesting to real people.
- Try to update information on the site and add new pages on a regular basis. Search
engines value sites that are constantly developing. Also, the more useful text your
site contains, the more visitors it attracts. Write articles on the topic of your
site, publish visitors' opinions, create a forum for discussing your project. A
forum is only useful if the number of visitors is sufficient for it to be active.
Interesting and attractive content guarantees that the site will attract interested
visitors.
- A site created for people rather than search engines has a better chance of getting
into important directories such as DMOZ and others.
- An interesting site on a particular topic has much better chances to get links,
comments, reviews, etc. from other sites on this topic. Such reviews can give you
a good flow of visitors while inbound links from such resources will be highly valued
by search engines.
- As a final tip, there is an old German proverb: "A shoemaker sticks to his last",
which means, "Do what you can do best." If you can write breathtaking and creative
textual prose for your website then that is great. However, most of us have no special
talent for writing attractive text and we should rely on professionals such as journalists
and technical writers. Of course, this is an extra expense, but it is justified
in the long term.
6.5 Selecting a domain and hosting
Currently, anyone can create a page on the Internet without incurring any expense.
Also, there are companies providing free hosting services that will publish your
page in return for their entitlement to display advertising on it. Many Internet
service providers will also allow you to publish your page on their servers if you
are their client. However, all these variations have serious drawbacks that you
should seriously consider if you are creating a commercial project.
First, and most importantly, you should obtain your own domain for the following
reasons:
- A project that does not have its own domain is regarded as a transient project.
Indeed, why should we trust a resource if its owners are not even prepared to invest
in the tiny sum required to create some sort of minimum corporate image? It is possible
to publish free materials using resources based on free or ISP-based hosting, but
any attempt to create a commercial project without your own domain is doomed to
failure.
- Your own domain allows you to choose your hosting provider. If necessary, you
can move your site to another hosting provider at any time.
Here are some useful tips for choosing a domain name.
- Try to make it easy to remember and make sure there is only one way to pronounce
and spell it.
- Domains with the extension .com are the best choice to promote international projects
in English. Domains from the zones .net, .org, .biz, etc., are available but less
preferable.
- If you want to promote a site with a national flavor, use a domain from the corresponding
national zone: .de for German sites, .it for Italian sites, etc.
- In the case of sites containing two or more languages, you should assign a separate
domain to each language. National search engines are more likely to appreciate such
an approach than subsections for various languages located on one site.
A domain costs $10-20 a year, depending on the particular registration service and
zone.
You should take the following factors into consideration when choosing a hosting
provider:
- Access bandwidth.
- Server uptime.
- The cost of traffic per gigabyte and the amount of prepaid traffic.
- Server location. The site is best hosted in the same geographical region as most
of your expected visitors.
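To compare the uptime figures that providers advertise, it can help to convert a percentage into expected downtime. The following is a minimal sketch; the function name and the sample percentages are illustrative assumptions, not figures from any particular provider.

```python
# Rough conversion of an advertised uptime percentage into downtime per year.
# Assumes a 365-day year; the sample percentages below are only examples.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def downtime_hours_per_year(uptime_percent):
    """Return the expected yearly downtime in hours for a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows about {hours:.2f} hours of downtime per year")
```

Small differences in the advertised percentage translate into large differences in hours, so the figure is worth checking before you sign up.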
The cost of hosting services for small projects is around $5-10 per month.
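Combining the domain and hosting prices quoted in this section gives a rough yearly budget for a small project. The sketch below is simple arithmetic on those ranges; the function and parameter names are hypothetical, and actual prices vary by registrar and provider.

```python
# Yearly budget estimate from the price ranges quoted in this section:
# a domain at $10-20 per year and hosting at $5-10 per month.
def annual_cost_range(domain_per_year=(10, 20), hosting_per_month=(5, 10)):
    """Return (low, high) total yearly cost in USD for one domain plus hosting."""
    low = domain_per_year[0] + 12 * hosting_per_month[0]
    high = domain_per_year[1] + 12 * hosting_per_month[1]
    return low, high


if __name__ == "__main__":
    low, high = annual_cost_range()
    print(f"Estimated yearly cost: ${low}-${high}")
```

With the default ranges this works out to roughly $70-140 per year.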
Avoid “free” offers when choosing a domain and a hosting provider. Hosting providers
sometimes offer free domains to their clients. Such domains are often registered
not to you but to the hosting company, which makes the provider the legal owner of
the domain. This means that you may not be able to change the hosting service of
your project, or you could even be forced to buy back your own domain at a premium
price. It is also better not to register your domains through your hosting company.
Doing so may make moving your site to another hosting company more difficult, even
though you are the owner of your domain.
6.6 Changing the site address
You may need to change the address of your project. Maybe the resource was started
on a free hosting service and has developed into a more commercial project that
should have its own domain. Or maybe the owner has simply found a better name for
the project. In any case, moving a project to a new address is a difficult and
unpleasant task. For starters, you will have to start promoting the new address
almost from scratch. However, if the move is inevitable, you may as well make the
change as useful as possible.
Our advice is to create your new site at the new location with new and unique content.
Place highly visible links to the new resource on the old site to allow visitors
to easily navigate to your new site. Do not completely delete the old site and its
contents.
This approach will allow you to get visitors from search engines to both the old
site and the new one. At the same time, you get an opportunity to cover additional
topics and keywords, which may be more difficult within one resource.