The Wrong Approach
The Little Things
AJAX allows you to incorporate a lot of
innovative site design characteristics. Some
designers take the AJAX craze too far,
however, by incorporating AJAX to such a degree
that it hurts their site's usability and
accessibility. Here are a few of the most
common problems.
Making it too simple. Designing sites where
AJAX controls everything and serves content
on a single page can be a search engine
ranking disaster, as your website will have
only one URL for everything. Instead, be
sure to offer unique sub-links and URLs for
popular site features.
Disabling browser controls. Since AJAX does
not communicate with your browser's history,
simple actions like hitting your browser's
back and forward buttons are rendered
useless. Although you may not traverse your
website via browser buttons, many users do.
Make sure you're not overdoing AJAX so much
that users get lost in your website.
Not using Google's Webmaster Tools. These
tools are a simple and reliable way to keep
track of the pages of your website Google is
indexing.
Cloaking
If Google cannot read the JavaScript
components of your site, an obvious solution
is to just provide an alternative readable
version for Google, right? Wrong. Engaging
in cloaking is a bad approach for making
your AJAX website search engine friendly,
because it is likely to get your site
blacklisted and removed from Google's index
entirely. Cloaking occurs when a developer
creates two distinct versions of the same
website, with the second version (usually
plain HTML/text) only visible to search
engine spiders. Spammers have long used the
technique to hide popular phrases and links
in the invisible content in order to
artificially rank better in search engines,
and Google has responded by banning these
sites. To make sure Google doesn't confuse
your site for one that's up to no good, make
sure your crawlable site is derived from the
same site your visitors are seeing...."
"Are PHP
Session ID’s A Cause For Duplicate Content
With Google?
June 22nd, 2007
I know certain web applications depend on
Session IDs to handle a unique user
experience.
You know you’ve caught a case of Session
IDs when you’re browsing a site and your
URLs have nice random characters and numbers
appended to them. Basically they are a real
eyesore.
But more importantly, from what I understand
Session IDs can create duplicate content
issues for your website. You no longer have
one page with one URL, but you can have
thousands of unique URLs pointing to one
single page.
Google might crawl your site one day and
pick up all your links with one session ID,
and the next time they crawl they pick up a
whole set of new links pointing to the same
pages because the session ID has changed.
That does suck!
Disabling PHP Session IDs is not that
complicated, and there are a variety of tricks
that can prevent search engines from picking
them up. You can keep it simple and flip a global
switch to turn off Session IDs
altogether, or target the bots directly.
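One way to picture the duplicate-URL problem is to strip the session parameter and see the addresses collapse into one. The following is a minimal Python sketch, not the post's own method: the function name `strip_session_id` and the parameter list are assumptions made for illustration. `PHPSESSID` is PHP's default session name, while `sid` and `oscsid` are guesses that would need adapting to the site in question.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names. PHPSESSID is PHP's default session name;
# "sid" and "oscsid" are illustrative guesses to adapt per site.
SESSION_PARAMS = {"phpsessid", "sid", "oscsid"}


def strip_session_id(url):
    """Return the URL with any session ID query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Two crawls that picked up different session IDs map to one address.
first = strip_session_id("http://example.com/product.php?id=7&PHPSESSID=abc123")
second = strip_session_id("http://example.com/product.php?id=7&PHPSESSID=zzz999")
print(first)
print(first == second)
```

A helper like this could be used to audit a list of crawled URLs for duplicates; it does not by itself stop the application from appending session IDs.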
A few years ago I launched a site running
osCommerce and I had Session IDs enabled.
Google did its thing and a few weeks later
my results were a mess.
Now word on the street is that Google can
handle Session IDs much better than a few
years ago. So is it worth one’s time to even
think about Session IDs? I mean, with all
those big brains working at the G-Factory
you would think they could decipher Session
IDs.
Now I lean towards being better safe than
sorry and I turn off my Session IDs.
No Cookie, no Washy!"
Building Dynamic Pages With
Search Engines in Mind
Tim Perdue
"Almost any developer knows that
search engine placement is critical to
the success of a web site. What many
people don't know is that a lot of
search engines cannot index many
database-driven pages (basically any
page with a '?' or '&' in the URL)..."
5 Search Engine Mistakes Not to Make
If you want to improve your
search engine rankings, first
fix these critical errors that
can make your site invisible on
the internet.
By Bill Treloar
People searching for your
products or services on the
Internet can be an important
source of new customers for you.
Because someone searching for
what you sell is already "sold": they're
looking to buy. Where else can
you find that kind of qualified
sales lead?
Since most people give up on a
search if they don't find what
they're looking for in the first
three pages of the search engine
results, your web site needs to
get ranked in those top three
pages, and the higher, the
better.
But there are five common
characteristics that can
relegate even the most
attractive and compelling site
to the search engine
hinterlands. Many nice-looking
sites show up on page 72 of the
search engine results instead of
on page 1 or 2 because they make
one or more of the following
five critical mistakes.
1. Insufficient content.
Your web site needs to have at
least 200 words of keyword-rich
text per page. Search engines
determine what your web page is
about based on the words you use
on the page. A page that's
mostly product photos may be
very meaningful to someone
shopping for those items. But
the search engines have no way
of understanding what's in those
pictures; they need text content
to do their jobs.
Your text needs to use the keywords
that people will search for. If
you're an exterminator and your
site talks at length about
"exterminators", "pest
exterminators", "insect
extermination", and "rodent
infestation," the search engines
will understand that your site
is about those terms. But if
someone searches for "pest
control," your site won't show
up unless you use that phrase on
your site, too.
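The two checks described in this section (enough text on the page, and the exact phrases people search for) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `word_count` and `missing_phrases` are made up here, and plain word matching is far simpler than anything a real search engine does.

```python
import re


def normalize(text):
    """Lowercase the text and reduce it to single-spaced words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def word_count(page_text):
    """Count the words of plain text on a page."""
    return len(normalize(page_text).split())


def missing_phrases(page_text, phrases):
    """Return the search phrases that never appear in the page text."""
    haystack = f" {normalize(page_text)} "
    return [p for p in phrases if f" {normalize(p)} " not in haystack]


page = (
    "We are pest exterminators. Call us about insect extermination "
    "or a rodent infestation."
)
print(word_count(page))
print(missing_phrases(page, ["pest exterminators", "pest control"]))
```

For the exterminator example above, the second call reports "pest control" as missing, which is the gap the article warns about.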
2. Use of frames.
Creating frames is a technique
that webmasters use to simplify
their work and to help ensure a
consistent appearance across all
the pages of a web site. For
example, your site designer may
have created an outside "frame"
for your page that has a top
border with site identification,
logos and so on. It may also
have a left side border with
links to the various pages on
your site. And it may have a
bottom border with contact
information, a copyright
statement and links to things
like a privacy statement. In
frames, the "meat" of the pages,
where the real content is, is
the area enclosed by those
borders, and that's the only
part that changes as you go from
page to page.
Unfortunately, search engines
may have difficulty moving
around in a framed site and may
fail to add all of your pages to
their listings. And pages that
are missed will never show up in
the search engine results when
people search for your keywords.
A more important problem occurs
when the content pages do show
up in the search engine results
pages. That's because when a
searcher clicks on the link in
the search engine results, it
brings them to the content part
of the page. Just the content
part, which doesn't include the
outside frame where site
identification appears and where
the links are that visitors need
to find your contact information
or the page where they can place
an order. The simplest solution?
Simply avoid using frames.
3. Graphics that include text.
Because different visitors to
your site have different fonts
installed on their computers,
the only way to ensure that the
text on your web pages looks
exactly as you want it (the size,
font, line breaks and so on) is
to include it in a graphic. And
often such text looks really
great.
Unfortunately, search engines
can't tell if that graphic says
"REALLY Cheap Widgets" or if
it's a photo of your new puppy.
Words in graphics are wasted on
the search engines. In order to
understand that your page is
about "really cheap widgets,"
they need to find those words in
plain text on your page.
In a similar fashion, navigation
buttons that include words also
can't be read by search engines.
So what should you do? Include
keywords in the links to pages
on your site. This will help the
search engines understand that
those pages are relevant to
those words. So either replace
your navigation buttons with
plain text links to the pages on
your site, or supplement them
with a redundant set of plain
text links somewhere else on
your page.
4. Dynamic content.
Dynamic web pages are most often
found on e-commerce sites that
have numerous pages featuring
hundreds of products. (Dynamic
pages are constructed "on the
fly" from a database of product
information and can often be
identified by the presence of a
"?" somewhere in the page
address.)
Regrettably, dynamic pages are
often ignored by search engines
for a number of technical
reasons. One way to fix this
problem is to create topical
pages that aren't dynamic. For
example, you may sell many
varieties of both tabletop
widgets and portable widgets. By
creating a static page (a
"normal" web page that's not
created by your database) for
tabletop widgets and another for
portable widgets, you can use
your essential keywords on those
pages and still link to your
dynamic pages to display
individual products. Your
dynamic pages are unlikely to be
seen by the search engines, but
your static, topical pages
describing your selection of
tabletop and portable widgets
should.
5. Insufficient link popularity.
Almost all the major search
engines factor into their
rankings some measure of the
number and quality of other
sites that link to yours. That's
a reflection of their belief
that good web sites don't link
to other web sites that are
worthless.
If lots of high-quality sites link
to your site, the chances are
that you have a better site than
one without any incoming links.
Of course, you might be
comparing your well-established
site to a brand new site no one
knows about yet, but over time,
it seems to work out that better
sites have more incoming links.
And all other things being
equal, a site with a lot of
incoming links will be ranked
higher by the search engines
than a site with fewer incoming
links. And a site with no
incoming links may be dropped
entirely from some search
engines.
Try to obtain links from web sites
that complement yours but that
don't compete with you.
Investigate directories that
list sites in your line of
business. And be prepared to
offer to link back to those
sites in return for a link from
them to you.
If you can refrain from making
these five critical mistakes, you
can avoid earning an abysmal
search engine ranking. Being
visible on the web is the first
step to being found on the web.
And while you may still need
search engine optimization to
obtain rankings in the top three
pages of searches on your
important keywords, you first
need to make sure you're not
condemned to page 72 by these
five critical errors."
Wham Creative Group
http://www.whamcreative.com/content/5-search-engine-mistakes-not-to-make.html
RE: [PHP] search engines ignoring .php3 pages
I think this refers to the fact that (all) search engines will not index your page
if it contains any "?" characters in the URL. These are usually dynamic content, and yes, PHP3 *can* be dynamic, although it doesn't have to be serving any dynamic content. I *personally* don't think the search engine would focus on the file extension (i.e. php3, asp, jsp etc.); *as long as* there is no "?" in the URL, the search spider will index your page.
There are ways around this problem of course (i.e. fool the search engine into thinking that your dynamic pages are static); just do a search on the mailing list archive for "search engines crawling" or
something similar and you should get some idea of how to do it.
-----Original Message-----
>From: Charles Killian [mailto:Charles@xxxxxxxxxxxx]
>Sent: Thursday, October 05, 2000 8:28 AM
> To: php-general@xxxxxxxxxxxxx
> Subject: [PHP] search engines ignoring .php3 pages
>
>
> I know there has been a lot of talk about search engines and php pages but
> what is the latest consensus on php pages being ignored by search engines?
>
> Has anyone found this to be true:
>
> "Though each engine categorizes a site's content according its own unique
> algorithms, all
> search engines share the same flaw. They cannot handle dynamically
> loading pages, such as the ".php3" front door to your site, and will
> ignore links placed on such default pages."
>
> This is what a search engine placement company is claiming.
>
> Charles"
From: Zend the PHP Company
http://www.zend.com/lists/php-general/200010/msg00497.html
See Your Site With the Eyes of a Spider
Making efforts to optimize a site is great, but what counts is how search engines see your efforts. While even the most careful optimization does not guarantee a top position in search results, a site that does not follow basic SEO truths is all but certain to score poorly with search engines. One way to check in advance how your SEO efforts are seen by search engines is to use a search engine simulator.
Spiders Explained
Basically all search engine spiders function on the same principle – they crawl the Web and index pages, which are stored in a database, and later use various algorithms to determine page ranking, relevancy, etc. of the collected pages. While the algorithms for calculating ranking and relevancy differ widely among search engines, the way they index sites is more or less uniform, and it is very important that you know what spiders are interested in and what they neglect.
Search engine spiders are robots and they do not read your pages the way a human does. Instead, they tend to see only particular things and are blind to many extras (Flash, JavaScript) that are intended for humans. Since spiders determine whether humans will find your site, it is worth considering what spiders like and what they don't.
Flash, JavaScript, Image Text or Frames?!
Flash, JavaScript and image text are NOT visible to search engines. Frames are a real disaster in terms of SEO ranking. All of them might be great in terms of design and usability, but for search engines they are absolutely wrong. An incredible mistake one can make is to have a Flash intro page (frames or no frames, this will hardly make the situation worse) with the keywords buried in the animation. Run a page with Flash and images (and preferably no text or inbound or outbound hyperlinks) through the Search Engine Spider Simulator tool and you will see that to search engines this page appears almost blank.
Running your site through this simulator will show you more than the fact that Flash and JavaScript are not SEO favorites. In a way, spiders are like text browsers and they don't see anything that is not a piece of text. So an image with text in it means nothing to a spider, which will ignore it. A workaround (recommended as an SEO best practice) is to include a meaningful description of the image in the ALT attribute of the <img> tag, but be careful not to use too many keywords in it because you risk penalties for keyword stuffing. The ALT attribute is especially essential when you use images rather than text for links. You can use ALT text to describe what a Flash movie is about but, again, be careful not to cross the line between optimization and over-optimization.
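To make the "text browser" idea concrete, here is a rough Python sketch of a spider's-eye view built on the standard library's `html.parser`. It is not the simulator tool named above, and the names `SpiderView` and `spider_view` are hypothetical. It assumes a crawler that reads plain text and ALT text and ignores everything inside script and style elements; real crawlers differ.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collects roughly what a text-only crawler might read from a page."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            # An image contributes only its ALT text, if it has any.
            alt = dict(attrs).get("alt")
            if alt and alt.strip():
                self.chunks.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside script/style is invisible in this simplified model.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def spider_view(html):
    """Return the text fragments visible to the simplified spider."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    '<html><body><script>document.write("Cheap widgets")</script>'
    '<img src="banner.gif" alt="REALLY Cheap Widgets">'
    '<img src="puppy.jpg">'
    "<p>Tabletop widgets in stock.</p></body></html>"
)
print(spider_view(page))
```

In this example the text written by the script and the image without ALT text leave no trace, while the ALT text and the paragraph survive.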
Are Your Hyperlinks Spiderable?
The search engine spider simulator can be of great help when trying to figure out if the hyperlinks lead to the right place. For instance, link exchange websites often put fake links to your site with JavaScript (using mouseover events and the like to make the link look genuine), but this is not actually a link that search engines will see and follow. Since the spider simulator would not display such links, you'll know that something is wrong with the link.
It is highly recommended to use
the <noscript> tag, as opposed to JavaScript-based
menus. The reason is that JavaScript-based
menus are not spiderable and all the
links in them will be ignored as page text.
The solution to this problem is to put all
menu item links in the <noscript> tag. The <noscript>
tag can hold a lot but please avoid using it
for link stuffing or any other kind of SEO
manipulation.
If you happen to have tons of hyperlinks on
your pages (although it is highly
recommended to have fewer than 100 hyperlinks
on a page), then you might have a hard time
checking if they are OK. For instance, if
you have pages that display “403 Forbidden”,
“404 Page Not Found” or similar errors that
prevent the spider from accessing the page,
then it is certain that this page will not
be indexed. It is necessary to mention that
a spider simulator does not deal with 403
and 404 errors because it checks where
links lead to, not whether the target of the link
is in place, so you need to use other tools
to check whether the targets of hyperlinks
are the intended ones.
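The link checks discussed in this section can be sketched as a small audit in Python. The names `LinkAudit` and `audit_links` are hypothetical, and the rule for what counts as a script-only link (no href, a bare "#", or a "javascript:" address) is an assumption made for illustration. Like the simulator described above, the sketch never fetches the link targets, so it cannot detect 403 or 404 errors.

```python
from html.parser import HTMLParser

MAX_LINKS = 100  # the per-page guideline mentioned in the text above


class LinkAudit(HTMLParser):
    """Separates links a spider could follow from script-only links."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.script_only = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        # Assumed rule: no real address means the link depends on script.
        if not href or href == "#" or href.lower().startswith("javascript:"):
            self.script_only += 1
        else:
            self.crawlable.append(href)


def audit_links(html):
    """Summarize the links on a page without fetching any of them."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return {
        "crawlable": parser.crawlable,
        "script_only": parser.script_only,
        "too_many": len(parser.crawlable) >= MAX_LINKS,
    }


page = (
    '<a href="/widgets.html">Widgets</a>'
    "<a href=\"javascript:go('x')\">Partner</a>"
    '<a onclick="go()">Menu</a>'
)
print(audit_links(page))
```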
Looking for Your Keywords
While there are specific tools, like the
Keyword Playground or the Website Keyword
Suggestions, which deal with keywords in
more detail, search engine spider simulators
also help to see with the eyes of a spider
where keywords are located among the text of
the page. Why is this important? Because
keywords in the first paragraphs of a page
weigh more than keywords in the middle or at
the end. And if keywords visually appear to
us to be on the top, this may not be the way
spiders see them. Consider a standard Web
page with tables. In this case,
in source order, the code that describes the
page layout (like navigation links or
separate cells with text that are the same
sitewide) might come first and what is
worse, can be so long that the actual
page-specific content will be screens away
from the top of the page. When we look at
the page in a browser, to us everything is
fine – the page-specific content is on top
but since in the HTML code this is just the
opposite, the page will not be noticed as
keyword-rich.
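The gap between where a keyword looks on screen and where it sits in the HTML can be measured by counting words in source order. The Python sketch below is a simplified illustration; the function name `first_keyword_position` is hypothetical, and how much weight any search engine gives to position is not something the code can tell you.

```python
import re
from html.parser import HTMLParser


class TextInSourceOrder(HTMLParser):
    """Collects page words in the order they appear in the HTML source."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def first_keyword_position(html, keyword):
    """Return the 0-based word index where the keyword phrase first
    appears in source order, or None if it never appears."""
    parser = TextInSourceOrder()
    parser.feed(html)
    parser.close()
    target = re.findall(r"[a-z0-9]+", keyword.lower())
    if not target:
        return None
    words = parser.words
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i
    return None


# The navigation cell comes first in the source, pushing the content back.
page = (
    "<table><tr><td>Home Products Contact About</td>"
    "<td>Cheap widgets for sale</td></tr></table>"
)
print(first_keyword_position(page, "cheap widgets"))
```

A large index for an important phrase suggests that layout markup or sitewide navigation precedes the page-specific content in the source.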
Are Dynamic Pages Too Dynamic to be Seen At All?
Dynamic pages (especially ones with question
marks in the URL) are also an extra that
spiders do not love, although many search
engines do index dynamic pages as well.
Running the spider simulator will give you
an idea how well your dynamic pages are
accepted by search engines. Useful
suggestions on how to deal with search engines
and dynamic URLs can be found in the Dynamic
URLs vs. Static URLs article.
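As a rough illustration of the question-mark test and of the "make dynamic pages look static" workaround mentioned earlier in this collection, consider the Python sketch below. The function names and the path layout it produces are assumptions made for this example only. The web server would also need a matching rewrite rule to turn the static-looking path back into the original query, and that part is not shown.

```python
import posixpath
from urllib.parse import parse_qsl, quote, urlsplit


def is_dynamic_url(url):
    """A URL counts as dynamic here if it carries a query string."""
    return bool(urlsplit(url).query)


def to_static_path(url):
    """Fold query parameters into path segments (hypothetical layout)."""
    parts = urlsplit(url)
    base, _extension = posixpath.splitext(parts.path)
    segments = [
        quote(item, safe="")
        for pair in parse_qsl(parts.query)
        for item in pair
    ]
    return "/".join([base] + segments)


print(is_dynamic_url("/product.php?cat=tabletop&id=12"))
print(to_static_path("/product.php?cat=tabletop&id=12"))
```

Under these assumptions the second call yields a path with no "?" in it, which is the property the older spiders described above cared about.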
Meta Keywords and Meta Description
Meta keywords and meta descriptions, as the
name implies, are to be found in the <META>
tags of an HTML page. Once, meta keywords and
meta descriptions were the single most
important criterion for determining the
relevance of a page, but now search engines
employ alternative mechanisms for
determining relevancy, so you can safely
skip listing keywords and a description in
meta tags. The exception is when you want to
give the spider instructions about what to index
and what not to; apart from that, meta tags
are not very useful anymore.
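The spider instructions mentioned here usually take the form of a robots meta tag with directives such as "noindex" or "nofollow". The short Python sketch below reads those directives from a page; the names `RobotsMeta` and `robots_directives` are hypothetical, and the example tag is illustrative rather than taken from the article.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collects the directives of any robots meta tag on the page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "robots":
            content = attributes.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots directives declared in the HTML."""
    parser = RobotsMeta()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<head><meta name="robots" content="NOINDEX, follow"></head>'
print(sorted(robots_directives(page)))
```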