Rpsoft Site-Crawler Links
SITE CRAWLER
LOADING, USAGE, ADJUSTMENT, AND TROUBLESHOOTING
LOADING
During installation, the computer that is loading rpsoft 2000
site-crawler may need Windows updates before the load can complete
successfully. First, make sure that the computer runs either Windows
2000 or Windows XP. If the load has problems, particularly if it stops
in the middle, and even more particularly if it stops while loading a
system file marked "wininet.dll", then your computer may need updating
before the load. In this case, the most important update is the latest
Microsoft Internet Explorer, which can be downloaded free from
Microsoft. The new Internet Explorer uses a later version of
"wininet.dll", just as rpsoft 2000 site-crawler does. Your Windows 2000
computer may not allow the rpsoft 2000 Microsoft loader to replace the
system copy of wininet.dll unless you are also using the new Microsoft
Internet Explorer. Once you have updated Internet Explorer, the loading
of rpsoft 2000 site-crawler should work.
PREVENTION
Rpsoft 2000 site crawler is a “scanning” type bot: it scans pages
looking for key phrases. It does not, however, interpret the code, so
if a site uses complex code for its internal links, rpsoft 2000 site
crawler may not be able to follow them. Links must be written with
“href=” or “src=” before the link. Fortunately, most webmasters keep
it simple.
Of course, a fellow webmaster (or webmistress) who uses complex code
for links is also taking a chance on whether the site is “botable” by
the search engine bots. If the search engine bots cannot follow the
links, then your link would not be seen anyway. It is also
unfortunately noticeable that some sites use very complex code for
reciprocal links, yet keep their favorite affiliate sponsors and their
own important links in simple html code. You can draw your own
conclusion on that one. We have.
Suggestions:
- Partner with sites for links where this bot can in fact find their
link. While this is a simple software bot, it is impossible to
know all of the strengths of the software bots of the search engines.
If this bot cannot find the link, perhaps neither will the search
engines.
- If you have already linked to another site and this bot cannot find
the link, remember to check visually for the link before taking action
and complaining to your fellow webmaster or webmistress. The link could
be there, with link code complexity preventing this bot from finding
it.
- If checking a site visually for links, note how obvious or
non-obvious it is to find the site link page. Webmasters who
hide their link page also may not make good partners - since human
beings may also have great trouble in seeing your advertising.
- It is of course best to partner with sites where the link pages
are in an upper directory of their site. Sites that bury links
multiple directories down are telling you that their link partners are
not very important to them.
TROUBLESHOOTING - POSSIBLE FIXABLE ISSUES
- Scan Speed - This site crawler will of course work faster on a
high speed internet connection. In either case, it should be faster
per page than a browser since it does not interpret coding or load
pictures. Still more speed in finding links can be had by going
under "options" and adjusting the "site crawl" and "search" options.
You can select web page phrases to give preference to, which document
types (of a few) to scan or not scan, and how many pages to scan on
each site. Note that in "options" under "search", checking the bottom
item "eliminate duplicate html spaces before scanning" may let you
scan for multiple words with less chance of error. However, that
option requires a third pass of each web page and will slow down the
search. Check that option only if you really need it and only for the
times it is needed.
- Link Page Blocked - If this bot cannot find a link, the cause might
not be your link itself; it might be complex coding on the pages
leading to the link page. If this is the case, and you still wish to
partner with this site for links, consider using the address of the
web page where the link is rather than the overall site address, such
as: "http://www.thissite/links/software.htm". In this case, the bot
will scan this page first. If the site owner does not move the link,
this reference should work, as long as the reciprocal link is coded
simply on the page itself.
- Too many pages - We have seen some sites with thousands of pages.
Even with a fast scan checker, checking the whole site can take a
while. You might be able to get to the right page faster by tuning
"options" in the pull down menu, mostly in the "site crawl" section.
For example, if you are in the software business and sites like this
tend to put your link on a web page marked "software", then
"software" should be one of your priority words. If that does not help
because there are still too many pages to scan, consider, once you
have found the page, searching that particular page first in the
future, much as in the "Link Page Blocked" example right above.
- Wrong Directory - The bot works less well when started in a
directory of a site than at the main site address. In addition, some
sites may tell you that your link is in a certain directory when it is
not; it might be in a higher directory. In these cases, you might need
to scan the main site rather than the directory they gave you.
- URL search issues - One of the most confusing search problems
arises if your url has a space in it. Html coding does not expect that
and is likely to put a “%20” where it sees the space. While this bot
has been designed to handle at least some spaces in URLs, it is best
to simply never use spaces within urls.
- Search Problems - Searching for single words (or a url) is best,
and even then one must be careful. Recall that the search is of the
html, and the html coding differs somewhat from the words that you
see on the web page. Sometimes html composers put extra blanks in the
html that do not show up on web pages. Also, some characters on a web
page, such as quotation marks, greater than and less than symbols, and
extra blanks (more than one between words), are coded rather than
stored as the normal characters. If you need to look for a phrase, go
to “options” in the pull down menu and under “search” check the box
for “eliminate duplicate html blanks”. This will eliminate at least
one of the problems, although it will slow down the search since it
adds a page scan. You should also make sure that the phrase you are
searching for does not contain quotation marks, more than one space
between words, greater than or less than symbols, or other coded
items.
- Drop Down Boxes - A few sites, thankfully rare, may use drop down
boxes for link page access. The coding may be such that it is not
active until the user intervenes, so the bot will not get past this
section to reach the link pages. Note also that this technique is a
very poor way to advertise your site. Since links are a form of
advertising, hiding link page names in a drop down box will likely
stop many potential customers from seeing your link, since an added
action is required of them. However, if your link is in fact on the
site and you have found the page, and you still wish to link to this
other site, then you could take the same action as in "Link Page
Blocked" above and use the specific page that your link is found on.
- Jams While Following Links - (a) Jams can occur when internet
service is interrupted or having problems. Please save long jobs at
interim times to help avoid losing data. (b) There may be a delay when
clicking on site-crawler after it has been running alone for a long
time; this delay may also be caused by waiting for an internet
response. (c) One site asked for a consumer response on a web page
before continuing. Of course, a bot cannot do that. We added the item
“download” within “options” as a page not to load, since that was the
type of page that had caused that particular stoppage.
- Site Skip - In multi-site searching, if you believe the bot has
already scanned the links pages of the current site and yet the site
still has many pages left, you can click the button “skip this site”.
- Re-Directs - If the link does seem to be there visually but the bot
cannot find it, check whether the main url has changed. We find that
some webmasters use re-directs from the site name they give you to
another site. All may still be well if you are happy with the link
exchange at the new url. If so, scan that url for the link instead of
the one they gave you. Again, this bot is told not to leave the main
url area it is given.
- “Cannot Load URL” - This indicator means that the site has a valid
address but at least temporarily cannot be reached. Best to try again
later. While some sites do go off the air permanently, we have seen
temporary problems in reaching even good sites.
- "cgi" files as well as "jpg", "exe" and other files - Site-crawler
will not load web pages that it believes are binary, such as jpg, exe,
gif, and xls files and the like. It cannot scan binary files. It also
will not scan files or directories marked "cgi", since early testing
showed many of those files as having binary content also. If a link
partner stores your link in a directory such as
http://www.main.com/cgi-bin/links/yourbusiness.htm, site-crawler will
not find that link. What you can do is the same as mentioned above:
ask what page the link is on and enter that single page in your list
to be scanned first. If your link stays on the page, site-crawler will
then find it.
- offsite storage - We find that perhaps 1% of the sites that we have
worked with store their links off-site, on a site other than their
own. They sometimes store them at an automatic link site. This could
present several problems to their link partners. The first is that
bots (such as site-crawler) are programmed not to go off site, and
will therefore miss the links if they are told to look for links at
the main site. The second is that since the links will likely be
separated from the real site content, the search engines are unlikely
ever to give the link page a good rating, and the link may therefore
never help your own site's standing. If, in spite of those reasons,
you still wish to link to such a site, the way to do it is to search
the offsite area instead of the main site itself. For example,
suppose you wish to link to http://www.thatsite.com and that site
stores its links at http://www.paidlinks.com, likely in a site
directory such as http://www.paidlinks.com/thatsite/. Then you might
be able to use site-crawler to check
http://www.paidlinks.com/thatsite/ for the reciprocal links. We found
that this worked in the 6 cases we had.
- abbreviated links - Having trouble finding a reciprocal link back
to your site on another site while using a search word such as
http://www.yoursite.com/ or http://www.yoursite.com/index.htm? Note
that others may have abbreviated the link back to your site as simply
http://www.yoursite.com or even www.yoursite.com. Perhaps it is best
to search for the simplest word that they could use, such as
"yoursite.com".
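Several of the search issues above (coded characters, duplicate blanks, “%20” in urls, and abbreviated links) come down to comparing a cleaned-up copy of the html against the simplest form of your url. The Python sketch below shows one way this could be done. It is an assumption for illustration only, not how site-crawler itself works, and the function names normalize_html_text, simplest_form, and page_mentions are made up for the example.

```python
import html
import re
from urllib.parse import unquote

def normalize_html_text(raw):
    """Decode coded characters and %20 escapes, then collapse extra blanks."""
    text = html.unescape(raw)            # &quot; -> ", &gt; -> >, &nbsp; -> U+00A0
    text = unquote(text)                 # %20 -> space
    text = text.replace("\u00a0", " ")   # treat non-breaking spaces as blanks
    return re.sub(r"\s+", " ", text).strip()

def simplest_form(url):
    """Reduce a url to the shortest form a link partner might have typed."""
    url = url.strip().lower()
    url = re.sub(r"^https?://", "", url)      # drop the scheme
    url = re.sub(r"^www\.", "", url)          # drop a leading "www."
    url = re.sub(r"/index\.html?$", "", url)  # drop a trailing index page
    return url.rstrip("/")

def page_mentions(raw_html, url):
    """True if the cleaned-up page text contains the simplest form of url."""
    return simplest_form(url) in normalize_html_text(raw_html).lower()

print(simplest_form("http://www.yoursite.com/index.htm"))
print(normalize_html_text("rpsoft&nbsp;&nbsp;2000   site%20crawler"))
```

Searching for the reduced form "yoursite.com" matches every longer spelling of the same link, which is why the shortest search word is the safest choice.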
TROUBLESHOOTING - NON FIXABLE ISSUES
- External Link Storage - Unless you use the new site address where
the links are located, site-crawler will not find the links.
Site-crawler is programmed not to go off site. See the "offsite
storage" discussion in the last section above.
- Problems at link itself - If there is a coding problem at the link
to you itself, such as a mouseover or drop-down box used for links,
there might not be a workaround other than not linking to that web
site as a partner.
- Site Link Search Engine Says Link is there - Look closely at the
web address the search engine goes to. It may be going off the main
site to a paid link manager system on another site. We have seen many
cases where the links manager system claims that the link is on the
target customer site when in fact it is not. Sometimes sites will not
even store links on their own site, but keep them at the link manager
company. Sometimes there are great intentions of doing the right thing
and getting the link to the right site, but the link at the links
manager site may never in fact download to their customer site, which
is the site you are trying to link to. Good luck in trying to fix this
one. We find that sites that use linking services are very
unresponsive to emails, because the site is paying for a service and
doesn't want to be involved. The linking service often accepts no
complaints either. So there is often just no one to talk to if these
super automated systems fail.
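The rules described in both troubleshooting sections (stay on the site that was given, skip binary files, skip anything marked "cgi") can be pictured as one small filter. The Python sketch below is only an illustration under stated assumptions: the function name should_scan is made up, the extension list holds just the file types named above, and comparing host names is an assumed stand-in for whatever "off site" test site-crawler really applies.

```python
from urllib.parse import urljoin, urlparse

# Only the file types named in the text; the real list is not known.
BINARY_EXTENSIONS = (".jpg", ".gif", ".exe", ".xls")

def should_scan(start_url, link):
    """Decide whether a link found on the site at start_url would be followed."""
    absolute = urljoin(start_url, link)   # resolve relative links
    start, target = urlparse(start_url), urlparse(absolute)
    if target.netloc.lower() != start.netloc.lower():
        return False                      # off site: never followed
    path = target.path.lower()
    if "cgi" in path:
        return False                      # files or directories marked "cgi"
    return not path.endswith(BINARY_EXTENSIONS)

print(should_scan("http://www.thatsite.com/", "links/software.htm"))
print(should_scan("http://www.thatsite.com/", "http://www.paidlinks.com/thatsite/"))
```

The second call shows why off-site link storage fails: seen from http://www.thatsite.com/ the paidlinks address is another site, so the only remedy is to start the scan at the off-site address itself.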
SUMMARY of SUGGESTED USAGE
We suggest insisting on keeping link coding simple, for your site and
your possible link partners, both so that search engine bots give you
proper credit and of course for the speed and ease of this bot. When
checking for reciprocal links, we suggest using the main site url for
small to medium size sites. For huge sites, particularly those with a
few thousand pages or more, we suggest beginning the search with the
last known page of the link. When checking a large number of sites in
multi-site operation, we suggest breaking the list into smaller
amounts to keep any loss of data to a minimum. Internet problems can
cause jams in the program. Please save the data and restart at times
if you have many sites to do.
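For readers who keep their site lists in a script of their own, the advice on breaking up long multi-site jobs and saving at interim times might look like the Python sketch below. Everything in it is an assumption for illustration: site-crawler is operated through its menus, and the names batches, run_in_batches, check_site, and save_path are not part of the product.

```python
import json

def batches(sites, size):
    """Yield the site list in consecutive groups of at most `size` sites."""
    for start in range(0, len(sites), size):
        yield sites[start:start + size]

def run_in_batches(sites, check_site, size, save_path):
    """Check sites group by group, saving all results so far after each group."""
    results = {}
    for group in batches(sites, size):
        for site in group:
            results[site] = check_site(site)   # check_site is supplied by the caller
        # Interim save: a jam in a later group cannot lose earlier results.
        with open(save_path, "w", encoding="utf-8") as handle:
            json.dump(results, handle, indent=2)
    return results
```

With a group size of 20, for instance, a jam would cost at most the 20 sites in the current group rather than the whole list.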
If you follow these suggestions, as we have, we
hope that you will find site crawler a great tool. Thanks for your
interest in our rpsoft 2000 products.