The trendiction bot crawls public websites, including news sites, message boards, and blogs along with their comments.
If our spider misbehaves (e.g. too many requests, recursive URLs), please contact us so we can fix the issue!
Why do you access my website?
We crawl your website to include it in our public search engine.
Additionally, we process and filter this data so our clients can access it through our web service APIs. Our clients include market research companies, marketing agencies, search engines, and other web applications that outsource their crawling to us in order to save resources (imagine every media company crawling your site separately).
More information on our products is available here.
How much bandwidth do you use?
We try to be as efficient as possible when requesting your website and to keep both bandwidth and the number of requests as low as we can:
- In addition to our caching, we use gzip compression to save bandwidth between your servers and ours.
- We also use the If-Modified-Since and ETag HTTP headers so that unchanged web pages are not downloaded again.
- We adapt the crawl rate based on the hits found and the rank of the site, rely on internal caches, and use state-of-the-art compression algorithms to further reduce bandwidth usage.
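To illustrate the first two points, here is a minimal Python sketch of a conditional, compressed request. It is an illustration only and not the crawler's actual code: the function names are hypothetical, and it assumes the crawler has stored the ETag and Last-Modified values from an earlier response. A stored ETag is sent back in the If-None-Match header, and a server that supports it answers 304 Not Modified with no body when the page is unchanged.

```python
import gzip


def conditional_headers(etag=None, last_modified=None):
    """Build request headers that ask for gzip and revalidate a cached copy.

    `etag` and `last_modified` are the values saved from a previous
    response (hypothetical cache fields, assumed for this sketch).
    """
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        # A previously received ETag is sent back as If-None-Match.
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def is_unchanged(status_code):
    """A 304 response means the cached copy is still current."""
    return status_code == 304


def decode_body(raw_body, content_encoding):
    """Return the page bytes, decompressing when the server used gzip."""
    if content_encoding == "gzip":
        return gzip.decompress(raw_body)
    return raw_body
```

When the server replies with 304, only response headers cross the wire, which is where most of the bandwidth saving on unchanged pages comes from.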
How can I tell that your bot is accessing my website?
We use the following user agent:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.0; trendictionbot0.5.0; trendiction search; http://www.trendiction.de/bot; please let us know of any problems; web at trendiction.com) Gecko/20071127 Firefox/220.127.116.11
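If you want to spot these requests in your own logs, a simple substring check is usually enough. The Python sketch below assumes that the token "trendictionbot" (taken from the user agent string above) stays stable across bot versions; the function name is hypothetical. Note that any client can send an arbitrary user agent, so a match is an indication rather than proof.

```python
def is_trendiction_bot(user_agent):
    """Return True if the user agent string contains the bot's name token.

    Assumption: the token "trendictionbot" appears in every version of
    the bot's user agent, as it does in the string shown above.
    """
    return "trendictionbot" in user_agent.lower()


# The user agent string exactly as listed above.
BOT_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.0; "
    "trendictionbot0.5.0; trendiction search; http://www.trendiction.de/bot; "
    "please let us know of any problems; web at trendiction.com) "
    "Gecko/20071127 Firefox/220.127.116.11"
)
```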
How can I stop the trendiction bot from accessing my website?
If you would like to exclude part or all of your website from all crawlers, you can do so by creating a file called "robots.txt" in the root directory of your website. A "User-agent: *" line followed by a "Disallow: /" line excludes the entire site; a "Disallow" line with a narrower path excludes only that part.
If you would only like to exclude our spider, name it in the "User-agent" line instead of using "*".
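As a rough sketch of the two cases above, the following Python snippet holds one robots.txt for each case and checks them with the standard library's urllib.robotparser. The user agent token "trendictionbot" is an assumption derived from the user agent string; "otherbot", the example.com URL, and the helper name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Case 1: exclude the entire site from all crawlers.
BLOCK_ALL_CRAWLERS = """\
User-agent: *
Disallow: /
"""

# Case 2: exclude only this bot (token assumed to be "trendictionbot").
BLOCK_ONLY_THIS_BOT = """\
User-agent: trendictionbot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url="http://example.com/page.html"):
    """Parse robots.txt content and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the first file every crawler is refused; with the second only the named bot is refused, and crawlers that are not listed remain allowed.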
The robots.txt file is a widely recognized format for telling crawlers what they may and may not crawl. More information about robots.txt is available at http://www.robotstxt.org/wc/norobots.html.
Due to our internal caching procedures, it can take up to 5 days for an updated robots.txt file to take effect.
Keep in mind, however, that most site owners want their website indexed so that it can be found in search engines and attract traffic.