I seem to have two websites that are killing the whole server.
Digging down in the logs, it seems to be Microsoft that's killing it.
I don't get it:
on the 27th I see logs from Bingbot hitting the server over 65 thousand times........
Today alone, before I put a block on it (that's only 9 hours of crawling time), it managed 4,328 hits.
Why oh fucking why are they setting their bots to go out like crazy!!! Hours of crawling on multiple sites just kills the server.
Anyone else come across this with Bing? Don't know whether to just deny it access to the site, or monitor it to see if it's a spoof and I'm being attacked....
Bots, User agent, Robots, Crawls.......
- theENIGMATRON
- Website Developer
- Posts: 4326
- Joined: Thu Mar 05, 2009 9:10 pm
- PSN ID: theENIGMATRON
- Steam ID: theenigmatron
- Game of the Week: Barbie Beauty Boutique
- Movie of the Week: Twilight Saga
- InfiniteStates
- God Like Gamer
- Posts: 4832
- Joined: Thu Jan 15, 2009 6:31 pm
- PSN ID: InfiniteStates
Dunno, but it sounds like the devil and the deep blue sea... You need the engines to know about your shit. I guess, knowing M$, their bot is just wank. It probably takes it 1000ms to do what takes a Google bot 1ms.

- Symonator
- LadyBirds!
- Posts: 4936
- Joined: Thu Jan 15, 2009 1:03 pm
- PSN ID: Symonator
- Steam ID: pbr_djsy
- Game of the Week: Day Z
- Movie of the Week: Batman - DKR
- Location: West Mids UK
- Contact:
Oops, sorry Sys, I stupidly edited your post instead of posting a new reply.





DayZ UK 1 - Filter: Dayzmad
Paradrop spawns | build your own base | refined repair system | new bandit system
Visit http://www.dayzmad.com to find out more!
- DJ-Daz
- Admin - Nothing Better To Do.
- Posts: 8922
- Joined: Wed Jan 14, 2009 1:54 pm
- PSN ID: DJ-Daz-
- XBL ID: DJ Dazbo
- Steam ID: DJ-Dazbo
Over-successful SEO?
There are probably some bad bots in there too. Might be worth keeping an eye on the IPs, or only allowing a range of IPs from Bing and the other one.
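If you do go the IP route, Apache's .htaccess can do it; a minimal sketch (Apache 2.2 syntax; 203.0.113.0/24 is a documentation placeholder, not a real bot range, so substitute ranges you have actually seen in your logs):

```apache
# .htaccess sketch: allow everyone, then deny one offending range.
# 203.0.113.0/24 is a placeholder - use the ranges from your own logs.
Order Allow,Deny
Allow from all
Deny from 203.0.113.0/24
```

The inverse (deny all, then `Allow from` the ranges Bing publishes) is the "only allow a range of IPs" version, but it will also lock out every other crawler.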

- theENIGMATRON
- Website Developer
- Posts: 4326
- Joined: Thu Mar 05, 2009 9:10 pm
- PSN ID: theENIGMATRON
- Steam ID: theenigmatron
- Game of the Week: Barbie Beauty Boutique
- Movie of the Week: Twilight Saga
Force bots to visit every 7 days?
You control that one via .htaccess? Or robots.txt?
All i know on the robots.txt file is
User-agent:
Crawl-delay:
disallow:
Crawl-delay being the seconds between each request to your website.
I currently have it set to 300 seconds,
so each request that comes from the search engine is 5 minutes apart,
meaning a possible 288 calls a day can be made.
I am going to set it to 120, meaning 2 minutes: 720 calls a day.
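The arithmetic checks out; a quick sketch (assuming the bot actually honours the delay and makes requests strictly one after another, back to back):

```python
# Upper bound on daily requests from a single crawler that honours Crawl-delay.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def max_daily_hits(crawl_delay_seconds):
    # One request every crawl_delay_seconds, non-stop, no parallel requests
    return SECONDS_PER_DAY // crawl_delay_seconds

print(max_daily_hits(300))  # 5-minute gap -> 288 hits/day
print(max_daily_hits(120))  # 2-minute gap -> 720 hits/day
```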
Other than this I know of no other way to limit the fuckers other than disallowing them, but that's bad for SEO..
Well, I have a list of over 500 IPs to go through.
I have gone through Bing's and they all seem to be legit.
Overdone on SEO..... I dunno.... stupid bots more like.
Why would you crawl a website every day if no new content is being created lol
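On the spoof question: the usual way to check whether a "Bingbot" hit is genuine is a reverse DNS lookup, since real Bingbot IPs resolve to a hostname under search.msn.com, and that hostname should resolve back to the same IP. A rough sketch (the hostname in the comment is just an illustration of the pattern):

```python
import socket

def is_bing_hostname(hostname):
    # Genuine Bingbot IPs reverse-resolve to names like
    # msnbot-xx-xx-xx-xx.search.msn.com
    return hostname.endswith(".search.msn.com")

def verify_bingbot(ip):
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
    except socket.herror:
        return False  # no PTR record at all: not Bing
    if not is_bing_hostname(hostname):
        return False
    try:
        # Forward-confirm: the claimed hostname must resolve back to the same IP
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

Run `verify_bingbot()` over the suspect IPs from the logs; anything claiming a Bing user agent that fails this is a spoof worth blocking outright.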
- DJ-Daz
- Admin - Nothing Better To Do.
- Posts: 8922
- Joined: Wed Jan 14, 2009 1:54 pm
- PSN ID: DJ-Daz-
- XBL ID: DJ Dazbo
- Steam ID: DJ-Dazbo
Nice work though, Dave.
theENIGMATRON wrote: force bots to visit every 7 days?
I think it might have something to do with the downtime, ranking and re-checking once the site comes back up?

- Symonator
- LadyBirds!
- Posts: 4936
- Joined: Thu Jan 15, 2009 1:03 pm
- PSN ID: Symonator
- Steam ID: pbr_djsy
- Game of the Week: Day Z
- Movie of the Week: Batman - DKR
- Location: West Mids UK
- Contact:
All respectable search engine bots will respect the directives in robots.txt; of course the spam ones won't.. cunts.
What I'd do, Dave, is get the user agents in your setup, like:
User-agent: *
Disallow: /cgi-bin/
Disallow: /whatever
etc., to stop that for a start; just allow it to visit the pages you specify, of course.
It doesn't need to visit every page as you have follow links on.
Your current robots.txt allows all files to be scanned now, but obviously use the normal way to block certain areas.
Just increase the amount of time between its visits and simply block most of the shitty ones; really Yahoo/Bing/Ask/Google are all you need.
Problem is the site covers a lot of topics, so a lot of spammy bots come here to cache it; clear out and disallow all but the main bots.
Code: Select all
http://mess-hall.co.uk/robots.txt
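For the "disallow all but the main bots" approach, a robots.txt along these lines would do it. An empty Disallow means "crawl everything" for that bot, and the catch-all at the bottom shuts out the rest; polite crawlers obey this, spammy ones just ignore the file, so it's only half the battle. (The Crawl-delay value is just an example, and note that Bing and Yahoo honour Crawl-delay while Google ignores it.)

```
User-agent: Googlebot
Disallow:

User-agent: bingbot
Crawl-delay: 120
Disallow:

User-agent: Slurp
Crawl-delay: 120
Disallow:

User-agent: *
Disallow: /
```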
DayZ UK 1 - Filter: Dayzmad
Paradrop spawns | build your own base | refined repair system | new bandit system
Visit http://www.dayzmad.com to find out more!