Web Crawlers Bundle

A bundle of features for handling spider access: hit tracking, a complete informational listing, a specific usergroup for spiders, etc.

It’s essentially a merge of some small addons I wrote in the past. As I did not want to release a ton of minimal tools that just fit together, I made a real bundle: 4 or 5 tools together, with activation and permission settings where needed.

Settings:

nex_crawlers_bundle_options.jpg

Spiders List: a little spider tracker for your forum. It does not track each page an engine views, because that is pointless. Instead, it lists the name of each spider that visits your site, the date of its last visit, the number of unique visits and the number of pages viewed. That information is not very important for the indexing of your site, but it helps to see why your site may be busy or not. You can then take action if a crawler keeps visiting and still gives no results on search engines.

nex_crawlers_bundle_list.jpg

You can see it in action here: vbEnhancer.com – Crawlers List
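For readers who want to picture the bookkeeping behind such a list, here is a minimal sketch in Python. It is not the addon's code (the addon is a vBulletin product); the signature table, the `VISIT_TIMEOUT` rule for what counts as a new visit, and all the names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical signature table: user-agent substring -> display name.
SPIDER_SIGNATURES = {
    "googlebot": "Google",
    "bingbot": "Bing",
    "slurp": "Yahoo! Slurp",
}

# Assumption: hits further apart than this count as a new visit.
VISIT_TIMEOUT = timedelta(minutes=15)


@dataclass
class SpiderStats:
    name: str
    last_visit: Optional[datetime] = None
    visits: int = 0  # number of separate visits
    pages: int = 0   # number of pages viewed


def identify_spider(user_agent: str) -> Optional[str]:
    """Return the spider's display name, or None for ordinary visitors."""
    ua = user_agent.lower()
    for needle, name in SPIDER_SIGNATURES.items():
        if needle in ua:
            return name
    return None


def record_hit(stats: Dict[str, SpiderStats], user_agent: str, when: datetime) -> None:
    """Update the per-spider counters for one page view."""
    name = identify_spider(user_agent)
    if name is None:
        return  # not a known spider, nothing to track
    entry = stats.setdefault(name, SpiderStats(name))
    if entry.last_visit is None or when - entry.last_visit > VISIT_TIMEOUT:
        entry.visits += 1
    entry.pages += 1
    entry.last_visit = when
```

Only one row per spider is kept, never one row per page, which is the point of not tracking every page view.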

Specific Usergroup for Spiders: I released this addon on vb.org a long time ago, and its source was copied, but this version is updated and has more flexibility. You simply choose the proper usergroup in the settings, so that when a spider/crawler visits your site, it is treated as having that usergroup’s permissions. It’s useful if you do not want to fill your robots.txt file with strange access blocks. This lets you give crawlers access to profiles but not to visitor messages, etc.

Also remember to follow the TOS of the search engines you are registered with. Until recently, Google was blocking sites that were ghosting their content.
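The idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the permission names, the usergroup names and the `permissions_for` helper are all hypothetical, and the real addon works through vBulletin's own usergroup system.

```python
from typing import Dict, Optional, Tuple

# Hypothetical user-agent substrings used to recognise crawlers.
SPIDER_NEEDLES: Tuple[str, ...] = ("googlebot", "bingbot", "slurp")

# Hypothetical permission sets, keyed by usergroup name.
USERGROUP_PERMISSIONS: Dict[str, Dict[str, bool]] = {
    "guests": {"view_profiles": False, "view_visitor_messages": False},
    "spiders": {"view_profiles": True, "view_visitor_messages": False},
}


def is_spider(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(needle in ua for needle in SPIDER_NEEDLES)


def permissions_for(user_agent: str,
                    spider_usergroup: Optional[str] = "spiders") -> Dict[str, bool]:
    """Crawlers borrow the chosen usergroup's permissions; they are not
    moved into the group, and everyone else stays a guest."""
    if spider_usergroup is not None and is_spider(user_agent):
        return USERGROUP_PERMISSIONS[spider_usergroup]
    return USERGROUP_PERMISSIONS["guests"]
```

Passing `spider_usergroup=None` stands in for leaving the setting unconfigured, in which case crawlers are handled like any other guest.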

Display Spiders in WOL: also on any page showing « Currently Active Users » (showthread, forumdisplay, etc.). That way, you see where these beasts are visiting. 🙂

nex_crawlers_bundle_wol_ug_markup.jpg

As you can see in this listing, the markup of the usergroup applied to the crawlers gives them some style, which makes them easier to trace.

… some other tools may still join the bundle, I’ll see later!

CRON JOB:
To make it easier on the server, a cron job stores the hourly stats about the crawlers. Once that cron job has run once (it’s the cron named Hourly #1), the stats appear in the right place:

nex_crawlers_bundle_info.jpg
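To show why an hourly job is lighter than computing the stats on every page load, here is a rough Python sketch of that kind of aggregation. The `hourly_stats` function and the shape of the hit log are assumptions for illustration, not how the addon stores its data.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def hourly_stats(hits: Iterable[Tuple[str, datetime]]) -> Counter:
    """Collapse raw (spider name, timestamp) hits into per-hour page counts."""
    buckets: Counter = Counter()
    for name, when in hits:
        # Truncate the timestamp to the start of its hour.
        hour = when.replace(minute=0, second=0, microsecond=0)
        buckets[(hour, name)] += 1
    return buckets
```

The pages then only read the small pre-computed table, which is also why nothing shows until the job has run at least once.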

…update: May 1st, 10:50, a small change: the Crawlers listing will now update the spiders list in cache if the file changed, so you can update it when needed.

…update: May 26th, a change related to a request by Calystos here: as we can apply a usergroup to the crawlers, we can now add some markup to that usergroup and it will show in the WOL and online.php …
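A common way to get this "refresh only if the file changed" behaviour is to compare the file's modification time with the one seen at the last load. The Python sketch below shows that pattern; whether the addon uses the modification time or some other check is an assumption, and `FileBackedCache` is a hypothetical name.

```python
import os
from typing import Any, Callable, Optional


class FileBackedCache:
    """Keep parsed file contents in memory and reload only when the file changes."""

    def __init__(self, path: str, loader: Callable[[str], Any]) -> None:
        self.path = path
        self.loader = loader  # parses the file, e.g. an XML reader
        self._mtime: Optional[float] = None
        self._data: Any = None

    def get(self) -> Any:
        mtime = os.path.getmtime(self.path)
        if self._mtime is None or mtime != self._mtime:
            # First access, or the file was replaced: rebuild the cache.
            self._data = self.loader(self.path)
            self._mtime = mtime
        return self._data
```

With something like this, uploading a newer spiders file is enough; the next request picks it up without any manual cache rebuild.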

and in the Who’s Online page (demo vbEnhancer.com):

nex_crawlers_bundle_online.jpg

I made it so the « Spider » in front of each spider is deactivated in the online.php page, because it’s pointless if you ask me… but you can deactivate the plugin on the hook « online_bit_complete » if you prefer.

note: 17/06/09: update to 1.1.1, which now updates the proper count and names of web crawlers in Active Users of Showthread and Forumdisplay pages… thanks to all who reported it, mainly [user]xOBKx[/user]… 🙂

note: 19/06/09: no version change, but added the spiders count in the WOL page itself… from [user]xOBKx[/user]’s suggestion.

note: 09/07/09: no version change yet, Dream updated his spiders_vbulletin.xml, so I provide it in this first post, if you want to upload it to your /includes/xml/ directory… it will update the list instantly when needed.

note: 09/07/09 by night: bundle updated with the latest spiders list from Dream, and added some bug fixes suggested by [user]xOBKx[/user], like the extra comma when there was nobody online, and the uncached template.
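The extra comma is a classic list-joining bug. Here is one way to avoid it, sketched in Python rather than in the addon's own code; the function names are made up for illustration, and plural handling is ignored.

```python
from typing import List


def active_users_line(members: List[str], spiders: List[str]) -> str:
    """Join both lists at once, so an empty member list leaves no leading comma."""
    return ", ".join(members + spiders)


def online_summary(members: int, guests: int, crawlers: int) -> str:
    """Mention web crawlers only when at least one is online."""
    if crawlers > 0:
        return f"{members} members, {guests} guests and {crawlers} web crawlers"
    return f"{members} members and {guests} guests"
```

Joining a single combined list means the separator is only ever placed between two existing names, whatever the mix of members and crawlers.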


53 thoughts on “Web Crawlers Bundle”

  1. nexia said:

    Here is some information for people who know nothing about Web Crawlers…

    Wikipedia wrote:
    A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, and worms[1] or Web spider, Web robot, or—especially in the FOAF community—Web scutter[2].
    This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
    A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.

    Source: http://en.wikipedia.org/wiki/Web_crawler

  2. nexia said:

    About compatibility and replacement for existing products, here is a list of known addons released on vb.org that can interact with this product:

    1- Track Guest Visits – vBulletin.org Forum … basically, you can drop it, but as this product also tracks the activity of guests and web crawlers, it’s up to you. It will be more server intensive, as it’s doing most of the things this product does; it doubles the queries etc. and is not caching anything.

    2- Spiders on ForumHome [NO FILE EDIT] – vBulletin.org Forum … this one is so basic… drop it, we’re doing it 5 times better.

  3. KURTZ said:

    hi Nexy, i think there is an incompatibility problem with this hack 🙂

    Split counts for most ever Members & Guests – vBulletin.org Forum

    edit: also i’ve found an uncached template … check the pic 😉

  4. nexia said:

    hum, everything coming from Paul is incompatible anyway, but i checked the code and i don’t see why it would cause a problem… i will test it later today

  5. KURTZ said:

    i’ve updated the previous post 🙂

  6. nexia said:

    OH, forgot one hook… damn uncached templates… will look at it too.

  7. KURTZ said:

    OK, but i think that this hack needs an improvement … like a usergroup option to choose who can view the stats on the homepage … or not?

  8. nexia said:

    sure, there will be in the next alteration… this version is the first because the goal is done… now, the fixes for prefs etc.

  9. KURTZ said:

    @nexia 20955 wrote:

    sure, there will be in the next alteration… this version is the first because the goal is done… now, the fixes for prefs etc.

    superb 🙂 really nice hack Nexy … i dropped the others for this … 😉

  10. puertoblack2003 said:

    thanks nexia for this.:)

  11. puertoblack2003 said:

    reporting back..mod is working as described i’m loving it:)

  12. nexia said:

    rofl… thanks for the report… 😉

  13. Calystos said:

    Hi! My first post here, hopefully not last, 🙂

    Just wanted to let you know this addon is really REALLY useful! I actually got it from the vBulletin.org site some time back but now I noticed it’s in the graveyard area.

    I’ve only got one problem with it. Am running vbulletin 3.8.2 and for some reason the bots aren’t showing up or being moved into the « Bots » usergroup I setup an made and told the web crawlers setup to point them to. Any ideas?

    Other than that I love this addon, very useful! Great work!

  14. nexia said:

    hum, did you upload the spiders file in the /xml/ folder of your vBulletin installation?

  15. Calystos said:

    Yup, the spiders file was uploaded and works fine. Is the addon meant to move & class the bots in the new usergroup or is it just meant to use that usergroup’s rights/privileges?

  16. nexia said:

    AH, you posted « support » in the wrong thread then.. lol

    it’s not supposed to move the crawlers to the usergroup, but just use their privileges… because the crawlers are not registering an account to the site… they usually act like guests.

  17. Calystos said:

    Ahh, that’s what I thought but wasn’t sure, 🙂

    Is it possible to do something like marking them as that group for when people see them on the who’s online an stuff? I know forum software such as phpbb and others have this feature as such.

    EDIT: Oopsy, I should really see about making a new thread in support (just re-read what you said).

  18. nexia said:

    hum, i’ll see if that would change something… maybe.. not sure.

  19. Calystos said:

    Heh, not essential as such but would be a nice feature I guess. Specially for forums where they (like me) have the Bots group marked with a different colour so you can see that they are Bots an whatnot.

    I’ve not noticed any other things that could be tweaked or added atm, it all seems to be working really well an looking good. 🙂

  20. nexia said:

    actually it is a good suggestion, i’ll see if something is missing to have that markup… i think it is related to the fact that the bots have no userID… i’ll see if i can fix that… it may not be essential but it’s a basic twist.

  21. Calystos said:

    I was just reviewing the product file, an I think there could be a basic quick (an dirty) hack way of doing it. I’ll give it a whirl tomorrow an if it works I’ll let ya know an post the code. Course theres probably gonna be a better way but if my idea works at least it’d be a good start, 🙂

  22. reeps said:

    nice mod, thanks

  23. nexia said:

    First post was edited, version 1.1.0 is now available, filling this request… 😉

    @Calystos 21299 wrote:

    I was just reviewing the product file, an I think there could be a basic quick (an dirty) hack way of doing it. I’ll give it a whirl tomorrow an if it works I’ll let ya know an post the code. Course theres probably gonna be a better way but if my idea works at least it’d be a good start, 🙂

  24. Calystos said:

    Great work! Will install it immediately an keep you informed of any other ideas I come up with. Or any mods, etc. Likewise for any bugs, etc too.

  25. xOBKx said:

    Just a heads up.. Web Crawlers appear as Guests in « Currently Active Users Viewing This Thread ».

    http://www.gamingvoice.co.uk/temp/wol_index.jpg
    http://www.gamingvoice.co.uk/temp/wol_thread.jpg

  26. nexia said:

    yeah, thanks for reporting, i did not apply the patch to separate the crawlers in these pages… this is related to the « Active Users » block, which is not on forumhome, as you know.. 😉

  27. xOBKx said:

    Perhaps hiding the « Web Crawlers » from the legend on these pages would make sense then? If possible.. 🙂

  28. nexia said:

    i’ll do the work to list them on these pages, it’s mostly done already, i’ll test it after the visit to the grocery store.. 🙂

  29. xOBKx said:

    Call me anal, but I went ahead and got the « x members, y guests and z web crawlers » thing working on the « Who’s Online » page itself using the hook online_complete as well.. 🙂

    http://www.gamingvoice.co.uk/temp/online_complete.jpg
    (Yes the guest is me, I used a second browser).

  30. nexia said:

    ROFL… i will never call you anal, but if you prefer, no problem…

    thanks for reminding me that this tool needed to be completed… rofl… will fix in 10

  31. xOBKx said:

    Another quick thought, there should also be a check to see if there’s any registered users present, otherwise there’s an unnecessary comma at the start of the active users list with the web crawlers..

    I’ve had a quick look at this myself, but couldn’t get it working. I’ll have another go when I have some time if you haven’t beaten me to it. 🙂

  32. nexia said:

    it’s easy, i just never thought of it… 🙂 will fix this.

    it’s easy… the first number has a value, you just have to check if the number is raised, and if the loop has content… 🙂 i’ll fix this in 10, when the laundry is started

  33. xOBKx said:

    I managed to fix it, though possibly not in the most elegant of solutions.. Heh. Took me a while to get the logic right, but;
    If registered users present – show comma, else if total spiders AND spider count greater than one – show comma else hide comma.

    Just thought I’d report another bug, this time an uncached template in misc.php?do=crawlers.. GENERIC_SHELL

  34. nexia said:

    bundle updated with the latest spiders list from Dream, and added some bug fixes suggested by [user]xOBKx[/user], like the extra comma when there was nobody online, and the uncached template.

    .. also edited first post with new screenshots hosted separately to avoid thumbnails.

  35. zero5854 said:

    for some reason after I installed I am not showing the bots stats link under the stats on index? I only show the part after currently active users? Any help please?

  36. zero5854 said:

    any help please???

  37. xOBKx said:

    A URL and details of other installed modifications would be useful.

    This is the template hook where that extra line is called
    forumhome_wgo_stats

  38. nexia said:

    @zero5854 22170 wrote:

    for some reason after I installed I am not showing the bots stats link under the stats on index? I only show the part after currently active users? Any help please?

    do you have a custom style? there may be a missing hook in your template that makes this error possible.

    btw, by default, the bots count is not showing when there is no bot on your site… *(instead of the pathetic « and 0 web crawler »)

  39. zero5854 said:

    no it doesn’t show even when there are bots… I will try adding the hook as I can assume it wasn’t added automatically. thanks for the help

  40. nexia said:

    are the bots showing in the Who’s online list? are they showing in the online.php page?

    i want to see your problem too… 🙂

  41. zero5854 said:

    huh??? It WAS showing yesterday? at least the part you were talking about? Now it’s not showing at all? hmmmm

    EDIT: it is showing in online.php

    hwhkvncad7pfsv9t53y.png

  42. nexia said:

    as indicated in this screeny:
    nex_crawlers_bundle_options.jpg

    the last block of settings is ONLY for the « What’s Going On » block, and when there is no bot, there is no count… are they set properly?… if so, maybe just the hook isn’t working…

    do you still have the default style on your site, to verify this?

  43. zero5854 said:

    confirmed HOOK missing… did this by using the default theme and everything shows fine. so I know where to put the hook but I’m not sure about the code for it?

  44. nexia said:

    hum, the text is not based on a hook, only the stats are… the text is related to the phrase $vbphrase[x_members_and_y_guests]; if it is different, in vb 3.7 and lower, i can’t do a thing… you need an updated « FORUMHOME » template. you can PM it to me, i’ll see what i can do.

  45. zero5854 said:

    ok i’ll pm you with it.. thanks!

  46. nexia said:

    ok, by the answer in the pm, i could say the problem is fixed… missing hooks in skin!

    *(ok guys, try to visualize it… there are 5 hooks in the skin
    5_04gamakatsu-Siwash.gif
    fishing hooks for salmon

  47. zero5854 said:

    LOL yep worked perfectly, thanks a lot!

  48. MsJac said:

    Very nice = Thanks 🙂

    Jacquii.

  49. Taragon said:

    When no one is online but a bot, this is what happens:

    active-users.png

    I believe [user]Boofo[/user] has found a fix for this in his Spider Display for vB3.7 Version 1.0.3*, but it required the config.php to be edited.

    *

    Quote:
    Version 1.0.3 –Fixed the leading comma issue when there were no members online.
    Version 1.0.4

  50. nexia said:

    damn, i thought i corrected this a long time ago… i’ll check, i may have reverted some code somehow…

  51. xOBKx said:

    The screenshot says there are 2 online members..

    It could be that your update doesn’t check for invisible users?

    I believe [user]Boofo[/user] has found a fix for this in his Spider Display for vB3.7 Version 1.0.3*, but it required the config.php to be edited.

    Definitely no need for a config.php edit – I fixed the issue on my forums before [user]nexia[/user] released his update.

    See above.

  52. nexia said:

    i’ll update this package when Dream is finished with updating the spiders list… in a day or two i suppose…

    i’ll make a new calculation instead of having simply the « xxx days… » because i have it for 160+ days, it’s pathetic to read…

  53. ctrlbrk said:

    Thanks for this!

    Mike
