Profile Details

Recent updates
  • Yeah, I've seen those links - was just hoping I could escalate from here instead of having to re-explain the problem. I went ahead and put in a support ticket and attached a link to this thread. I have to get this fixed.

  • What's the proper procedure for escalating this to upper-level/engineer-type tech support? I don't care if I have to pay, I need help getting this fixed. Mods? Anyone?

  • Yep, seen 'em all. I mentioned that I am running cacheless in my initial post (even put it in italics because I knew someone would mention it ;) ). It's not a 'my hardware is too slow' problem - it's a problem with the dansguardian processes shooting up from around 300 to 1000 in the space of 5 minutes and then sticking there indefinitely (making the internet unusable) until I reset the proxy cache. There shouldn't even be a cache, since it's set to run cacheless, so I think what actually fixes it is changing any setting that makes dansguardian restart.

    I need to know how to do one of three things:

    1. Increase maxchildren past 1000.
    2. Limit the number of processes/connections that are allowed to come from one IP/device.
    3. If it's a computer/device/etc. on the network that is doing this, I need to know how to find it - I've tried everything, and nothing stands out in the network visualizer, etc., when it is locked up. If I run top, well... you can see the results of that in the posts above - nothing conclusive.
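
For item 3, one way to look for a single device hogging the filter is to count established connections per client IP while the processes are stuck. This is only a minimal sketch: it assumes dansguardian is listening on port 8080 (its usual filterport; check dansguardian.conf), that clients connect over IPv4, and that netstat prints its usual six columns. The function name connections_per_client and the FILTER_PORT variable are made up for illustration.

```shell
#!/bin/sh
# Assumption: 8080 is the port dansguardian listens on (filterport).
FILTER_PORT="${FILTER_PORT:-8080}"

# Reads `netstat -tn` style lines on stdin and prints "count ip" pairs,
# busiest client first. Expected columns:
#   proto recv-q send-q local-address foreign-address state
connections_per_client() {
    awk -v port="$FILTER_PORT" '
        # Keep only IPv4 TCP connections to the filter port that are open now.
        $1 == "tcp" && $6 == "ESTABLISHED" && $4 ~ (":" port "$") {
            split($5, peer, ":")      # foreign address is ip:port
            count[peer[1]]++
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn
}
```

On the gateway it might be used as `netstat -tn | connections_per_client | head`. A client holding hundreds of connections would be the first one to look at. This only helps find the device; it does not limit it.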

  • By the way - this is still happening. Every. Day.

    It's also started to creep up on our main network, not as often, but it has happened a few times - which makes me very worried.

    This seems to be a fatal flaw in ClearOS/dansguardian: once the maxchildren processes are maxed out, the system can't recover. I've received no suggestions that I haven't already tried, so I guess I'm going to have to start looking for another content filter solution.

  • Nick, thanks for the article - I just read it, and have read many others like it. They usually point to some sort of resource problem (not enough CPU power, not enough RAM, etc.) and how to tune dansguardian to work within those limited resources. The server I'm running is extreme overkill for this software and always has resources to spare - it never swaps, is running cacheless, etc.

    It worked fine over the weekend (which is not unusual; it seems to work fine for 24-72 hours at a time). I checked on the processes this morning and they were fine: 900 or so, with 200-500 of those being used by dansguardian. Then bam, around 9 AM, the total went to 1672 (dansguardian using 1003) and the internet was functionally dead until I reset the web proxy cache (which it shouldn't be using at all).

    If it IS a particular problem user (malware, bit torrent, etc.) how do I find the offending IP/device? I can navigate the web interface and run any command I like easily while the processes are stuck - I just don't know where to look...
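
Another place to look is the dansguardian access log: tallying recent requests per client IP may show one device far ahead of the rest. The sketch below is hedged. It assumes each log line records the client as a bare IPv4 address in its own whitespace-separated field (so check it against your logfileformat setting), and the function name requests_per_client is hypothetical.

```shell
#!/bin/sh
# Reads access log lines on stdin and prints "count ip" pairs, busiest first.
# Assumption: the client IP is the first field on each line that is a bare
# IPv4 address; IPs embedded in URLs are skipped because the match is anchored.
requests_per_client() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) {
                    count[$i]++
                    break             # only the first address per line counts
                }
            }
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn
}
```

It could be fed the tail of the log while the processes are stuck, for example `tail -n 5000 /var/log/dansguardian-av/access.log | requests_per_client | head`. That log path is an assumption and may differ on your install.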

    Here is the output of the command you asked me to run:

    [root@mensdorm ~]# lspci -k | grep Eth -A 3
    01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
    Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
    Kernel driver in use: bnx2
    Kernel modules: bnx2
    01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
    Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
    Kernel driver in use: bnx2
    Kernel modules: bnx2
    02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
    Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
    Kernel driver in use: bnx2
    Kernel modules: bnx2
    02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
    Subsystem: Dell PowerEdge R610 BCM5709 Gigabit Ethernet
    Kernel driver in use: bnx2
    Kernel modules: bnx2
    03:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 08)


    Also, I changed those two other mistake posts, so you can delete them if you like - I only set them to private so everyone wouldn't have to see them. Thanks!

  • Dans Guardian processes maxing out (1000) and internet slows to crawl

    I am running ClearOS 6.x Professional on a Dell PowerEdge R610 server with dual quad-core Xeons (8 CPUs), 32 GB RAM, 15k SAS drives in RAID 1, etc. (i.e. it's a very fast machine).

    Its primary use is as a gateway with content filtering in a dorm with about 150-200 boys.

    Our internet connection is 100/30 Mbps fiber.

    Here's the problem. (It doesn't happen on our other two installations, the women's dorm gateway and the main campus gateway, even though each of those locations has a 100/30 connection and the same model of server.)

    Internet runs great, bandwidth usage gets pretty high at peak times (netflix, playstation, etc.) but web browsing is still fine. But at some point the internet just goes to an absolute crawl - and I've figured out the why, I just need a solution to fix it.

    If I go to look at "resource report" everything is fine, everything has plenty left, usually 95-99% cpu usage left, 20gb of RAM free, no swap being used, etc. But if I look at the 'processes' - if it is at 1671-1674 or so, I know the internet is at a crawl. Processes normally stay at around 800-1000. But once it gets to 1600-something, it is absolutely stuck until I touch the content filter in some way to reset it (restart dansguardian, delete proxy cache, change a content filter setting, anything.)

    I've been through all the guides on optimizing Squid. I am already running cacheless. Here is the relevant portion of my dansguardian config file (I've changed it SEVERAL times; this is the latest iteration):

    # sets the maximum number of processes to spawn to handle the incoming
    # connections. Max value usually 250 depending on OS.
    # On large sites you might want to try 180.
    maxchildren = '999'


    # sets the minimum number of processes to spawn to handle the incoming connections.
    # On large sites you might want to try 32.
    minchildren = '64'


    # sets the minimum number of processes to be kept ready to handle connections.
    # On large sites you might want to try 8.
    minsparechildren = '32'


    # sets the minimum number of processes to spawn when it runs out
    # On large sites you might want to try 10.
    preforkchildren = '16'


    # sets the maximum number of processes to have doing nothing.
    # When this many are spare it will cull some of them.
    # On large sites you might want to try 64.
    maxsparechildren = '64'


    # sets the maximum age of a child process before it croaks it.
    # This is the number of connections they handle before exiting.
    # On large sites you might want to try 10000.
    maxagechildren = '2000'


    The last change I made was taking maxagechildren down from 10000 to 2000, thinking that maybe some processes were getting hung and that this would kill some of them off earlier and make room. It worked for about 24 hours, which is usually the case - I have to fix this 'dansguardian processes maxed out' issue about every 24-48 hours.

    Woke up this morning and here's what the processes looked like on the gateway:

    http://i1093.photobucket.com/albums/i424/blakemcginnis/1671.png

    So I knew the internet was deadly slow.

    Ran the command to see how many processes dansguardian was using:

    ps aux | grep dansguardian-av | wc -l


    1003

    Maxed out.

    Ran the 'top' command (not that I understand much of what I'm reading, but I knew you guys might ask, so here is a screenshot):

    http://i1093.photobucket.com/albums/i424/blakemcginnis/top.png

    Restarted dansguardian, and everything is back to normal... for now.

    Can anyone offer any advice? I've tried changing the numbers around on the dansguardian.conf file several times, but it doesn't seem to help. It doesn't appear that I'm running into any resource problems from the hardware as the system load is always around 0.1-0.2, RAM is never even remotely close to fully utilized, etc. It appears to me that I've got a hard process limit problem. I figure there are two ways to approach this:

    1. Find a way to increase the HARD limit of 1,000 processes Dansguardian is allowed to run.

    2. I have someone with a virus, BitTorrent client, or other program that is generating way too many processes/requests. How could I find this person? Or, how can I set a limit on the processes run by individual devices/IPs?

    OR, anyone have any other ideas? How would you handle this? We have the bandwidth - we have the hardware - but every day I have to reset this thing, and usually it's been dead for an hour or two before I can catch it, because I can't monitor it 24/7 - so the kids are usually pretty mad by then. Please Help...
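
As a stopgap for not being able to watch it 24/7, a small watchdog run from cron could restart the filter when the process count gets close to maxchildren. This is a rough sketch and not a fix for the underlying problem. The names near_limit and check_and_restart, the 95% threshold, and the service name dansguardian-av are all assumptions (the service name is guessed from the process name in the ps command above).

```shell
#!/bin/sh
# Hypothetical watchdog sketch, meant to be called from cron every minute.
MAX_CHILDREN="${MAX_CHILDREN:-999}"      # should mirror maxchildren in dansguardian.conf
THRESHOLD_PCT="${THRESHOLD_PCT:-95}"     # assumed trigger point, tune to taste

# Succeeds (exit status 0) when the given count has reached
# THRESHOLD_PCT percent of MAX_CHILDREN.
near_limit() {
    count="$1"
    [ $((count * 100)) -ge $((MAX_CHILDREN * THRESHOLD_PCT)) ]
}

check_and_restart() {
    # pgrep matches on process name and never counts itself,
    # unlike `ps aux | grep ... | wc -l`, which also counts the grep.
    count=$(pgrep dansguardian-av | wc -l)
    if near_limit "$count"; then
        logger "dansguardian-av at $count processes, restarting"
        service dansguardian-av restart   # assumed service name
    fi
}
```

With the function definitions saved in a script that ends by calling check_and_restart, a cron entry running it once a minute would at least cut the outage from an hour or two down to about a minute, and the logger lines would record when each spike happened.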

    Thanks!
