Hello all,

I recently enabled QoS on my home ClearOS firewall.
I started with a very simple, single-rule configuration for testing purposes and I can't seem to make it work, even though it looks to be configured properly.

To start, I am just trying to set all traffic from a single source IP to priority 1.
I see the tc classes and qdiscs, and I can see the iptables "mark" rules it generates.
The problem is, even though the hit counter on the mark rule is increasing correctly, the traffic does not seem to be hitting the correct traffic class/queue.
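
For reference, this is roughly how I have been checking things (a sketch; enp1s0 is just my interface name, substitute your own):
iptables -t mangle -L -n -v | grep -i mark   # confirm the mark rule is matching (hit counters)
tc -s filter show dev enp1s0                 # confirm a filter maps that mark to a class
tc -s class show dev enp1s0                  # watch the per-class byte/packet counters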

Can anyone please take a look at the attached screenshots and point me in the right direction?
My knowledge of QoS on Linux is decent (not great, but not terrible) and, from what I can see, this looks like it should be working... but I am clearly missing something.

Any assistance would be greatly appreciated!

--Sean
Sunday, January 05 2020, 12:13 AM
Responses (15)
  • Accepted Answer

    Sunday, January 05 2020, 12:26 PM
    I always wondered how to test this, as there was a big related change partway through 7.6, IIRC. For a long time ClearOS had been patching the kernel to add an IMQ device used for QoS, but the stock kernel comes with native IFB support. The aim was for ClearOS to move away from a patched kernel to an upstream kernel, as the overhead of patching the kernel was quite high, but this meant moving QoS over to IFB.

    Anyway, can you do me a favour and try the same test using IMQ to see if you get the same results? To do this you will need to reboot ClearOS to a v7 kernel (so not the 1062 line, as those are all el7). You get about 5 seconds during boot in which you can select a different kernel.
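
    If it helps, you can list the kernels GRUB will offer with something like this (assumes a BIOS install with the config at /boot/grub2/grub.cfg; EFI installs keep it elsewhere):
    awk -F\' '/^menuentry /{print $2}' /boot/grub2/grub.cfg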

    Once you have booted to the older kernel, go to /etc/clearos/qos.conf and change:
    QOS_ENABLE_IFB="yes"
    to:
    QOS_ENABLE_IFB="no"
    Then restart the firewall with a:
    systemctl restart firewall
    This will flip the QoS mechanism to IMQ. Then can you repeat your test?
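
    As a one-shot sketch of the same edit (assuming the key appears exactly once in the file):
    sed -i 's/^QOS_ENABLE_IFB="yes"/QOS_ENABLE_IFB="no"/' /etc/clearos/qos.conf
    systemctl restart firewall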

    Do not try running with IMQ on an el7 kernel as the firewall will panic.
  • Accepted Answer

    Sunday, January 05 2020, 10:43 PM
    I'm running 7.7.2
    I have two non-el7 kernels and neither will boot. After selecting them in the boot menu, the screen goes blank (as it does with the working kernels) but never comes back to life.
  • Accepted Answer

    Monday, January 06 2020, 12:47 PM
    I am not sure why you can't boot to the old kernels. I've just tested on my test box and I had no problem.

    I really don't know how to troubleshoot this. I can try and contact the dev who did this but he has moved on. Otherwise it is off to the LARTC mailing lists.

    One thing I am curious about is why you changed your Rate to Quantum. It is probably not relevant, as I get similar output to you.
  • Accepted Answer

    Monday, January 06 2020, 03:58 PM
    Apparently the implementation may only be working for upstream.... :(
  • Accepted Answer

    Tuesday, January 07 2020, 02:38 PM
    Nick Howitt wrote:

    Apparently the implementation may only be working for upstream.... :(


    I'm not sure what you mean... I don't see it as working for either upstream or downstream.
    Also, do you know why the ClearOS implementation would put the iptables reference in the POSTROUTING chain as opposed to PREROUTING?

    I tried leaving the quantum as auto and also setting it manually to approx. 90% of my link speed up/down. Neither worked, but the issue isn't the size of the queue; it is that the packets are not correctly hitting the right queue.

    It seems to me like the entire QoS implementation is fundamentally broken...
  • Accepted Answer

    Tuesday, January 07 2020, 02:42 PM
    Nick Howitt wrote:

    I am not sure why you can't boot to the old kernels. I've just tested on my test box and I had no problem.
    .


    Is it possible to get the kernel source packages that ClearOS is built on, so I can try rolling a custom kernel with IMQ enabled?
  • Accepted Answer

    Tuesday, January 07 2020, 03:18 PM
    It is looking like the upstream rules go in the POSTROUTING chain and the downstream rules in the PREROUTING chain. Where would you expect the rules to be?

    With respect to the kernel sources, we no longer compile our own and get them straight from CentOS, so they will have the sources. You have made me think, however: if you disable QoS before you reboot, can you boot to the old kernel? The reason I ask is that iptables also needs to be patched for IMQ. I wonder if a mistake was made there and whether we should have still patched iptables to allow people to boot to old kernels with QoS. I'll ask some questions.
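
    If you want to see exactly where the rules landed, something like this shows both chains with their hit counters:
    iptables -t mangle -L PREROUTING -n -v --line-numbers
    iptables -t mangle -L POSTROUTING -n -v --line-numbers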
  • Accepted Answer

    Tuesday, January 07 2020, 03:22 PM
    Nick Howitt wrote:

    It is looking like the upstream rules go in the POSTROUTING chain and the downstream rules in the PREROUTING chain. Where would you expect the rules to be?

    With respect to the kernel sources, we no longer compile our own and get them straight from CentOS, so they will have the sources. You have made me think, however: if you disable QoS before you reboot, can you boot to the old kernel? The reason I ask is that iptables also needs to be patched for IMQ. I wonder if a mistake was made there and whether we should have still patched iptables to allow people to boot to old kernels with QoS. I'll ask some questions.


    Other implementations I have seen put the mark rules directly in the PREROUTING chain for both upstream and downstream.
    I will try disabling QoS and booting one of the other kernels when I get home tonight, and will let you know how it goes.
  • Accepted Answer

    Tuesday, January 07 2020, 03:28 PM
    I just noticed something interesting... while there are no packets hitting the 1:10 queue, as I would expect given the mark rule, there ARE packets hitting the 1:11 queue, which I don't understand.
    Could this be a simple "off-by-one" typo somewhere, putting the packets in the wrong queue?
    Is it possible to packet-capture only a specific tc queue with tcpdump, to confirm the packets hitting 1:11 are the ones being marked by the lone iptables rule setting 0xa (which should go to 1:10)?
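
    One idea I had, though it is untested (and assumes this tcpdump build supports the nflog pseudo-interface): log the marked packets to an NFLOG group and capture that instead, e.g.:
    iptables -t mangle -A POSTROUTING -m mark --mark 0xa -j NFLOG --nflog-group 5
    tcpdump -i nflog:5 -nn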
  • Accepted Answer

    Tuesday, January 07 2020, 07:55 PM
    Rather than waste your time, a couple of builds have been pushed through this afternoon. Can you do a:
    yum update app-qos --enablerepo=clearos-updates-testing
    You are trying to update to app-qos-2.5.5-1.v7. If it is not there, wait an hour or so. The dev says this does not fix all your issues (reading between the lines, the wrong bucket possibly being used), but it is a better starting point.

    If you go back to an old kernel, you may need to try with QoS disabled, and you may also need to downgrade iptables, but there is no way of doing that unless I make the package specifically available to you. The person I needed to speak to tonight about iptables is not around, so I wouldn't even try testing IMQ.
  • Accepted Answer

    Tuesday, January 07 2020, 08:16 PM
    Understood.
    I'll try updating the QoS package and let you know if that has any effect.
    Did the dev seem to think the "off-by-one" typo theory on the 1:10 vs 1:11 queue was a possibility?
    And, if so, what would a fix for that look like from a turnaround standpoint?

    Thanks for your assistance thus far. Much appreciated
  • Accepted Answer

    Tuesday, January 07 2020, 08:36 PM
    So good news and bad news...

    Bad news is, after looking at the "/etc/clearos/qos.conf" file, I realized the two included rules (for <=64-byte packets) are where the packets in 1:11 were coming from.
    QOS_PRIOMARK4_CUSTOM="\
    TCP_ACK_Up|*|1|0|1|-p tcp -m length --length :64
    TCP_ACK_Down|*|1|1|1|-p tcp -m length --length :64"

    Good news is, after upgrading to 2.5.5-1.v7, the 1:10 queue is working as expected now.

    Thank you very much for the help!



    Every 1.0s: tc -s -g class show dev enp1s0 Tue Jan 7 15:36:33 2020

    +---(1:1) htb rate 6Mbit ceil 6Mbit burst 1599b cburst 1599b
    | Sent 7843405 bytes 87944 pkt (dropped 0, overlimits 0 requeues 0)
    | backlog 0b 0p requeues 0
    |
    +---(1:11) htb prio 1 rate 900Kbit ceil 6Mbit burst 1599b cburst 1599b
    | Sent 5490979 bytes 83506 pkt (dropped 0, overlimits 0 requeues 0)
    | backlog 0b 0p requeues 0
    |
    +---(1:10) htb prio 0 rate 900Kbit ceil 6Mbit burst 1599b cburst 1599b
    | Sent 33768 bytes 144 pkt (dropped 0, overlimits 0 requeues 0) <<<<<<<<<<<<<<<<<<<<<<<<
    | backlog 0b 0p requeues 0
    |
    +---(1:13) htb prio 3 rate 840Kbit ceil 6Mbit burst 1599b cburst 1599b
    | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    | backlog 0b 0p requeues 0
    |
    +---(1:12) htb prio 2 rate 840Kbit ceil 6Mbit burst 1599b cburst 1599b
    | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    | backlog 0b 0p requeues 0
    |
    +---(1:15) htb prio 5 rate 840Kbit ceil 6Mbit burst 1599b cburst 1599b
    | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    | backlog 0b 0p requeues 0
    |
    +---(1:14) htb prio 4 rate 840Kbit ceil 6Mbit burst 1599b cburst 1599b
    | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    | backlog 0b 0p requeues 0
    |
    +---(1:16) htb prio 6 rate 840Kbit ceil 6Mbit burst 1599b cburst 1599b
    Sent 2318658 bytes 4294 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
  • Accepted Answer

    Tuesday, January 07 2020, 09:14 PM
    The small packets are typically the SYN and ACK packets, so they are given the highest priority. SSH is next because, if there is a problem, you need SSH to sort it out. For these reasons it is not recommended to use these two priority classes for your own rules.

    I've no idea about any turnaround time. I've had only a couple of e-mails from the dev. He is in Canada and I'm in the UK, so it is not like wandering down to chat to him.

    You are welcome to have a look at the code. It is in /usr/clearos/apps/qos/deploy/libqos.lua. I am not really a coder and I don't find Lua easy to read, especially its "if" statements and loops. The guts of it come from RunBandwidthExternal(). There is an offset in some of the loops which may need adjusting, but I'm guessing.
  • Accepted Answer

    Tuesday, January 14 2020, 09:55 PM
    Hi Sean,
    There is now an update available, or syncing to the mirrors over the next couple of hours. Can you try:
    yum update app-firewall app-qos *netify-* --enablerepo=clearos-updates-testing,clearos-contribs-testing
    If you don't have any of the netify suite, including the Application and Protocol Filters, then you can drop *netify-* from the command. You must update app-firewall at the same time as app-qos.
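
    Afterwards, you can confirm which versions you ended up with using:
    rpm -q app-firewall app-qos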
  • Accepted Answer

    Wednesday, January 15 2020, 05:11 PM
    There is a further update to app-qos, to app-qos-2.5.7-1.v7, as downstream marking was not taking place for packets to a specified LAN IP. To install, do a:
    yum update app-qos --enablerepo=clearos-updates-testing


    The next bit is from the devs:
    When testing, to monitor the upstream classes, your tc command works:
    tc -s -g class show dev ppp0
    Obviously change your external interface to suit.

    To monitor downstream classes you need to monitor the ifbX device e.g:
    tc -s -g class show dev ifb0
    For single-WAN environments, the IFB interface will always be ifb0.
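
    If you want to watch both directions at once, something like this works (interface names are just examples):
    watch -n1 'tc -s -g class show dev ppp0; echo; tc -s -g class show dev ifb0'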

    For multi-WAN, running firewall-start in debug mode will display the mapping, such as:
    # /usr/sbin/firewall-start -d
    ...
    firewall: Running external QoS bandwidth manager: app-qos/libqos
    firewall: 0: 1 => ppp0
    firewall: 1: 2 => ppp1
    firewall: 2: 3 => ens34
    ...
    This is a little ugly, but it tells us that:

    ifb0 is mapped to ppp0.
    ifb1 is mapped to ppp1.
    ifb2 is mapped to ens34.
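
    You can also simply list the IFB devices present with:
    ip link show type ifb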