destfinal
Hello,

I am trying to set up a router with a failover mechanism.

Here is the configuration.

Router
ClearOS 6.3
eth0 - Connected to ISP1 (high speed unlimited)
eth1 - connected to ISP2 (slower speed limited)
eth2 - connected to the LAN (IP Address 192.168.55.1/24)

In ClearOS I have configured Multi-WAN so that eth0 has a weight of 4 and eth1 a weight of 1. I have not configured the DHCP server on the router machine. The idea is to use the secondary connection (eth1) if the primary one (eth0) fails.

I have hardcoded the nameservers (in resolv.conf) to 8.8.8.8 and 8.8.4.4 and prevented overwrites of resolv.conf, as the switchover does not seem to handle this properly.

Workstation
On the workstation side the configuration is as follows:

OS=>Fedora16
interface=>em1(static)
ipaddress=>192.168.55.101
Subnetmask=>255.255.255.0
Default Route(Gateway)=>192.168.55.1
DNS servers=>192.168.55.1

resolv.conf entry
nameserver 192.168.55.1

Note: ClearOS handles DNS caching (I hope)

Operation
When everything is connected, my workstation can reach the internet and resolve names.

Here are the results of some commands from the workstation:

ip route list
[CODE]default via 192.168.55.1 dev em1 proto static
192.168.55.0/24 dev em1 proto kernel scope link src 192.168.55.101 metric 1 [/CODE]

Here is the output of the same command on the router at the same time:

ip route list

[CODE]default via <eth0 ip> dev eth0 proto static
192.168.55.0/24 dev eth2 proto kernel scope link src 192.168.55.1
<eth1 network>/24 dev eth1 proto kernel scope link src <eth1 IP>
<eth0 network>/24 dev eth0 proto kernel scope link src <eth0 IP>[/CODE]

When I unplug eth0 on the router, it takes about a minute for the routes on the router to be updated. After that, the result of ip route list on the router is as follows:

[CODE]default via <eth1 ip> dev eth1 proto static
192.168.55.0/24 dev eth2 proto kernel scope link src 192.168.55.1
<eth1 network>/24 dev eth1 proto kernel scope link src <eth1 IP>[/CODE]

At this point the router is able to connect to the internet (obviously through the second connection) and to resolve names.

But the workstation can neither resolve names nor reach the internet by IP address. At the same time, the workstation can still connect to the router, and I am able to ssh into the router from the workstation. The result of ip route list on the workstation is unchanged. I know I am nearly there, but I don't know how to proceed.

Can anybody guess what the problem is and help?

Thanks
Thursday, August 16 2012, 09:04 AM
Responses (10)
  • Accepted Answer

    destfinal
    Monday, May 13 2013, 11:05 AM - #Permalink
    Cristobal wrote:
    but I am really staying away from this if this core functionality is not fixed.

    Exactly what I have done. I cannot complain much, anyway, as I am using ClearOS for free. Maybe I will take some time to write a script to prevent this from happening. :(
  • Accepted Answer

    Cristobal
    Tuesday, May 07 2013, 04:23 PM - #Permalink
    This is really sad.

    It has happened to me too, yet I am running ClearOS fully updated (ClearOS Community release 6.4.0 (final)) and it does the same stupid thing.

    I have 2 WANs and 1 LAN.
    Both WANs are set up as PPPoE. Whenever one of the WANs goes down, the network loses all internet connectivity. However, the router itself keeps internet connectivity (of course), and the network does recover when the WANs are restored.

    The routing table seems OK (though Peter says that it is not really in use). As far as I know, ip route should be in use every time, according to the table rules.

    Since one of the interfaces tends to reset itself pretty often, this is driving me crazy.

    So, in case you have been wondering, this thread (http://www.clearfoundation.com/component/option,com_kunena/Itemid,232/catid,19/func,view/id,44316/) and the current thread are not resolved.

    Also, with this configuration, if you start up with a ppp device offline, the office never gains connectivity.
    I do not know whether I will pay 250 USD to get this solved, but I am really staying away from this if this core functionality is not fixed.
  • Accepted Answer

    Wednesday, December 19 2012, 07:18 PM - #Permalink
    destfinal wrote:
    For some reason pppoe-connect deletes the default route before trying to up ppp connection (WHY????). ... snip ...

    Is this expected/known behaviour - especially with ppp?

    Unpatched, yes. For a standard ClearOS system, nope! Do or did you have 3rd party repositories enabled? Was the clearos-test repository ever enabled? I'm just wondering if some of the standard packages have been overwritten. Regardless, if you have a support subscription to ClearOS Professional, please submit a support ticket and we'll get it sorted out.

    On another note, the default routing table is not really in use when Multi-WAN is active. Here's a technical document on Routing and Multi-WAN
  • Accepted Answer

    John Brand
    Wednesday, December 19 2012, 01:42 PM - #Permalink
    @destfinal did you get the problem resolved?

    I'm having the same problem. Because my PPPoE connection drops a couple of times a day and reconnecting takes almost 15 minutes, I want a solution for this.
  • Accepted Answer

    destfinal
    Thursday, August 23 2012, 12:25 PM - #Permalink
    That's fine Nick. I am working on this.
  • Accepted Answer

    Thursday, August 23 2012, 12:07 PM - #Permalink
    Sorry, I can't help with the multi-wan aspects. I could only point you to a few files you could hook into.
  • Accepted Answer

    destfinal
    Thursday, August 23 2012, 11:43 AM - #Permalink
    Thanks for your reply, Nick. By the time I got your reply, I had discovered syswatch. Thanks for your suggestions anyway.

    It is really painful to debug, because the problem cascades through different scripts.

    At last, I found that the problem is with the pppoe-connect script, which gets called by the adsl-start script, which gets called by ifup-ppp, which gets called by syswatch :).

    For some reason pppoe-connect deletes the default route before trying to bring up the ppp connection (WHY????). A ppp connection attempt takes at least 2 minutes (especially when the link is not connected); during this period there is no default route and hence no connection to the internet. (I am talking only about the router here; for the connected workstations, NAT becomes an issue as well.) So the internet (from the router alone) is up for a minute or so and then down for more than a couple of minutes, alternating until I stop syswatch (which then defeats the multiwan). In other words, a failure on the secondary connection takes the primary connection down, which is completely against the very purpose of having multiple connections.
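
The missing-default-route window described above can be detected with a small check. This is only a sketch: `has_default_route` is a hypothetical helper name, not part of ClearOS or syswatch, and it assumes the routing table is printed in the same format as the `ip route list` output shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of ClearOS): reads a routing table on
# stdin, in `ip route list` format, and succeeds only if at least one
# line describes a default route.
has_default_route() {
    grep -q '^default '
}

# Possible use on the router (reports only, changes nothing):
#   ip route list | has_default_route || echo "no default route right now"
```

Run from cron or a loop, a check like this would at least log how long the router stays without a default route during a ppp connection attempt.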

    Is this expected/known behaviour - especially with ppp?

    Or

    What should I do to ensure that:
    1. Only the primary connection is used for outgoing internet traffic
    2. The failover to the secondary connection works seamlessly (or at least within a minute) when the primary link is down
    3. The connection is transferred back to the primary connection when it becomes available (again seamlessly)

    Please let me know if I have missed anything above or need to provide more information.

    Thanks
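
The three requirements above amount to a strict priority rule rather than weighted balancing. A minimal sketch of that rule follows; `pick_wan` and its "up"/"down" arguments are hypothetical and only illustrate the decision, not how ClearOS Multi-WAN or syswatch actually choose an interface.

```shell
#!/bin/sh
# Hypothetical strict-priority selection. Both arguments are the
# strings "up" or "down" for the primary and secondary link.
pick_wan() {
    primary_state="$1"
    secondary_state="$2"
    if [ "$primary_state" = "up" ]; then
        # Always prefer the primary; this also covers failing back.
        echo "primary"
    elif [ "$secondary_state" = "up" ]; then
        # Use the secondary only while the primary is down.
        echo "secondary"
    else
        echo "none"
    fi
}

# Example: pick_wan down up
```

Because the primary is checked first on every call, failing back needs no extra state: as soon as the primary reports "up" again, it wins.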
  • Accepted Answer

    Wednesday, August 22 2012, 11:38 AM - #Permalink
    When a ppp link comes up it triggers /etc/ppp/ip-up.local. You could try adding a script to the end of that file. You could also hack the syswatch script. Lastly, have a look at the dhclient-script man page, which describes /etc/dhclient-exit-hooks; that hook fires with a few usable parameters each time a DHCP lease is changed or renewed. It may also fire with PPP connections.
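
As a rough sketch of what could go at the end of /etc/ppp/ip-up.local: pppd calls its ip-up script with the interface name, tty device, speed, local IP, remote IP and ipparam, and this sketch assumes the distribution's /etc/ppp/ip-up passes those arguments on to ip-up.local unchanged. `describe_link` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of a fragment for /etc/ppp/ip-up.local.
# Assumed positional arguments (as passed by pppd to ip-up):
#   $1 interface, $2 tty, $3 speed, $4 local IP, $5 remote IP, $6 ipparam
# describe_link is a hypothetical helper, not an existing ClearOS function.
describe_link() {
    iface="$1"
    local_ip="$4"
    remote_ip="$5"
    echo "ppp link up: iface=${iface} local=${local_ip} remote=${remote_ip}"
}

# In the hook itself one might log the event before running custom logic:
#   describe_link "$@" | logger -t ip-up.local
```

Logging the arguments first is a cheap way to confirm the hook fires at all before hanging firewall or routing changes off it.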
  • Accepted Answer

    destfinal
    Wednesday, August 22 2012, 10:04 AM - #Permalink
    In addition, if the ppp link is down, the routing keeps flapping, so that the internet goes down and up alternately. This, I think, is the problem raised here:

    http://tracker.clearfoundation.com/view.php?id=679

    Though the issue is claimed to have been solved, it still seems to be there.
  • Accepted Answer

    destfinal
    Wednesday, August 22 2012, 09:01 AM - #Permalink
    Hi,

    Recap: As I mentioned earlier, with all the connections up and alive, there is no problem. As soon as I unplug the main internet, the workstation loses its internet connection, but its connection with the router is fine.

    Meanwhile I have changed the secondary internet connection to ppp0. After disconnecting the primary internet I checked the nat table. It was empty. Therefore I added NAT rules as follows on the router:
    iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
    iptables -A FORWARD -i ppp0 -o eth2 -m state --state RELATED,ESTABLISHED -j ACCEPT
    iptables -A FORWARD -i eth2 -o ppp0 -j ACCEPT

    and now the workstation can connect to the internet.

    If the primary internet is brought back up, the nat table is automatically overwritten, which is what I want. I assume some script runs when a link goes down and another when it comes back, and that these add and remove the NAT rules. Which scripts are they, so that I can add the rules above and automate the failover? Please help.

    Thanks
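
Until the responsible scripts are identified, the three rules from this post can be wrapped in a small helper so they are easy to call from whichever hook turns out to be right. This is only a sketch: `nat_rules` is a hypothetical name, and it prints the commands instead of running them so the output can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: prints the three iptables commands from the post
# for a given WAN interface ($1) and LAN interface ($2).
# Nothing is applied until the output is piped to a shell as root.
nat_rules() {
    wan="$1"
    lan="$2"
    echo "iptables -t nat -A POSTROUTING -o ${wan} -j MASQUERADE"
    echo "iptables -A FORWARD -i ${wan} -o ${lan} -m state --state RELATED,ESTABLISHED -j ACCEPT"
    echo "iptables -A FORWARD -i ${lan} -o ${wan} -j ACCEPT"
}

# Example (as root on the router): nat_rules ppp0 eth2 | sh
```

Note that `-A` appends, so applying the output repeatedly would stack duplicate rules; a real hook would need to remove or flush its earlier rules first.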