Today Was Fun
At around 1650 CDT, my Buffalo router+WAP started dropping ~20% of my WAN packets. By 1700, the WLAN was still broadcasting but refusing to switch traffic. The wife was en route with a massive speech to finish & practice for tomorrow, so I had little time to fix it myself.
Because I was less sure about my Ruckus ZoneFlex 7372's config and stability, I first pulled her a Cat6 drop from a 2nd floor switch, down the hall to her office. Then I reworked my Ansible playbooks. I had been planning to cut over to a 6-VLAN config this weekend at the latest, but for tonight I pared them back to a single VLAN to simply restore the status quo. Basically exactly what I had before my Buffalo drank the wrong Kool-Aid.
What Resulted Tonight?
- Cable modem physically moved in the house from the 2nd floor to the basement
- Cable modem downlink moved from the DD-WRT Buffalo to a Debian Linux router
- Buffalo demoted from DHCP & DNS so it's mostly an unmanaged switch
- Ruckus AP moved to the basement & directly connected to the new router
- Important WLAN stations moved to temporary SSID & WPA PSK
What Did I Learn?
- Ansible saved my life
I had spent a lot of time working on a role to configure bridges, VLANs, raw ifaces, and addressing. That shit paid better dividends than Enron downing power plants.
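For a rough idea of what a role like that boils down to, here is a minimal Python sketch. It is not the actual role, and it only covers the VLAN and addressing part. It assumes a Debian box using ifupdown with the `vlan` package, and the interface name, VLAN IDs, and subnets are made up for illustration. The point is that the same input always renders the same stanzas, which is what makes a rushed cut-over repeatable.

```python
from ipaddress import ip_interface


def render_vlan_stanza(parent: str, vlan_id: int, cidr: str) -> str:
    """Render one ifupdown stanza for a statically addressed VLAN sub-interface."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    addr = ip_interface(cidr)  # raises ValueError on a malformed address/prefix
    name = f"{parent}.{vlan_id}"
    return "\n".join([
        f"auto {name}",
        f"iface {name} inet static",
        f"    address {addr.with_prefixlen}",
        f"    vlan-raw-device {parent}",
    ])


def render_interfaces(parent: str, vlans: dict[int, str]) -> str:
    """Render stanzas for every VLAN, sorted by ID so the output is stable."""
    stanzas = [render_vlan_stanza(parent, vid, cidr)
               for vid, cidr in sorted(vlans.items())]
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    # Hypothetical layout: "eth1" and both subnets are assumptions, not my real config.
    print(render_interfaces("eth1", {10: "192.168.10.1/24", 20: "192.168.20.1/24"}), end="")
```

Dropping from six VLANs to one is then just a matter of passing a shorter dictionary, which is roughly the kind of change a variables file in a playbook would carry.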
- My network needs more HA
  - Hot-spare or load-balanced ISP uplink
  - Hot-spare or load-balanced DOCSIS 3 modem on-hand
  - At least 2x UPS (workstation + infrastructure, maybe 4x so that both workstation and infrastructure have redundancy?)
  - Hot-spare workstation (or at least enough equipment to rebuild what I currently have)
These are all pipe dreams at this point, but the maintenance cycle to upgrade my screens and now this event have shown me that I'm woefully unprepared.
For a guy who gets paid to assure service responsiveness & uptime, I'm clearly doing a shitty job of it at home.
And that could seriously fuck me...
- I have the best wife in the world
Even with a critical deadline on the hook, she fully appreciated my diagnosis of a failing router & my suggestion that we migrate immediately. We worked closely to coordinate the downtime window I needed, and everything went well. I think a lot of IT spouses could learn something from this incident.