Wednesday, May 23, 2012

Fixing BAR handling and handle corner cases of things..

Part of the general 802.11n requirements is correctly handling TX aggregation and failures - where we need to pause the TX queue, send a BAR to forcibly move the remote end block-ack window to be in sync with the transmitters idea, and then continue sending frames.

This exposed a very annoying problem - what if the driver runs out of ath_buf entries to schedule TX frames? Or, what if the network stack runs out of mbufs? If we need to allocate an ath_buf/mbuf to send a BAR frame, but they're all allocated and unavailable, the driver/wireless stack will come to a grinding halt. Typically these allocated ath_buf's are allocated in the software queue, waiting for the BAR TX (or power-save wakeup) to send a frame.

So, I haven't fixed this. It's on my (very) short term to-do list. But it did expose some issues in how the net80211 BAR send code (ieee80211_send_bar()) works. In short - it didn't handle resource allocation failures at all. It worked fine if the driver send method (ic->ic_raw_xmit()) succeeded and just failed to TX the frame. But if it couldn't allocate an mbuf, or if the driver send method failed.. things just stopped. And when the BAR TX just stopped, the ath(4) software TX queue would just keep buffering frames, right until all the TX ath_buf entries were consumed.

This is obviously .. sub-optimal.

But this raises an interesting point - how much of your kernel and/or userland application handle resource shortages correctly? I've seen plenty of userland software just not check whether malloc() returned NULL and I've seen some that specifically terminate (non-gracefully) if malloc()/calloc() fails - Squid does this. But what about your network stack? How's it handle mbuf shortages? What about the driver stack? What about net80211 (ew) ? What if the kernel malloc() API has to sleep because there's no free memory available?

I don't (currently) have an answer - it's a difficult, cross-discipline problem. What I -can- do though (at least in my corner of the FreeBSD world - net80211 and ath(4)) is to start testing some of these corner cases, where I  force some resource shortages and ensure that the wireless stack and driver(s) recover somewhat gracefully. 802.11n is very unforgiving if you start dropping frames involved in an active aggregation session. So it's best I try and address these sooner rather than later.

FreeBSD/arm -

I've just found out that FreeBSD/armv6 now runs on this:

http://www.embeddedartists.com/products/kits/lpc3250_kit.php


This is excellent news. There's also been some recent work to improve pandaboard support (TI-OMAP) focusing on pmap and SMP fixes.

I'm so very glad that the FreeBSD community is pushing this ARM project along. Yes, the armv6 branch is not very well named (it supports more recent ARM platforms, not just armv6 improvements) and I'm hoping this will all make it into FreeBSD-HEAD soon.