Saturday, June 18, 2016

Debugging TDMA on the AR9380

So, it turns out that TDMA didn't work on the AR9380. I started digging into it a bit more with AR9380's in 5GHz mode and found that indeed no, it was just transmitting whenever the heck it wanted to.

The first thing I looked at was the transmit packet timing. Yes, they were going out at arbitrary times, rather than after the beacon. So I dug into the AR9380 HAL code and found the TX queue setup code just didn't know how to setup arbitrary TX queues to be beacon-gated. The CABQ does this by default, and the HAL just hard-codes that for the CAB queue, but it wasn't generic for all queues. So, I fixed that and tried again. Now, packets were exchanged, but I couldn't get more than around 1mbit of transmit throughput. The packets were correctly being beacon gated, but they were going out at very long intervals (one every 25ms or so.)

After a whole lot of digging and asking around, I found out what's going on. It turns out that the new TX DMA engine in the AR9380 treats queue gating slightly different versus previous chips. In previous chips you would see it transmit whatever it could, and then be gated until the next time it could transmit. As long as you kept poking the AR_TXE bit to re-start queue DMA it would indeed continue along transmitting whenver it could. But, the AR9380 TX DMA FIFO works differently.

Each queue has 8 TX FIFO descriptors, which can contain a list of frames or a single frame. For the CABQ I just added the whole list of frames in one hit and that works fine. But for the normal data paths it would push one frame into a TX DMA FIFO slot. If it's an A-MPDU aggregate then yes, it'd be a whole list of frames, but still a single PPDU. But for non-aggregate traffic it'd push a single frame in.

With this in mind, the TX DMA gating now works on FIFO slots, not just descriptor lists. That is, if you have the queue setup to gate on something (say a timer firing, like the beacon timer) then that un-gating is for a single FIFO slot only. If that FIFO slot has one PPDU in it then indeed it'll only burst out a single frame and then the rest of the channel burst time is ignored. It won't go to the next FIFO slot until the burst time expires and the queue is re-gated again. This is why I was only seeing one frame every 25ms - that's the beacon interval for two devices in a TDMA setup. It didn't matter that the queue had more data available - it ran out of data servicing a single TX FIFO slot and that was that.

So I did some local hacks to push more data into each TX FIFO slot. When I buffered things and only leaked out 32 frames at a time (which is roughly the whole slot time worth of large frames) then it indeed behaved roughly at the expected throughput. But there are bugs and it broke non-TDMA traffic. I won't commit it all to FreeBSD-HEAD until I figure out what's going on

There's also something else I noticed - there was some situation where it would push in a new frame and that would cause the next frame to go out immediately. I think it's actually just scheduling for the next gated burst (ie, it isn't doing multiple frames in a single burst window, but one every beacon interval) but I need to dig into it a bit more to see what's going on.

In any case, I'm getting closer to working TDMA on the AR9380 and later chips.

Oh, and it turns out that TDMA mode doesn't add some of the IEs to the beacon announcements - notably, no atheros fast-frames announcement. This means A-MSDUs or fast-frames aren't sent. I was hoping to leverage A-MSDU aggregation in its present state to improve things, even if it's just two frames at a time. Hopefully that'd double the throughput - I'm currently seeing 30mbit TX and 30mbit RX without it, so hopefully 60mbit with it.)