Polidea's RxAndroidBle and why you should use it

Bluetooth on Android: it sucks

When we first started working on SmartShepherd, Bluetooth LE seemed like a logical choice for tracking interactions between animals. Its shortcomings are major advantages when you don't need an absolute location: it can't transmit far and it can only transmit limited data in an advertisement. Once it was determined that BLE was going to do what was required for the algorithm I had in mind, the implementation became fairly straightforward.

Testing is one thing, the real thing is much harder

Of course this all fell to pieces as soon as we did our first field test. I had never had more than 10 or so devices active at once when proving the software did what I wanted, and the Android-based app we used to assign devices to animals and to read the data afterwards seemed fairly straightforward in the lab. 10 devices is a lot different to 250 though. At around the 150 mark strange things started to happen with the Android app. It locked up, it stopped connecting to Bluetooth at all, and the Bluetooth stack crashed and required a complete reboot of the Android device. From memory it took us most of a morning to assign that first group of sheep, and I was rebooting the Android device every 5 SmartShepherd collars. The results came back OK, but overall it seemed like a disaster in terms of being a useful product.

If we tried reading these collars back in 2017 we had to do it individually

Then, it got worse

There was some low-hanging fruit in the form of known issues with the Android Bluetooth stack. For starters, on the version we were using you could have a maximum of 7 connections, and the stack fell to bits if you tried to connect to #8, requiring a reboot. Worse than that, if one of the BLE devices decided to stop talking to Android, that dead connection was never cleaned up properly despite calling the disconnect routine. In this case, the completely stupid solution that worked was to turn off the Bluetooth stack after every 5 successful connections. In fact, that code is still in the SmartShepherd app, which sounds ridiculous, but when you are at sheep 500 out of 1000 on a particularly big day, you don't mind the small wait every 5 animals.
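Stripped of the Android specifics, the every-five-connections reset is just a counter. Here is a minimal sketch, assuming a hypothetical `ConnectionBudget` class (an illustration, not the code that is in the SmartShepherd app). Actually switching the Bluetooth adapter off and on again is left to the caller.

```java
/** Hypothetical helper: counts successful connections and says when to restart Bluetooth. */
final class ConnectionBudget {
    private final int limit;   // e.g. 5, safely below the 7-connection ceiling
    private int successes = 0;

    ConnectionBudget(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    /**
     * Record one successful connection.
     * Returns true when the limit has been reached, meaning the caller should
     * restart the Bluetooth stack before connecting to anything else.
     */
    boolean recordSuccess() {
        successes++;
        if (successes >= limit) {
            successes = 0;     // start counting again after the restart
            return true;
        }
        return false;
    }
}
```

With `new ConnectionBudget(5)`, the first four calls to `recordSuccess()` return false and the fifth returns true, after which the count starts over.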

Another serious issue that cropped up was having to put a small delay between stopping scanning for devices and connecting to one. For some reason, attempting a connection too quickly after stopping scanning would fry the Bluetooth stack. The delay is still there in our code too.
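The shape of that workaround, as a rough sketch in plain Java: stop the scan, then schedule the connection attempt after a settle period rather than calling it straight away. `DelayedConnector` and its parameters are hypothetical names for illustration, and the right delay is something you would have to find by experiment on your own hardware.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: enforces a pause between stopping a scan and connecting. */
final class DelayedConnector {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long settleMillis;   // assumed value, tune per device

    DelayedConnector(long settleMillis) {
        this.settleMillis = settleMillis;
    }

    /** Runs stopScan immediately, then runs connect once the settle period has passed. */
    ScheduledFuture<?> stopScanThenConnect(Runnable stopScan, Runnable connect) {
        stopScan.run();
        return scheduler.schedule(connect, settleMillis, TimeUnit.MILLISECONDS);
    }

    /** Release the scheduler thread when the helper is no longer needed. */
    void shutdown() {
        scheduler.shutdown();
    }
}
```

In an RxJava codebase the same idea would more naturally be expressed with a delay operator in the chain; the plain-Java version is only here to keep the sketch free of dependencies.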

More seriously, I had an incredibly complicated pile of code involving message passing and Bluetooth message queueing, with lots of dead man's switch logic in it for when a particular message didn't get a response within an arbitrary time. It worked, but it was overly complex and incredibly difficult to maintain. There had to be a better way, so I lifted my head and went looking on the net. Much to my relief, everybody else who was pushing the bounds of Bluetooth LE on Android was suffering just as badly.
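For anyone who hasn't had the pleasure, a dead man's switch on a single message looks roughly like the sketch below: every request carries its own timer, and if no response arrives in time the request fails. This is a simplified illustration with a hypothetical `PendingRequest` class, using `CompletableFuture.orTimeout` (Java 9 and later), not the original message-passing code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical pending request that fails by itself if no response arrives in time. */
final class PendingRequest {
    private final CompletableFuture<byte[]> response = new CompletableFuture<>();

    PendingRequest(long timeoutMillis) {
        // orTimeout completes this same future exceptionally with a
        // TimeoutException unless something else completes it first.
        response.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Called by whatever receives the reply from the device. */
    void onResponse(byte[] payload) {
        response.complete(payload);
    }

    /** Completes with the payload, or exceptionally on timeout. */
    CompletableFuture<byte[]> result() {
        return response;
    }
}
```

Multiply that by every message type, add queueing and retries, and it becomes clear how this kind of logic grows out of control.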

Enter RxAndroidBle

These guys really understood the pain everybody was having. To be fair, a lot of Android devices have some fairly sketchy Bluetooth LE hardware in them and the integrated handheld we were using was no different. We needed something with an integrated animal RFID reader in it so compromises had to be made, but the Bluetooth issues were killing our ability to scale. There was nothing for it: I ended up removing vast swathes of carefully calibrated code and just pushing RxAndroidBle into the app. So much better. Because it is based on the RxJava asynchronous classes, I no longer had dead man's switches everywhere, and the logic of the code made so much more sense, given that the success and error handling sit right next to each other in adjacent lambda expressions.

Disposable scanSubscription = rxBleClient.scanBleDevices(
        new ScanSettings.Builder()
            // .setScanMode(ScanSettings.SCAN_MODE_LOW_LATENCY) // change if needed
            // .setCallbackType(ScanSettings.CALLBACK_TYPE_ALL_MATCHES) // change if needed
            .build()
        // add filters if needed
)
    .subscribe(
        scanResult -> {
            // Process scan result here.
        },
        throwable -> {
            // Handle an error here.
        }
    );

// When done, just dispose.
scanSubscription.dispose();

Of course, in my app the error handlers actually do something.

It isn't a complete miracle, but it did make for a much smoother workflow for our customers. We were still hitting scaling issues, but they started coming in at 500 devices rather than 150. Physically moving the assigned animals further away sorted a fair bit of it, but collecting the data was still best done by individually reading collars. Now that we could do bigger mobs, that became a much bigger problem, which leads to...

If you're serious, just buy a Cassia Networks X1000 and be done with it

There is nothing too much wrong with BLE if you broadly stick to not having 2000 devices in close proximity switched on (for reference, all of the BLE devices will simply lose their tiny minds). In the end, for large volume deployments any Android device is simply not going to be adequate. The X1000 handles those kinds of situations with ease, which is why we have moved away from using BLE directly on Android and now outsource everything to a Cassia Networks router. The router doesn't seem to care if there are a thousand devices nearby; it still manages to connect to them and read data. We have now changed our collection routine completely. We used to try to switch as many of the devices off as we could remotely, with a combination of direct connection and advertising a poison packet, but with the X1000 we just start reading. It starts off slow if there is a lot of traffic, but it can get through 500 collars in 45 minutes, which is about 4 times faster than the way we used to do it. That leaves time for a cup of tea and a chat with the client about their breeding program. It also means we can collect collars in the morning, read them, and redeploy them straight afterwards, which means one less trip to the farm.

BLE on Android is still useful...sometimes

I have had the odd situation where the client had a tricky setup for their sheep which made using the Cassia for assignment purposes too cumbersome. So for that we still assign using RxAndroidBle, but I never read the devices without the X1000. We ended up writing a wrapper class that doesn't care whether the device is reached over direct BLE or through the Cassia Networks router. The interface just converses with our API and the routing is taken care of at a lower level in the app, so the user never sees any difference. There is simply a setting in the app for "use cassia" that can be enabled or disabled. Since the RESTful calls that Cassia Networks use can also be asynchronous, the logic turns out to be very similar, so it was easier to implement than I thought.
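The general shape of such a wrapper is one asynchronous interface with a switch deciding which implementation sits behind it. The sketch below is an assumption about how that could look, not our actual class: `CollarTransport`, `CollarReader` and the `String` payload are all made-up names for illustration, and the real transports (RxAndroidBle on one side, REST calls to the router on the other) are left out.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical transport abstraction; every name here is illustrative. */
interface CollarTransport {
    /** Asynchronously read one collar's data, identified by a collar ID. */
    CompletableFuture<String> read(String collarId);
}

/** Routes each read to direct BLE or to the router, depending on a single setting. */
final class CollarReader {
    private final CollarTransport ble;
    private final CollarTransport cassia;
    private volatile boolean useCassia;   // mirrors a "use cassia" style setting

    CollarReader(CollarTransport ble, CollarTransport cassia, boolean useCassia) {
        this.ble = ble;
        this.cassia = cassia;
        this.useCassia = useCassia;
    }

    void setUseCassia(boolean useCassia) {
        this.useCassia = useCassia;
    }

    /** Callers never need to know which transport handled the request. */
    CompletableFuture<String> read(String collarId) {
        CollarTransport transport = useCassia ? cassia : ble;
        return transport.read(collarId);
    }
}
```

Because both sides return the same asynchronous type, the code calling `read` stays identical whichever way the setting is flipped.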

Which leads me to this: if you are doing BLE on Android, just use RxAndroidBle instead of writing your own bodges and workarounds for the limitations of BLE on Android. If you are dealing with a lot of devices, buy an X1000. You can thank me later.
