I used to love browsing in record shops. Sometimes you’d stumble across music that you would never ordinarily think of buying. Record shops are still with us, of course (vinyl in particular is experiencing a renaissance). More broadly, however, the way music is delivered to consumers has changed. Today, we don’t seem to own “music” anymore. Instead, we stream music stored elsewhere.
What does this have to do with developing trading strategies? Quite a lot! To develop and run systematic trading strategies, our “music”, alas, isn’t Lennon and McCartney but market data, and we need to be able to analyse it. The traditional way has been to collect data from external sources, store it locally and do all the processing on your own servers. Now, of course, with cloud-based services such as Amazon Web Services, Google Cloud and Microsoft Azure, we can in practice do everything remotely. We can store our data on the cloud and also process it there. Locally, we don’t need any servers, and can instead rely upon our desktop machines to remote into the cloud. OK, admittedly, none of this is really “news”. I use cloud-based services for some of my work, and I’m sure many readers do as well.
However, it’s worth trying to think about which questions we need to ask when choosing what to move on to the cloud and what we might wish to keep locally. This isn’t designed to be a detailed look at measuring the cost of cloud-based computation, just a few observations I’ve made whilst using the cloud recently for some of my computation work. I’m not going to attempt to compare the cloud providers (mainly because I haven’t been a customer of all of them, so I can’t properly tell the differences, and I’m certainly not an expert on the subject).
Which services should I use from cloud providers?
There are many types of services offered by cloud providers. You can use services such as EC2 on AWS, which literally give you a box on the cloud with an operating system of your choice, such as Linux. It’s then up to you to manage that box and install whatever you want on it. In other words, the cloud provider manages the hardware and it’s basically up to you to manage the software. Then there are higher level services, such as Amazon’s Redshift, which is a ready-to-use database service. If a higher level service is very specific to one cloud provider, it might take more time to move to another cloud provider (or to work locally). You need to think about whether the time saved by using a higher level service is worth it in the long term. The same applies to any services you use from third parties, such as data providers (how costly is it to switch, or is the dataset so unique to them that they are the only source of it?).
Is using the cloud cheaper for computations?
This obviously depends on what you are doing. The great thing about cloud-based services is that you can power servers up and down when you want. You can rapidly spin up servers when you need to do computation quickly. Hence, you don’t need to pay for processing time when you’re not using it (obviously, for stuff like storage, we are likely to need persistence, so we might have to pay for that continually). However, if you are using massive amounts of storage and need continually high amounts of CPU time, it could end up being very costly to do everything on the cloud. Many cloud providers also help startups by offering a certain number of free credits when they first start using their services, which should help, at least at the beginning.
The cloud is always on (or nearly always!)
There is, of course, a convenience aspect to outsourcing stuff to the cloud: you don’t need to worry about replacing faulty hardware, occasional power cuts or interruptions to internet connections. Yes, it won’t be on 100% of the time, but I suspect its uptime will be higher than anything I can achieve locally! I tend to use the cloud for those cases where I want services to be “always on”, such as web hosting or creating daily data indices for clients to download. On the flip side, for “offline” operations, such as research work, I tend to use my powerful local workstation, which has a decent number of cores, plenty of memory and fast hard drives. Whilst there was an initial capital outlay for purchasing it, if I keep it for long enough it should be cheaper than the cloud.
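To make the “keep it for long enough” point concrete, here is a minimal sketch of a break-even calculation. Everything in it is an assumption for illustration: the function name `break_even_months`, the idea of a flat hourly cloud rate, and all of the figures are hypothetical, and real cloud pricing (storage, data transfer, discounts) is more involved.

```python
def break_even_months(workstation_cost, local_monthly_running_cost,
                      cloud_hourly_rate, hours_used_per_month):
    """Rough number of months after which owning a workstation is cheaper
    than renting equivalent cloud compute. Returns None if it never is.

    All inputs are hypothetical figures supplied by the caller.
    """
    # What the same workload would cost on the cloud each month
    cloud_monthly_cost = cloud_hourly_rate * hours_used_per_month

    # What we save each month by running locally (electricity etc. deducted)
    monthly_saving = cloud_monthly_cost - local_monthly_running_cost

    # With light usage the cloud stays cheaper indefinitely
    if monthly_saving <= 0:
        return None

    # Months needed for the savings to repay the initial outlay
    return workstation_cost / monthly_saving


# Heavy research usage: 100 hours a month at a made-up rate of 1.00 per hour
heavy = break_even_months(3000, 25, 1.0, 100)

# Occasional usage: 20 hours a month, so local running costs exceed cloud costs
light = break_even_months(3000, 25, 1.0, 20)

print(heavy, light)
```

With these made-up numbers, heavy usage pays back the workstation after 40 months, whilst occasional usage never does, which is the intuition behind keeping infrequent or bursty jobs on the cloud.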
This isn’t really an exhaustive list of questions for deciding when, if and how to use the cloud, but hopefully it’ll be enough to get you started! On balance, I’ve found cloud-based computation a nice, easy way to get results. However, it isn’t a solution for absolutely everything. It’s important to keep track of costs when using cloud-based services, as they’re not always cheaper than doing everything locally.