The field of alternative data is growing rapidly. One of the most established forms of alternative data is machine readable news. The amount of news available is expanding very quickly, and by making it machine readable, it is possible to digest far more news than would be possible for a human. Some of the largest vendors in the machine readable news space include Bloomberg, Thomson Reuters and RavenPack. The general idea is that they take in unstructured data and structure it into a format that is easier to digest for the end user, who will typically be a trader. Whilst in the past machine readable news largely focused on newswires, in recent years vendors have also started to ingest news from other sources, notably the web. Over the years I’ve looked at many different machine readable datasets and it’s an area I know quite well. I was commissioned by RavenPack to write several research papers on their macro dataset, and I was lucky enough to present some of this work at both the Federal Reserve Board in Washington DC and the Bank of England. These days it is not purely quant traders who use machine readable news; I’ve also seen an uptick of interest from discretionary traders.
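To make the idea of structuring unstructured news a little more concrete, here is a minimal sketch of what a single structured record might look like. The field names and values are purely illustrative assumptions on my part, not the actual schema of Bloomberg, Thomson Reuters or RavenPack:

```python
import datetime

# An unstructured headline as it might arrive on the wire
headline = "Fed raises rates by 25bp, signals gradual path of tightening"

# A vendor's structured record might look something like this
# (illustrative field names only, not any specific vendor's schema)
structured_record = {
    "timestamp": datetime.datetime(2016, 12, 14, 19, 0,
                                   tzinfo=datetime.timezone.utc),
    "entity": "Federal Reserve",   # tagged entity the story is about
    "topic": "monetary-policy",    # classified topic
    "sentiment": 0.35,             # e.g. a score in [-1, +1]
    "relevance": 0.9,              # how central the entity is to the story
    "source": "newswire",
    "headline": headline,
}
```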
Many startups are also appearing in the area of machine readable news, each with their own unique way of looking at the market. There’s Prattle, which ingests central bank communications and seeks to quantify what they are saying; at the same time, they also have economists who give market commentary based on their dataset. I’ve had the opportunity to work with Prattle in the past. More recently, I’ve been doing some work with Cytora, a Shoreditch-based startup, which also seeks to gain insights from web data, in particular by structuring this information into easy-to-use time series. This is by no means an exhaustive list of the vendors who process news and web data into structured datasets, merely the vendors I am familiar with in the news data space.
Each dataset I’ve described is very different. Not only is the input data different, but the methodology around how they do sentiment analysis varies. In addition, the way each structured dataset is presented varies between the vendors, necessitating a different approach when it comes to generating signals from each dataset. For the end user, these differences actually bring advantages, even if it takes a bit more time to understand each dataset. You can combine the signals generated from each dataset to gain insights into the market that you may not have obtained if you had simply stuck to a single dataset. The real skill of a quant trader is not so much extracting value from a single dataset in isolation, but understanding the linkages between them. Indeed, this is very true of my own “home” market of currencies, which can very often be seen as a byproduct of other markets. Understanding currencies is not simply a matter of analysing FX price data, but of identifying the linkages between FX and other markets, with capital flows, economic fundamentals, market microstructure and the rebalancing of foreign holdings each playing their part in driving the market.
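As a rough illustration of combining signals from several vendors, here is a minimal Python sketch, assuming each vendor’s output has already been aggregated into a daily sentiment time series. The function names and the equal-weighted combination are my own illustrative choices, not any vendor’s methodology:

```python
import pandas as pd

def zscore(series, window=60):
    """Rolling z-score, so signals from differently scaled datasets
    become comparable before we combine them."""
    return (series - series.rolling(window).mean()) / series.rolling(window).std()

def combine_signals(signals):
    """Equally weight the normalised signals from each vendor's dataset.

    signals: dict mapping a vendor name to a daily pd.Series of sentiment.
    Returns a single composite daily signal."""
    df = pd.DataFrame({name: zscore(s) for name, s in signals.items()})
    return df.mean(axis=1)

# Usage (hypothetical daily sentiment series from two vendors):
# composite = combine_signals({"vendor_a": sent_a, "vendor_b": sent_b})
```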
One of the things I will be doing at Cuemacro is developing data products to help inform trading decisions, specifically designed to complement, rather than replace, the datasets you might already use, creating additional value. My goal is to have a relatively small number of specialised datasets, where I can add specific value using over a decade of experience in currency markets and extensive experience developing indices at both Lehman Brothers and Nomura. I have already developed two such datasets for traders of macro-based assets.
My CTA index maps the expected positions of a typical CTA fund, giving other investors an insight into what CTAs are thinking. I have also created my own take on understanding Federal Reserve communications from an automated viewpoint: a proprietary index which seeks to model the impact of Federal Reserve communications through natural language processing and statistical analysis. My index has a statistically significant relationship with UST 10 year yields, which suggests it has considerable value for those trading the market. Both my CTA and Fed indices are already available to subscribers on Cuemacro’s website, automatically updated on a daily basis! Many more indices will be coming in 2017!
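I won’t go into the methodology of either index here, but the general ideas can be sketched in a very hedged way. First, the kind of trend-following proxy that CTA replication typically builds on, a simple moving average crossover (purely illustrative, not Cuemacro’s actual CTA index methodology):

```python
import pandas as pd

def cta_position_proxy(prices, fast=20, slow=120):
    """Very rough trend-following proxy: long when the fast moving average
    is above the slow one, short otherwise. Illustrative only."""
    fast_ma = prices.rolling(fast).mean()
    slow_ma = prices.rolling(slow).mean()
    signal = (fast_ma > slow_ma).astype(int) * 2 - 1  # +1 long, -1 short
    return signal.where(slow_ma.notna())              # undefined during warm-up
```

And second, a minimal sketch of how one might test whether a sentiment index has a statistically significant relationship with UST 10 year yields, assuming both are available as daily pandas series (the variable and function names here are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm

def test_index_vs_yields(fed_index, ust_10y_yield):
    """Regress daily changes in UST 10y yields on daily changes in the
    sentiment index, and inspect the significance of the slope."""
    df = pd.DataFrame({"index_chg": fed_index.diff(),
                       "yield_chg": ust_10y_yield.diff()}).dropna()
    X = sm.add_constant(df["index_chg"])
    model = sm.OLS(df["yield_chg"], X).fit()
    # A t-statistic above roughly 2 in absolute value on index_chg would
    # suggest statistical significance at around the 5% level
    return model.summary()
```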
If you’re interested in hearing more about Cuemacro’s datasets and some of my future plans, please feel free to drop me a message. Merry Christmas and Happy New Year, and best of luck for 2017 too! Let’s hope the photo above proves correct!