Semantics3 builds AI-powered solutions that categorize huge volumes of products with better-than-human accuracy. I sat down with Varun Sivamani to chat about how they’re building their unique business and technology.
Can you introduce yourself and tell us a bit about Semantics3?
My name is Varun and I’m the CEO and co-founder of Semantics3. I started the company with my classmates Vinoth and Govind soon after we graduated from college in Singapore (NUS). I’m a hardware engineer by training, with a focus on embedded systems, but I caught the software bug in my second year of university. I was interested in the problem of extracting structured data out of unstructured documents — so much so that I made it the topic of my final year thesis. A heavy dose of Hacker News and PG’s essays convinced me I should channel this enthusiasm into starting a company.
At Semantics3, we’ve set out to organize the world’s ecommerce data — tracking product and pricing data across the web and making it highly structured. We provide ecommerce data solutions for marketplaces, brands and logistics companies via a suite of APIs. Our product suite also includes AI-based APIs such as Classification and Named Entity Recognition.
Use cases that we support include categorizing your products to a standard taxonomy, figuring out how your competitors are doing, seeing how your products are doing on various marketplaces, helping with tariff classifications, etc.
Can you say more about the data landscape involved?
Across manufacturers, there isn’t high quality structured product and attribute data consistently available. Consumers can struggle to find and buy the products they are looking for as a result.
Typically an online seller provides the minimum amount of information to get their product listed, so they upload a title, an image, and a blob of text containing a description. But the attributes consumers care about and want to use to make decisions are not easily accessible in this format. Even if sellers want to be more helpful, they are often limited in the data they have access to by decisions made earlier in the supply chain by distributors.
Take the simple example of buying a laptop. You’d typically want to know how much memory and disk space it has: these data points influence which model you want to buy and also help you form a baseline for comparison across products. Our product helps extract this information out of free text so that it’s easily accessible to users.
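To make the laptop example concrete, here is a minimal sketch of pulling memory and storage out of a free-text description. It is not how Semantics3 does it (the interview points to neural networks further on); it is a hand-written regex heuristic, and the function name, patterns, and output keys are all hypothetical.

```python
import re

# Hypothetical patterns for illustration; real listings vary far more than this.
_SIZE = r"(\d+(?:\.\d+)?)\s*(GB|TB)"
MEMORY_RE = re.compile(
    _SIZE + r"\s*(?:DDR\d\w*\s+)?(?:RAM|memory)", re.IGNORECASE
)
STORAGE_RE = re.compile(
    _SIZE + r"\s*(?:NVMe\s+|PCIe\s+)?(?:SSD|HDD|hard drive|storage)",
    re.IGNORECASE,
)


def _to_gb(value, unit):
    """Normalize a size such as ('1', 'TB') to gigabytes."""
    gb = float(value) * (1024 if unit.upper() == "TB" else 1)
    return int(gb) if gb.is_integer() else gb


def extract_laptop_attributes(text):
    """Return whichever of memory_gb / storage_gb can be found in the text."""
    attrs = {}
    memory = MEMORY_RE.search(text)
    if memory:
        attrs["memory_gb"] = _to_gb(memory.group(1), memory.group(2))
    storage = STORAGE_RE.search(text)
    if storage:
        attrs["storage_gb"] = _to_gb(storage.group(1), storage.group(2))
    return attrs


listing = "Acme 15.6 inch laptop, Intel i5, 8GB DDR4 RAM, 512 GB SSD, backlit keyboard"
print(extract_laptop_attributes(listing))
```

A heuristic like this breaks as soon as sellers phrase things differently, which is the limitation of hand-encoded rules described later in the interview.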
Why is AI uniquely useful for the use cases that you support?
We got started before AI entered the everyday vernacular. Back then it used to be called ML! And back then we called ourselves a Big Data company too…funny how that’s no longer in vogue.
When we started, the state of the art in AI was simply not good enough to tackle the problems. For a long time our motto was “more data beats fancy algorithms” and I personally was quite anti-ML. From a technical point-of-view, there was only so much feature engineering work one could do and only so many heuristics one could encode.
However, given the vast amounts and diverse types of data we are ingesting, we clearly needed the right technology for it and it turned out neural networks are the way to go, at least as of 2018.
Can we get a level down on the key use cases?
Sure. All of them come from the same foundation: structuring previously unstructured product data.
Let’s take categorization. It’s the single most important step in any ecommerce data pipeline, and one where inaccuracy can mean a lot of lost sales. For marketplaces that deal with millions of products across thousands of categories, doing this accurately at scale becomes overwhelming.
To have a successful marketplace, both suppliers and buyers need structured product data, but as a result of the issues with relying on suppliers to upload feeds, there are no universal canonical data sources and errors propagate across marketplaces. There’s also a scale problem inherent in the business model: marketplaces are winner-take-all, so anyone operating one needs to hit scale as quickly as possible. You need to balance a good buying experience with the ability to onboard new sellers as quickly as possible.
This is an area where we’ve now built sufficient expertise. And it turns out having highly scalable and accurate categorization technology enables companies to tackle related strategic problems — for example we now have customers in the logistics space such as Aeropost who are using our categorization technology to help augment their tariff classification.
Tariff classification is the process of determining the correct tariff code for imported and exported goods, and it’s unbelievable that people still do this manually. No business wants to underpay or overpay its taxes.
The costs of tariff compliance errors run into billions of dollars globally! We’re replacing the conventional method of using boatloads of people to do this. This is probably one of the most boring problems out there that AI is solving, but when you think about the impact it’s really clear that it’s necessary.
What was the most unexpected part of this journey so far when building on these technologies?
How valuable the training data sets are and how critical it is to invest heavily in them.
It’s a challenge to come up with things that are independent and don’t have any statistical biases. And sometimes we need to rely on input from customers where there is high complexity. I like to joke that the value of our company is not in our algorithms, products, or team, but in the training data sets.
Fundamentally we’re dealing with the challenge of a taxonomy that has 12,000 nodes but is very unbalanced: some categories have a million products, others only have 100. New product entries can be challenging as well. When Oculus Rift first came out, we obviously didn’t have a VR headset category, so we had to determine where it fit in.
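The interview does not say how Semantics3 handles this imbalance. As an illustration only, one common technique is to weight each category inversely to its frequency during training, so that rare categories are not drowned out by the large ones. The sketch below computes such weights with the usual "balanced" heuristic, total / (number of classes × class count); the function name and category labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by total / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Toy data: one large category and one small one (hypothetical labels).
labels = ["laptops"] * 900 + ["vr_headsets"] * 100
weights = balanced_class_weights(labels)
print(weights)
```

Weights like these are typically passed to the loss function of whatever classifier is being trained. They do not solve the other problem mentioned above, a brand-new product type with no category at all.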
The other thing is that although we had envisioned being a data company, the internal tech that powers the data has been as valuable to the customers as the data itself. Usually our customers are curious about what’s happening behind the scenes and want to know if it can be consumed as a service. As a result we’ve started exposing some of our AI capabilities as APIs directly to our customers. So another key learning has been that when building new systems you always want to architect the underlying component so that it can ultimately become the product itself.
A lot of people are concerned that increasing adoption of AI is going to negatively impact employment. What’s your view?
This is something that has been on my mind of late. Technology is supposed to bring in efficiencies and that means sometimes humans get replaced with machines. This has been happening since the Industrial Revolution and humanity seems to be doing fine. We somehow adapt ourselves.
It’s just the sheer pace at which machine learning/AI has been progressing that feels different. Even as a practitioner, it’s been tiring keeping up with the updates.
The other thing that has been quite fascinating to me is the advances in robotics. Once robots attain sufficient dexterity to do picking, dropping, and slotting tasks fairly well, and as the economics improve, they could render a lot of factory workers jobless. But taking a historical view, back in the day the steam engine and the spinning wheel also made a lot of people jobless. So in some sense, it’s just how things have always been. There are also second-order effects, such as better product data availability leading to fewer returns, which reduces warehouse/logistics work significantly.
Another dimension is that historically it’s been the blue collar workers who’ve been most affected by new technology. This time around the white collar workers could be at risk. Something that’s become quite clear to us is that a lot of professional jobs are just classification and a lot of professional qualifications just teach you how to classify things correctly (for example, customs brokers, radiologists). If machines can do these tasks much more accurately and cheaply at scale, why wouldn’t we go that route?
In the interim, I think AI is going to augment humans and make them better. In fact we’re already seeing this in our work. Our logistics customers that use our Categorization APIs for HS Code classification are enabling their customs brokers to work at 10x the speed they otherwise could. My guess is that this is what will happen with the earlier example of medical professionals too.
Either way I have faith in humanity — we’ll figure out something. There may be some short term pain, but I think we’ll be fine.
And for what it’s worth, I’m quite skeptical that consciousness, sentience, awareness, the things associated with the I in AI, are going to arise out of TensorFlow and a GPU.
I agree with you on that last point. As a founder, are there things you’d want to consume powered by AI? Do you have a call for startups?
Thinking as a consumer, it would be great if someone built Shazam for humming! On the business side — I’d like to see better conversational APIs for domain specific tasks. Google Duplex is a perfect example.
As a founder, I’d also love to have software that can fully run with SDR-type work. I’m picturing a bot that can be CCed in and follow through on conversations, keeping track of proposed timelines and making sure nothing slips through the cracks. SDR work is in the strange category of low value but high impact…SDRs may be out of a job but startups will grow faster!
Faster and more capital efficiently! Anything else we should know about Semantics3?
We tell our customers to think statistically rather than viewing AI as a deterministic magical system that will solve all their problems. Real-world user input and user queries make things fall apart quickly. Think Murphy’s law: whatever input can break the system is the one users will pick.
At the end of the day, you have to do a ton of feature engineering and write a bunch of heuristics. Any company using AI needs to be willing to roll up its sleeves and do this; if it isn’t, its customers will fail. We don’t want to prioritize technology over our users’ success.