Data pools sound so glamorous, don’t they? Their name evokes fantasies of frozen fruit cocktails, whiffs of coconut sunscreen and inflatable beach toys. Yet anyone who has ever tried to understand or work with a data pool knows that they are, well… not that. Judging by the packaging alone, they sound awesome: data pools come in a variety of business and service models, but the promise is generally the same. You’re a manufacturer? “Upload your product content in one spot and thousands of trade partners will seamlessly pull it from there.” You’re a retailer? “Download all the product information you’ll ever need from one convenient location.”
By nature data pools are a two-sided platform. I’ll refer to those two sides generically as publishers and subscribers. (Think: those that are throwing the party, and those that just showed up for sunshine, margaritas and guacamole.) We kinda have to differentiate between those two perspectives if we wish to understand the benefits and challenges.
Publishers
Publishers contribute product content to the platform. Examples of publishers include manufacturers or wholesale distributors who are trying to make product information broadly available. The name of the game for publishers is scale. They want to reach the broadest audience possible with the least amount of effort.
Large, reputable data pools with a broad subscriber base can actually help achieve this objective. When syndicating content, all brand manufacturers start with a priority list. They work one-on-one with their tier-1 customers who can move significant volume (think Amazon, Walmart, The Home Depot). Next, they address as many tier-2 customers as they can. Often tier-2 customers are leaders in that brand’s specific vertical (think multi-state liquor stores for a beer manufacturer). Sometimes tier-2 customers represent higher margins than tier-1 customers, but they can’t do the volume. And sometimes tier-2 customers are a must-have from the standpoint of brand resonance, but poor performers in terms of strict revenue. When it comes to tier-3 customers, brands and distributors have to provide a one-size-fits-all solution, and quite frequently the brand runs out of hours in the day before they can be prioritized at all.
For brands, data pools have a multiplying effect. Some tier-1 and tier-2 retailers may choose to pull generic content from the data pool (more on why below), allowing the brand to focus on customizing the product attributes that matter: hero images, videos, product titles, descriptions, feature benefit statements, enhanced content. The product attributes that drive discoverability and conversion. But most of all, data pools with a large following are a great way to reach tier-3 retailers for whom the brand doesn’t have time and resources.
Considerations When Selecting a Data Pool
There are no free lunches, and while data pools may solve a short-term problem, they may also create long-term problems for your organization. So here are a few things to watch out for as a publisher (brand or distributor) when selecting a data pool.
- How does the data pool make money
- Do both publishers and subscribers have to pay
- Intellectual property and distribution rights
- Who has access to what product data
- Who has access to what analytics data
- Control over content and ability to refresh and change
- Ability to view content requirements for any given retailer
- Methods subscribers use to connect
- Quality of content and service that subscribers receive
- Ability to report on proof of performance
Subscribers
Subscribers download product content from the platform. Typically they use that downloaded content to sell those products. Examples of subscribers include retailers and ecommerce platforms such as Instacart, Mercato, Freshop and Cornershop.
Data pools help address a number of problems for a subscriber. First and foremost, small subscribers understand that they don’t register high enough on a manufacturer’s priority list to work with them individually. Think small to mid-sized retailers and ecommerce technology startups. For these subscribers, a data pool levels the playing field and provides off-the-shelf brand-approved content they can draw from.
Second, subscribers often don’t have staff and infrastructure to work with manufacturers even if they were willing to. They lack the systems, workflows and quality control to manage the inbound flow of data. It takes an incredible amount of resources to manage inbound content when it is flowing asynchronously from a multitude of different sources. You can literally count the number of retailers globally with this level of sophistication on your fingers and toes. Everyone else really needs to rely on a third party to help guide their suppliers through the process of providing content, and synthesize the flow of data into one consistent predictable flow of information.
Finally, subscribers (both retailers and technology platforms) struggle to manage their taxonomy for two reasons. First, the retailer is the party in the supply chain best equipped to understand their consumers. But even in 2020 (The Jetsons was a brutal lie) every organization is in some stage of digital transformation. The most digitally adept organizations don’t think of “digital” or “ecommerce” as a department; they view it as the economic fabric on which the future of their business depends. These organizations have evolved to become digital by design, and the intersection of consumer (who they know best) and product (which the manufacturer presumably knows best) is now the responsibility of a category team that has a number to hit every quarter. As part of their job they’re solving questions like: What do consumers most care about and how do they discover and shop for office chairs? Cannabis? Leotards?
But companies that are still evolving rely on a digital or ecommerce department to figure all of that out. And depending on how many products they sell, the categories they cross and the depth of attribution required in their industry, they’re drowning.
The second reason taxonomy is a gnarly problem involves the retailer’s interaction with the manufacturer. Bear in mind that these two partners only start to make money the day a product goes live. And it only makes sense for a manufacturer to create data when they know someone will use that data to sell the product. So the manufacturer doesn’t have much of the information a retailer is asking for. The more data the retailer requires, the longer it takes to bring that product to market. And in the worst of cases, the manufacturer throws up its hands and says, “You know what? Screw it. Why am I wasting my time with you when I could be ringing the register with the guy up the street?” Understanding that balance is a challenge and requires in-house expertise.
What does a retailer do to solve for any of these three issues? They subscribe to external data pools that can manage this all for them. Problem solved!
Considerations When Selecting a Data Pool
Yeah, if it were that simple you wouldn’t still be reading, would you? From a subscriber’s perspective, here are a number of things you should be thinking about before you decide to hop in bed with a data pool provider.
- The breadth of the data pool’s assortment, and how they obtain products and information they don’t have (if at all)
- How dependent will I be on a single provider
- Will this provider be the sole source of product content to my organization
- Do both publishers and subscribers have to pay
- Where did the content come from, and has it been approved by the brand
- How sensitive is the data and what happens when errors introduce liability
- Usage rights – how can you use the data, what rights do you have to that data in the future
- Who has the ability to view my activity
- The extent to which you can influence the taxonomy as your business matures
What about GDSN data pools?
Oh man. I’m not going to open the GDSN can of worms in this single post. The GDSN is a complex political ecosystem that will make a great topic for another day (or series of days). But here are some basic things to know about the GDSN and its 44 global data pools.
Yes, GDSN data pools are data pools. They function similarly to what I described above, and they too are platform businesses that enable manufacturers to send product content to their trade partners. With that said, there are three elements that make the GDSN special: governance, some rules of engagement and a standard data model.
Independent Governance
The GDSN is governed by a global body called GS1, headquartered in Belgium. GS1 has developed a pantload of global communication standards like GDSN. The standard you’re probably most familiar with is the barcode.
Among other things, GS1 manages the internet-based, interconnected network of 44 global, interoperable data pools called GDSN. It governs the membership, makes the rules, approves the data model. There are special handshakes, little green sashes with badges on them. It’s all very official.
Protocol and Rules of Engagement
The GDSN data pools communicate with one another using a specific protocol, and they are required (as members of the network) to seamlessly share data with one another.
This means that philosophically, contracting with any one data pool should allow you to seamlessly send product information around the world to any subscriber of any other data pool – just like dialing up your buddy’s telephone number in Sydney.
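If it helps to picture the "contract with one pool, reach them all" idea, here is a minimal sketch in Python. To be clear, this is a toy model and not the actual GDSN protocol: the class names (DataPool, PoolNetwork), the pool names and the subscriber ID are all made up for illustration. The only point is that the publisher talks to its home pool and the network works out where the recipient lives.

```python
from dataclasses import dataclass, field


@dataclass
class DataPool:
    """A toy data pool: a name plus an inbox per subscriber (hypothetical)."""
    name: str
    inboxes: dict = field(default_factory=dict)  # subscriber id -> list of items

    def add_subscriber(self, subscriber_id: str) -> None:
        self.inboxes[subscriber_id] = []


class PoolNetwork:
    """Hypothetical stand-in for a network of interoperable pools."""

    def __init__(self, pools):
        self.pools = {pool.name: pool for pool in pools}

    def publish(self, home_pool: str, recipient_id: str, item: dict) -> str:
        """Send an item via the publisher's home pool to a recipient in any pool.

        Returns the name of the pool that delivered the item.
        """
        if home_pool not in self.pools:
            raise KeyError(f"unknown home pool: {home_pool}")
        # The publisher never needs to know which pool the recipient uses.
        for pool in self.pools.values():
            if recipient_id in pool.inboxes:
                pool.inboxes[recipient_id].append(item)
                return pool.name
        raise LookupError(f"no pool has subscriber {recipient_id}")


chicago = DataPool("pool_chicago")
sydney = DataPool("pool_sydney")
sydney.add_subscriber("retailer_sydney")
network = PoolNetwork([chicago, sydney])

delivered_by = network.publish(
    "pool_chicago", "retailer_sydney", {"gtin": "00012345678905"}
)
print(delivered_by)  # pool_sydney
```

The publisher only ever names its own pool and the recipient, which is the telephone analogy in code form: you dial the number and the network handles the routing.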
Standardized Data Model
The GDSN implements a global data standard. This is a standard set of attributes that should be collected to represent specific categories of product. In practice the data standard has its nuances and challenges that I will address another day, but in theory it makes data being passed between organizations more predictable, and what works for one retailer should hold for the next retailer.
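To make "a standard set of attributes per category" a little more concrete, here is a rough Python sketch. The categories and attribute names below are invented for illustration and are not taken from the real GDSN data model; the function name missing_attributes is likewise hypothetical. The idea is simply that once everyone agrees on what a category requires, checking an item becomes mechanical.

```python
# Hypothetical required attributes per category (NOT the real GDSN data model).
REQUIRED_ATTRIBUTES = {
    "beverage": {"gtin", "brand", "net_content", "alcohol_by_volume"},
    "office_chair": {"gtin", "brand", "weight_capacity", "material"},
}


def missing_attributes(category: str, item: dict) -> list:
    """Return the required attribute names the item lacks or leaves empty."""
    required = REQUIRED_ATTRIBUTES[category]
    # Treat None and empty strings as "not provided".
    present = {name for name, value in item.items() if value not in (None, "")}
    return sorted(required - present)


item = {
    "gtin": "00012345678905",
    "brand": "Example Lager",
    "net_content": "355 ml",
}
print(missing_attributes("beverage", item))  # ['alcohol_by_volume']
```

Because the required set is shared rather than defined retailer by retailer, an item that passes this kind of check for one subscriber should, in theory, pass for the next.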
The strength of GDSN lies in its ubiquity. While it can be difficult to learn, adopt and understand, sending data via GDSN has become a de facto standard for item setup at most large retailers around the globe. Granted, as pressures of digital have increased, the divide between item setup (which happens across the GDSN) and providing rich, digital content (which happens in other ways) has widened, creating two independent, parallel workflows at a retailer. But as a tool to initially deploy operational item content, GDSN remains strong.