The Graph Indexer Experience
Summary
Based on the research collected here, I created a framework for an Indexer Operational Dashboard. The research included reading The Graph's Indexer channel, reviewing the Discords run by Indexers, watching Indexer Office Hours, and talking with individual Indexers.
The infrastructure and data flows are complex:
Zooming out to the business flywheel:
However, to more fully understand the opportunities and challenges, I widen the lens, which opens up questions about a broader set of potential considerations for Indexers.
Market Dynamics
The value of The Graph's work token, $GRT, rises as query volume increases.
As more valuable data from more Networks becomes available, more dApps, and more users of those dApps, drive the demand for queries. This demand applies upward pressure on the value of $GRT.
Indexers contribute to the ecosystem by serving data against queries run on their infrastructure. So while they do not directly drive demand, Indexers shape the query experience through the accuracy and performance of their services.
While this document looks at the problems and goals of the Indexers, we need to do so in light of the overall value to the ecosystem and the token.
Personas
Core Persona - Professional Node Operator
Goals are to:
- scale servers to
- deliver query quality with the lowest latency and highest stability, while
- reducing costs, in order to
- generate the highest returns, so they can
- attract and retain delegators and increase delegated stake on their subgraphs, to
- further increase their profits
Many are also values-aligned with the principles of decentralization.
Indexing also appears to be a profession in its own right, and Indexers often support a portfolio of Networks.
Possible Adjacent Personas
Are there other personas we need to explore?
Possible reasons:
- Other personas could drive demand for queries
- Developers of Subgraphs may want a way to also become Indexers, but not full-time
- Developers of dApps may want to also become Indexers if doing so benefits their dApps
I haven't yet encountered anyone who fits these alternative personas.
However, keeping in mind the broader ecosystem, with a focus on growing the ecosystem and its value, these should be explored.
As different tech-driven markets have evolved, different personas have emerged:
- An "easy-button" persona that wants to enter a market with a much lower technical barrier, accepting a trade-off in performance. Examples: Cloudflare prosumers, consumer plug-and-play miners, casual gaming
- A "super seller" persona, often a power-law player that is good at marketing. Examples: eBay super sellers, Uber fleet managers, e-commerce aggregators
These are just possible explorations of other adjacent personas who could also be Indexers.
For the rest of this document, I focus on the Core Persona.
Primary Jobs to Be Done (JTBD)
Indexers' range of tasks and responsibilities fall into two large buckets: Infrastructure Operations (DevOps) and Financial Operations (RevOps).
Infrastructure Operations (DevOps)
Infrastructure operations refers to the "care and feeding" -- the DevOps -- of the nodes. Professional node operators are experts in this area.
Unlike traditional DevOps teams, which support continuous deployment of code bases and fixes, Indexers currently manage infrastructure for their Subgraphs' query endpoints. While those endpoints are hit by different applications, Indexers' service does not involve deploying new applications.
To serve the endpoint for queries and the indexing of networks, they care about:
- scaling servers
- delivering quality
- reducing latency
- increasing stability
problems they face
- infrastructure fails or has performance issues
- set ups are complex and time-consuming
- monitoring and troubleshooting is complex
drill downs
how do I determine the architecture?
- I want both operating guidance and planning insights on whether to run one really heavy node, or spread indexing infrastructure out across the world.
- I want to make the right decisions on the hardware and cloud vendors
how do I configure resources?
- I want to know how much throughput or other resources to allocate per service (e.g., GraphQL queries, indexing)
how do I deploy easily?
- I want to quickly deploy from local, testnet, to mainnet
- I want to scale up and deploy to the right infrastructure
how do I monitor and troubleshoot my infrastructure?
A core part of the devops is being alerted and having the right telemetry. Current tools present some limitations:
- I want to limit or filter actions from the CLI to reduce noise and unnecessary pagination
- I want to receive relevant alerts to fix priority issues
- I want to easily use tools like Prometheus and Grafana with minimal set up
- I want a usable and flexible visual dashboard to run my indexing operations (allocations, rules, network)
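As one concrete example of "relevant alerts with minimal setup," a supported tooling bundle could ship pre-built Prometheus alerting rules. This is a sketch only: the histogram metric name `indexer_query_latency_seconds_bucket` is a placeholder, not a metric the indexer components are known to expose.

```yaml
# Hypothetical pre-packaged alert rule for an Indexer monitoring bundle.
# Metric names are placeholders; substitute whatever the indexer
# components actually expose to Prometheus.
groups:
  - name: indexer-queries
    rules:
      - alert: HighQueryLatency
        # p95 query latency over 5 minutes, computed from a histogram metric
        expr: histogram_quantile(0.95, sum(rate(indexer_query_latency_seconds_bucket[5m])) by (le)) > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "p95 query latency above 500ms for 10 minutes"
```

Shipping a small set of such rules, plus matching Grafana dashboards, is the kind of "minimal set up" the quotes above point at.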
how do I monitor the subgraphs I am indexing?
- I want to know when a subgraph has been deprecated so I can reallocate
how do I scale my infrastructure?
- As data increases, I want to scale with ingestion volume.
- I want to shard my data without adversely affecting performance.
Financial Operations (RevOps)
Operators run their Indexers as a business.
As such, they share the same core consideration as any business: maximize and grow profit.
In order to do this, they want to:
- reduce costs
- optimize allocations
- maximize rewards
- attract and retain delegators
problems they face
- optimizing allocations is confusing or manual or opaque
- it takes time to market to delegators
- operating infrastructure is costly and sometimes not covered by rewards
drill downs
how do I manage costs?
- When I establish or modify my infrastructure, how do I ensure I spend the least to get the maximum output?
- Time is part of my cost: how do I streamline Infrastructure Operations to reduce time spent reacting?
how can I better define the cost models?
- When I set up cost models, how do I decide which model makes sense for me?
- I want to know the performance of a given cost model
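For context, cost models for The Graph's indexer-service are written in the Agora language, where rules map query shapes to prices and a `default` rule catches everything else. A minimal illustration; the prices and the query shape are placeholders, not recommendations:

```
query { pairs(skip: $skip) { id } } when $skip > 2000 => 0.0001 * $skip;
default => 0.00025;
```

Tooling that reports the revenue and hit rate of each rule would directly answer the "performance of a given cost model" question above.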
how can I continue to monitor and improve cost models?
- Once I have established models, how do I know whether I need to make adjustments based on demand and resource utilization?
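One lightweight way to frame "do I need to adjust?" is a break-even check: does projected query-fee revenue at the current price cover infrastructure cost with some margin? A sketch with made-up numbers; the `margin` parameter and all figures are illustrative assumptions, not a real pricing model:

```python
def breakeven_price_per_query(monthly_infra_cost_grt: float,
                              monthly_query_volume: float) -> float:
    """Minimum GRT price per query at which query fees cover infra cost."""
    if monthly_query_volume <= 0:
        raise ValueError("query volume must be positive")
    return monthly_infra_cost_grt / monthly_query_volume


def should_review_price(current_price: float,
                        monthly_infra_cost_grt: float,
                        monthly_query_volume: float,
                        margin: float = 1.2) -> bool:
    """Flag a cost-model review when revenue falls below cost * margin."""
    revenue = current_price * monthly_query_volume
    return revenue < monthly_infra_cost_grt * margin


# Illustrative only: 50,000 GRT/month infra cost, 400M queries/month.
print(breakeven_price_per_query(50_000, 400_000_000))
print(should_review_price(0.0001, 50_000, 400_000_000))
```

A dashboard could run this kind of check continuously against observed demand and resource utilization, surfacing an alert rather than leaving it to manual review.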
how can I attract and retain delegators?
- How can I provide compelling customer service?
- How can I differentiate myself to appeal to delegators?
how can I optimize my indexing decisions?
- I want to have control over these decisions
- I recognize the complexity and that there are AI tools for this, but don't know how much to trust a black box
- This needs to be an area of differentiation
- I worry that the indexer rewards will dry up
Product Metrics
If we are overseeing the success of the enabling products, what do we need to impact, how do we measure it (in a decentralized environment), and what set of activities and strategies gets us there?
Tools Adoption
- Adoption of tools
- NPS-like score of the tools or experience
- Frequency of customization (vs. "it just works")
Success Metrics
Some broad metrics to help direct attention on the success of the Indexers. Note: I think there's a challenge in a decentralized environment to get infrastructure metrics (it might be done already but I couldn't find it). Is this something that is worth acquiring globally?
- Performance metrics across infrastructure
  - Query performance
  - Index performance
- Financial return metrics
  - Rewards earned by type (e.g., indexing rewards, query rewards, MIPS)
  - Rewards by Subgraph
  - Average rewards by Indexer
  - Average returns for delegated stakes
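The delegator-return metric above can be derived from on-chain parameters: in The Graph, an indexer's indexingRewardCut determines the share of indexing rewards the indexer keeps, with the remainder distributed to delegators pro rata. A minimal sketch with illustrative numbers:

```python
def delegator_share(total_rewards: float, reward_cut: float) -> float:
    """Rewards flowing to delegators after the indexer's cut.

    reward_cut is the indexer's reward cut as a fraction in [0, 1].
    """
    if not 0.0 <= reward_cut <= 1.0:
        raise ValueError("reward_cut must be in [0, 1]")
    return total_rewards * (1.0 - reward_cut)


def avg_delegator_return(total_rewards: float, reward_cut: float,
                         delegated_stake: float) -> float:
    """Average return per unit of delegated stake for the period."""
    return delegator_share(total_rewards, reward_cut) / delegated_stake


# Illustrative: 10,000 GRT rewards, 10% cut, 1,000,000 GRT delegated.
print(avg_delegator_return(10_000, 0.10, 1_000_000))  # roughly 0.9% for the period
```

Aggregated across indexers and periods, this gives the "average returns for delegated stakes" series without any off-chain instrumentation.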
Engagement
- Involvement in office hours
- Outbound activities?
Growth
- Number of indexers
- Correlation of indexer growth with rewards and performance
Research
Tooling
Indexers seem to have their own tooling based on what they are most familiar with. Here is a sampling, which includes tools they have created for themselves.
- ELK stack
- Grafana
- Zabbix roadmap
- Graphscan
- Graph Indexers
- Streamlit (for building and sharing data apps)
- chainstack/graph-deployment (GitHub): tooling to run The Graph in the cloud
- StakeSquid/graphprotocol-mainnet-docker (GitHub): Graph Protocol Mainnet Docker guide by StakeSquid
Questions
Have preferences emerged?
It seems like there are existing containers for running subgraphs and the like. What preferences have formed, and why?
Should there be a first-class supported tooling solution for monitoring?
It appears that there are different flavors of monitoring.
Given the flexible options available, both open source and SaaS, should there be a recommended tool with pre-built templates and agents that simplifies this part of the Indexer Experience?
The scope of monitoring mixes standard infrastructure telemetry (CPU, storage, bandwidth), standard application metrics around indexing (query volume, latency), and Indexer-specific metrics (POI discrepancies, the actions queue, subgraph sync rate).
The tools above already seem to cover some aspects of these.
Would it help to provide out of the box a robust tool that covers the full range?
This could allow Indexers a way to think in the two broad buckets described above -- DevOps and RevOps.
My Initial Hypothesis
While it seems that Indexers are fine with, and perhaps even enjoy, building their own tools, I believe this is because out-of-the-box tools aren't meeting their needs.
This is common, and a reasonable response is to let a thousand flowers bloom.
However, as the number of Indexers grows, more standard, supported tools will become appealing: they free up time spent building monitoring tools and let Indexers focus on DevOps (the actual management of infrastructure) and, for those who want to differentiate themselves, RevOps (what can drive higher returns).
I believe Observability is a high-leverage area that can drive common standards.
How an Indexer actually manages and deploys may be bespoke, shaped by their background (and although I believe better abstractions can improve this, that is hard).
However, we should be able to converge on standard industry metrics and telemetry, while also deeply empathizing with Indexers on The Graph-specific metrics. This, combined with better root-cause tooling, could free up time while reducing MTTR and improving overall performance.
Based on this thesis, I took a pass at the Indexer Operational Dashboard.