One of the biggest challenges I've encountered when building a GraphQL API has been deciding how best to design the schema. Regardless of the language or framework, there's a resource somewhere to help me write the code that powers the API. Yet, when it comes to the schema's structure, I draw a blank. Should I make an API that mirrors my data model? How should I structure my schema to make querying as simple and efficient as possible? Where do I draw the line between deeply nested fields and a flat schema with only root queries?
These questions came back to haunt me as I was designing the GraphQL API to power the Savoir dashboard (and, eventually, other clients). I ended up going with a domain- and consumer-oriented approach, which I think works really well for dashboard-type applications. I want to share the story of how I designed this API and the lessons I learned, in the hope that they will be useful for your own future projects.
## Define the consumer needs first
Savoir is a GitHub application for tracking the documentation status of your code. It fetches data from GitHub and associates that data with documentation content created by users. Commits and status checks are associated with content and activity entries. I knew the API should surface that data somehow, but not to what extent. A status check's annotations, for example, were very likely not something this dashboard would need. Likewise, a pattern where users own organizations, rather than the other way around, was unlikely to be worth surfacing. The first thing I needed to define was: what data will this dashboard need?
One of our core values at Savoir is being "integrated": we have designed our application to be as integrated as possible within GitHub. Our dashboard shouldn't be yet another place to write content. Instead, it should be a hub for everything outside the core experience of writing and tracking your documentation within GitHub, such as billing or a repository's settings. It should let users know, at a glance, the status of their documentation, and help them decide where to increase or adjust their documentation efforts. The real product design process was far more in-depth than this, but this gives you a good idea of the product direction I wanted to take.
Knowing that, it became clear what this dashboard needed:

- access to the logged-in user's data
- access to GitHub organizations, their repositories, and each repository's settings
- a way to edit content
- a way to track all the status checks handled by Savoir

All of this data sits behind the user's permissions. You wouldn't want your repository's settings to be visible to other users.
## Nested schema over a flat structure
Whenever I design a GraphQL API, I tend to fall into the trap of designing that API with REST endpoints in mind. For example, for this dashboard, my first reflex was to start designing a schema like this.
```graphql
# Simplified schema
type User {
  # An authenticated user's data
  id: ID!
}

type Organization {
  # A GitHub Organization
  "A repository owned by this organization"
  repository(name: String!): Repository
}

type Repository {
  # A GitHub Repository
  id: ID!
}

type Content {
  # A content page for a documentation website
  id: ID!
}

type Query {
  user: User
  organization(id: ID!): Organization
  content(path: String!): Content
}
```
As mentioned earlier, all access to the dashboard requires logging in. We don't want this API to return data a user doesn't have access to, so we attach the user's authentication token to every request. At this point, I am pretty much creating a REST-style API. That has its benefits, but also a few major drawbacks. The biggest one is that every root query has to fetch the user's data again. If you request an organization, the API first needs to check whether the user authenticated with the token has access to that organization.
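The sketch below shows that cost with a minimal, hypothetical set of flat root resolvers. The in-memory `users` and `organizations` arrays, the `authenticate` helper and the `flatQueryResolvers` object are illustrative assumptions, not Savoir's actual code. Resolvers are simplified to plain functions that take the token explicitly, mirroring the hidden `authToken` parameter discussed a little further down. The takeaway is that every root resolver repeats the same authentication step and its own access check.

```typescript
// Hypothetical in-memory data standing in for GitHub and Savoir's database.
interface User {
  id: string;
  token: string;
  orgIds: string[];
}

interface Organization {
  id: string;
  name: string;
}

const users: User[] = [{ id: "u1", token: "secret-token", orgIds: ["o1"] }];
const organizations: Organization[] = [
  { id: "o1", name: "savoir" },
  { id: "o2", name: "someone-else" }
];

// Counts how many times a token is turned back into a user.
let userLookups = 0;

// Every flat root resolver starts by authenticating the token...
function authenticate(token: string): User {
  userLookups++;
  const matches = users.filter((u) => u.token === token);
  if (matches.length === 0) throw new Error("Unauthenticated");
  return matches[0];
}

// ...then checks access to the requested resource on its own.
const flatQueryResolvers = {
  user(token: string): User {
    return authenticate(token);
  },
  organization(token: string, id: string): Organization | null {
    const user = authenticate(token);
    if (user.orgIds.indexOf(id) === -1) return null; // no access
    const matches = organizations.filter((o) => o.id === id);
    return matches.length > 0 ? matches[0] : null;
  }
};
```

Consider a client query that asks for both `user` and `organization`. It authenticates the same token twice, and each new root query adds another lookup plus its own permission logic.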
The main outlier here is the repository field, which is nested in the organization. I could have made it a root query as well. But then it would have needed an organization ID too, to make sure the API doesn't accidentally fetch the wrong repository by name. It seemed silly to have that second parameter on a root query when the parent organization implicitly provides it.
Nesting the repository under the organization implies that the organization owns all its repositories: they cannot be fetched without first fetching the organization. Similarly, it implies that a repository cannot exist outside of an organization. To fetch a repository, the server first needs to resolve the organization. In REST, that would be represented by a nested route, like /org/:id/repo/:name.
This "natural" ownership pattern came from the clear relationship between the two types, but also from a desire to reduce the number of parameters on a query. Looking at the schema more closely, there is a "hidden" parameter: the user's authentication token. If I weren't using authentication headers, I could almost rewrite the query schema like this.
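In resolver code, that ownership might look roughly like the sketch below. The types, the `repositories` array and `organizationResolvers` are hypothetical, and the sketch relies on the GraphQL convention that a field resolver receives its parent object as its first argument. The `Organization.repository` resolver only takes a `name`, because the parent organization scopes the lookup, much like the `:id` segment of the REST route.

```typescript
// Hypothetical shapes; a real server would call the GitHub API instead.
interface Repository {
  ownerId: string;
  name: string;
}

interface Organization {
  id: string;
  login: string;
}

const repositories: Repository[] = [
  { ownerId: "o1", name: "docs" },
  { ownerId: "o2", name: "docs" } // same name, different owner
];

// GraphQL field resolvers receive (parent, args, context, info);
// only the first two are needed here.
const organizationResolvers = {
  repository(parent: Organization, args: { name: string }): Repository | null {
    // The parent organization supplies the "missing" parameter, so a name
    // alone is unambiguous, like /org/:id/repo/:name in REST.
    const matches = repositories.filter(
      (r) => r.ownerId === parent.id && r.name === args.name
    );
    return matches.length > 0 ? matches[0] : null;
  }
};
```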
```graphql
type Query {
  user(authToken: String!): User
  organization(authToken: String!, id: ID!): Organization
  content(authToken: String!, path: String!): Content
}
```
```
This tells me there is a clear relationship between users and every other type. I only want users to reach the organizations and content pages they have access to, and to do that I need to authenticate every request. We can apply what we just learned from repositories to solve the drawbacks outlined earlier: the user should own those fields rather than exposing them as root queries. Rewriting the schema with that in mind, we get this.
```graphql
# Simplified schema
type User {
  # An authenticated user's data
  "Fetch an organization owned or accessible by this user"
  organization(id: ID!): Organization
  "Fetch content owned or accessible by this user"
  content(path: String!): Content
}

type Organization {
  # A GitHub Organization
  "A repository owned by this organization"
  repository(name: String!): Repository
}

type Repository {
  # A GitHub Repository
  id: ID!
}

type Content {
  # A content page for a documentation website
  id: ID!
}

type Query {
  user: User
}
```
We now have a single query that needs authentication. Once authenticated, we can reuse the auth context to fetch the organizations and content that user has access to. In fact, the real Savoir API still has only a single root query: user. Every other field is owned by another type, so the tree gets pretty deep. To fetch a status check, for example, I have to write a query that fetches the user, then the organization, the repository, the commit, and finally the status check.
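Here is a minimal sketch of that single entry point. It assumes a hypothetical `Context` object carrying the request's token, for example parsed from an `Authorization` header, and a hypothetical `sessions` store. `Query.user` authenticates once. `User.organization` then only searches the organizations attached to the already-authenticated user, so it never touches the token.

```typescript
interface Context {
  token: string | null;
}

interface Organization {
  id: string;
  login: string;
}

interface User {
  id: string;
  organizations: Organization[];
}

// Hypothetical session store mapping tokens to users.
const sessions: { [token: string]: User } = {
  "secret-token": { id: "u1", organizations: [{ id: "o1", login: "savoir" }] }
};

const resolvers = {
  Query: {
    // The only root field: authenticate once and return the user.
    user(_parent: unknown, _args: Record<string, never>, ctx: Context): User | null {
      if (!ctx.token) return null;
      return sessions[ctx.token] || null;
    }
  },
  User: {
    // The parent is already an authenticated user, so we only search the
    // organizations it can access: no token, no second authentication.
    organization(parent: User, args: { id: string }): Organization | null {
      const matches = parent.organizations.filter((o) => o.id === args.id);
      return matches.length > 0 ? matches[0] : null;
    }
  }
};
```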
This may look intense. Why turn every root query into a nested field like this? What if I am fetching a pull request by number? Do I really need the organization in that chain? In my opinion, it all comes down to reusing context. One problem I glossed over earlier is how complex permission checks can get in a flat API design. How do I know the user can access the repository I am fetching? I need to validate that the user has access to the organization owning the repository, in addition to the repository itself.
In a nested context, that's not something we have to worry about. Simply put, if I fetch a repository from an organization, I know that organization was reached through the user query, and therefore that the user can view it. Permissions can be very granular in GitHub, so I still validate admin access to the repository itself, but I don't have to worry about the organization's permissions.
To go back to our earlier example, when fetching a status check through a field on a commit, I do not have to check for access to that status check. I know from the context that the user has access to the commit because it's owned by a repository the user can access. In a dashboard, where we definitely don't want to accidentally leak status checks to other users, that guarantee makes things a lot simpler.
The guarantee extends to other checks, like existence. When fetching a repository by ID, I do not need to also check whether the organization that owns it still exists in GitHub; that was already checked in the parent's resolver. The nested schema may make individual queries more complex, but it made the overall backend logic a lot simpler and gave every resolver a clear separation of concerns.
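A rough sketch of how those guarantees could show up in resolvers follows. All types and data are hypothetical, and `viewerId` is an assumed context field set once the root `user` query has authenticated the request. Only `Organization.repository` performs a check, and that check covers repository-level admin access. `Repository.commit` and `Commit.statusChecks` trust their parent. Execution semantics support this: in a GraphQL executor, when a nullable field resolves to null, its sub-selection is never resolved. Child resolvers therefore never run against a missing or inaccessible parent.

```typescript
// Hypothetical shapes for the chain organization -> repository -> commit -> status checks.
interface StatusCheck {
  name: string;
  conclusion: string;
}

interface Commit {
  sha: string;
  statusChecks: StatusCheck[];
}

interface Repository {
  name: string;
  adminIds: string[];
  commits: Commit[];
}

interface Organization {
  id: string;
  repositories: Repository[];
}

// Assumed request context: the viewer was authenticated once at the root.
interface Context {
  viewerId: string;
}

const resolvers = {
  Organization: {
    // Only repository-level access is checked: the parent organization was
    // reached through the user, so it exists and is visible to the viewer.
    repository(parent: Organization, args: { name: string }, ctx: Context): Repository | null {
      const matches = parent.repositories.filter((r) => r.name === args.name);
      if (matches.length === 0) return null;
      return matches[0].adminIds.indexOf(ctx.viewerId) !== -1 ? matches[0] : null;
    }
  },
  Repository: {
    // No check: reaching this repository already implies access to it.
    commit(parent: Repository, args: { sha: string }): Commit | null {
      const matches = parent.commits.filter((c) => c.sha === args.sha);
      return matches.length > 0 ? matches[0] : null;
    }
  },
  Commit: {
    // No check either: the commit's repository was already validated.
    statusChecks(parent: Commit): StatusCheck[] {
      return parent.statusChecks;
    }
  }
};
```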
## Paginate everything
Another thing I glossed over was fetching lists of elements and paginating them. That's because I also glossed over it when I was first designing the API. I couldn't decide what criteria to use to determine whether a list should be paginated. Pagination can make queries a lot more complex, not to mention how painful it can be in TypeScript. Consider this schema, using Relay-style cursor pagination:
```graphql
# Simplified schema
type Repository {
  # A GitHub Repository
  id: ID!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type RepositoryEdge {
  cursor: String
  node: Repository
}

type RepositoryConnection {
  edges: [RepositoryEdge]
  pageInfo: PageInfo!
}

type User {
  # An authenticated user's data
  "Fetch all repositories owned or accessible by this user"
  repositories(first: Int, after: String): RepositoryConnection
}

type Query {
  user: User
}
```
To query all of a user's repositories, the query would have to look something like this. The $after variable is used to fetch the next page, if there is one.
```graphql
query Repositories($after: String) {
  user {
    repositories(first: 20, after: $after) {
      edges {
        node {
          # ...
        }
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }
  }
}
```
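On the server, a connection field has to turn `first` and `after` into a page of edges plus `pageInfo`. Below is a minimal in-memory sketch. `paginate`, `toCursor` and `fromCursor` are hypothetical helpers, not part of any library. For simplicity, the cursors here just wrap an array index. Real implementations often base cursors on a stable sort key instead, so that they survive inserts between requests.

```typescript
interface PageInfo {
  hasNextPage: boolean;
  endCursor: string | null;
}

interface Edge<T> {
  cursor: string;
  node: T;
}

interface Connection<T> {
  edges: Edge<T>[];
  pageInfo: PageInfo;
}

const CURSOR_PREFIX = "cursor:";

// Cursors are opaque strings to clients; here they simply wrap an index.
function toCursor(index: number): string {
  return CURSOR_PREFIX + index;
}

function fromCursor(cursor: string): number {
  if (cursor.indexOf(CURSOR_PREFIX) !== 0) throw new Error("Invalid cursor: " + cursor);
  const index = parseInt(cursor.slice(CURSOR_PREFIX.length), 10);
  if (isNaN(index)) throw new Error("Invalid cursor: " + cursor);
  return index;
}

// Returns up to `first` items located after the `after` cursor.
function paginate<T>(items: T[], first: number, after?: string | null): Connection<T> {
  const start = after ? fromCursor(after) + 1 : 0;
  const edges = items
    .slice(start, start + first)
    .map((node, i) => ({ cursor: toCursor(start + i), node: node }));
  return {
    edges: edges,
    pageInfo: {
      hasNextPage: start + first < items.length,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null
    }
  };
}
```

A client walks the list by passing each page's `endCursor` back as `$after` until `hasNextPage` is false.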
Accessing those repositories in JavaScript gets quite verbose: user.repositories.edges.map(edge => edge.node). What happens if I then want to loop over all the commits of all the repositories? Our API already follows a deeply nested structure, and adding connections to every list makes each query massive. Whether to paginate a list at all is a perfectly reasonable question: is it worth paginating a list that may have, on average, 20 elements?
To answer this question, I ended up relying on the wisdom of a Lead Backend Engineer I worked with a few roles back. Whenever we asked whether we should paginate, they always said: "If you're thinking about not paginating a list, then paginate it." Translation: always paginate. In the context of a dashboard specifically, we want things to be responsive and reactive. Unless we know for sure that a list will never have more than 10 elements, it should be paginated.
I think it is also worth considering this kind of question from the product's perspective. A dashboard is a product: users access it to get the information they need to make good decisions about how they use the product. Go back to the definition I outlined for the dashboard and what it needs to do to be a successful product, and it becomes clear that pagination should be the standard. In the case of Savoir, the dashboard should be quick to load and mostly needs to give the user access to specific pieces of information. We do not have complex charts with thousands of data points, and even those could be paginated based on the selected time frame. In short, the UX constraints that pagination imposes are more than acceptable for this product.
In the past, I often questioned my old colleague's wisdom. Having designed this product and the API to power it, I now understand where they were coming from. Should you be as strict as they suggested? I think it depends on your specific product needs. In the case of the Savoir dashboard, the answer was yes.
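One way to keep that boilerplate contained on the client is a small generic helper that unwraps a connection's nodes. This is a hypothetical sketch: `nodes`, `allCommits` and the response types are assumptions, not types generated by any particular GraphQL client. It handles nullable edges and nodes, as allowed by the `RepositoryEdge` type above, and shows the "all commits of all repositories" case where connections nest.

```typescript
// Minimal connection shape: edges and nodes may be null, as in the schema.
interface Connection<T> {
  edges: Array<{ node: T | null } | null> | null;
}

interface Commit {
  sha: string;
}

interface Repository {
  name: string;
  commits: Connection<Commit> | null;
}

// Unwraps a connection into a plain array, skipping null edges and nodes.
function nodes<T>(connection: Connection<T> | null | undefined): T[] {
  const result: T[] = [];
  if (!connection || !connection.edges) return result;
  for (const edge of connection.edges) {
    if (edge && edge.node !== null) result.push(edge.node);
  }
  return result;
}

// Every commit of every repository in the pages fetched so far.
function allCommits(repositories: Connection<Repository>): Commit[] {
  let result: Commit[] = [];
  for (const repo of nodes(repositories)) {
    result = result.concat(nodes(repo.commits));
  }
  return result;
}
```

This only flattens the data already loaded. Each nested connection still has its own `pageInfo`, so fetching every commit means paging through each repository's commits separately.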
## Onto today
The Savoir dashboard is still being built as I write these lines, but these few lessons already guide the entire architecture and design of the API. So, what did we learn in this article? Here is a short summary:
- Define who consumes an API early. Knowing an API's target audience helps drive decisions and define the problem the API is meant to solve.
- Do not mirror the data or permission model in your GraphQL schema. The schema should represent the data the product needs, not the other way around.
- GraphQL works best when types are nested based on ownership. Nest queries within other types to reuse their context and simplify the backend logic.
- Always paginate unless a list is strictly limited in size or your product demands unpaginated list fields.
This post ended up being much more of a story than a tutorial, contrary to what I initially planned. Still, I think these lessons may be useful in your own design and decision-making process for your GraphQL-powered dashboard. Please let us know if they were. Look out for the next part of this series, where I'll share more about how we implemented mutations and when to combine GraphQL and REST to power a single application. For those looking for a tutorial, we will also be releasing a post on GraphQL API documentation in the near future.
If what we are building looks interesting to you, please check out the features on our website at savoir.dev. Feel free to also send me a message at info@savoir.dev. I'll be glad to answer any questions you have or give you a preview. We're also hiring!
Savoir is the French word for knowledge, pronounced sɑvwɑɹ.