In this post, I am going to mostly focus on a particular example with our usage of Hibernate and how performance was improved.
We have a User object (who doesn’t?), which maps to many Badge objects like so:
<class name="com.utest.User"> ... <bag name="badges" table="Badges" lazy="false"></bag> </class>
Notice that lazy is set to false, because we need to make sure the User object is fully initialized before passing it out of the session context.
Now we want to load some users, say 67 of them, and we use HQL like this:
from User where userId in (:userIds)
unfortunately, this results in 68 queries (N+1 selects problem): 1 main query to load all the Users and then 67 queries to load badges for one user at a time.
One way we could try to optimize this is to add fetch=”join” to the mapping:
<bag name="badges" table="Badges" lazy="false" fetch="join">
we are hoping that Hibernate will now run a single query, with an outer join to the Badges table, but in practice it does not. The reason is that Hibernate ignores the fetch attribute from mappings when an HQL query is executed.
In order for the join to happen, we must either explicitly specify it in the HQL:
from User u left join u.Badges b where u.userId in (:userIds)
or, we must use the Hibernate Criteria API to load the users:
session.createCriteria(User.class) .add(Restrictions.in("userId", userIds)) .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) .list()
Quick note: If you are using a 3rd party wrapper around hibernate, for example – we are using Hibernate Generic DAO, you might want to check whether it uses HQL or Criteria under the hood. Ours uses HQL, which is what led to the investigation that led to this blog post.
But which fetch strategy is faster?
Sorry, no easy answers here. It depends on the other parts of the User object mapping (how complex the single query will be), how many badges you expect per user, and even your network latency.
Here are the timings from my own machine using our actual code and loading 67 users:
1) with fetch=”select” (the default), 67+1 queries, time taken is: 287ms
2) with fetch=”join” and using distinct criteria, 1 query, time taken is: 664ms
3) with fetch=”select” but adding batch-size=”100″, 3+1 queries time taken is: 170ms
It turns out that forcing the join fetch isn’t actually the faster option, in fact it is way slower than doing 67 extra queries in our case.
The best option here is #3. By adding batch-size=”100″ we are telling Hibernate to load the Badges separately, but 100 at a time. (Aside: For reasons still unclear to me, Hibernate issues 3 selects to batch load the Badges. But if batch-size was set to a perfect 67, it would only issue 1)
In this case, doing the much simpler optimization of allowing Hibernate to batch load the badges reduces the total query time by 40%. And we don’t have to change any HQL code.
I think the biggest conclusions here are to
1) pay close attention to the sql queries that Hibernate actually runs against the database and
2) try out different mappings and time them
To help with #1, you should enable Hibernate logging of sql while doing database development by either:
a) Set show_sql setting to “true” in hibernate configuration or
b) Set logging level for “org.hibernate.SQL” to debug to see generated SQL, and perhaps also set “org.hibernate.type” level to trace to see all of the parameters being used in the queries.