How we improved an index lookup query by 110% in a distributed SQL database

Jinpeng Zhang
Oct 21, 2022 · 6 min read


Clustered index and secondary index in TiDB

Like many other SQL databases, TiDB supports both clustered indexes and secondary indexes. Each table has at most one clustered index, because the clustered index organizes all of the table's row data. For more details about clustered indexes in TiDB, see the official documentation.

CREATE TABLE t1 (
a int(10),
b varchar(32),
c tinyint,
d datetime,
e char(32),
f blob,
primary key(a, b) /*T![clustered_index] CLUSTERED */,
key(d),
unique key(e)
);

We created a table t1 with three indexes: a primary key (a, b) hinted as a CLUSTERED index, a secondary index (d), and a unique index (e). Now, let's go deeper into the underlying data layout of this table.

TiDB uses TiKV, a distributed transactional key-value database, as its storage layer, so both row data and index data are stored as key-value pairs in TiKV. Because t1's primary key is also its clustered index, its layout pattern is (a, b) -> (c, d, e, f), where (a, b) is the row key and (c, d, e, f) holds the content of the other columns. If there are two rows in this table, their data layout looks like:

(1, "abc") -> (1, "20221020 16:26:30", "xyz", "xxxxxx")
(2, "def") -> (0, "20221020 16:27:30", "zyx", "yyyyyy")

We also created a secondary index on column d. The data layout pattern of this secondary index is (d, a, b) -> (). You may wonder why the primary key columns (a, b) are appended to the index key. Because the index on d is not unique, different rows may have the same value in column d; to distinguish the index keys of different rows, TiDB appends the primary key columns (a, b) to the index key. The data layout of index d looks like:

("20221020 16:26:30", 1, "abc") -> ()
("20221020 16:27:30", 2, "def") -> ()

The last index of table t1, the unique index on column e, has the data layout pattern (e) -> (a, b): because column e is declared unique, its content alone guarantees a unique index key, and the index value refers to the primary key.

("xyz") -> (1, "abc")
("zyx") -> (2, "def")

To summarize, the data layout of table t1 is:

  • Row data layout (primary key organized row data): (a, b) -> (c, d, e, f)
  • Secondary index data layout: (d, a, b) -> ()
  • Unique secondary index data layout: (e) -> (a, b)
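
As a quick illustration of how these layouts serve queries, consider the following statements against t1. This is a sketch: the plan TiDB actually chooses depends on the version and on table statistics.

-- The unique index stores (e) -> (a, b), so this query can be answered
-- from the index data alone, without touching the row data:
SELECT a, b FROM t1 WHERE e = 'xyz';

-- This query needs columns that are not in the index, so TiDB first reads
-- (e) -> (a, b) from the unique index and then uses (a, b) to fetch the
-- full row from the clustered row data:
SELECT * FROM t1 WHERE e = 'xyz';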

We created another table t2, similar to table t1. The only difference is that we hinted the primary key as NONCLUSTERED.

CREATE TABLE t2 (
a int(10),
b varchar(32),
c tinyint,
d datetime,
e char(32),
f blob,
primary key(a, b) /*T![clustered_index] NONCLUSTERED */,
key(d),
unique key(e)
);

The row data and index data layout of table t2 are different from table t1's. Because t2 is a nonclustered table, TiDB uses an implicit _row_id to organize the row data. The index data is similar to t1's, except that the indexes refer to _row_id instead of the primary key.

Data layout of table t2 is:

  • Row data layout: (_row_id) -> (a, b, c, d, e, f)
  • Primary index data layout: (a, b) -> (_row_id)
  • Secondary index data layout: (d, _row_id) -> ()
  • Unique secondary index data layout: (e) -> (_row_id)
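
Because the row data of t2 is keyed by _row_id, even a point query on the primary key takes one extra hop compared with t1. As a sketch (assuming TiDB picks the primary index), the query below first reads (a, b) -> (_row_id) from the primary index and then uses _row_id to fetch the row; on t1 the same query reads the row directly by (a, b).

-- On t2 (NONCLUSTERED): primary index -> _row_id -> row data.
-- On t1 (CLUSTERED): the row data is read directly by (a, b).
SELECT * FROM t2 WHERE a = 1 AND b = 'abc';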

Index lookup query and tidb_distsql_scan_concurrency

The secondary index (d) in table t1 or t2 is used by statements like SELECT * FROM t1 WHERE d BETWEEN "20221020" AND "20221021". Such a query first looks up index d to get the row keys that match the condition, and then uses those row keys to fetch the row data.
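
You can see this two-phase execution in the query plan. For such a statement, EXPLAIN in TiDB typically shows an IndexLookUp operator with two children: an IndexRangeScan that reads index d, and a TableRowIDScan that fetches the matching rows by their row keys. The exact plan depends on the TiDB version and the table statistics.

EXPLAIN SELECT * FROM t1 WHERE d BETWEEN "20221020" AND "20221021";
-- Expected plan shape (simplified):
--   IndexLookUp
--   ├─ IndexRangeScan on index d    (reads the matching index entries)
--   └─ TableRowIDScan on table t1   (fetches the rows by the keys found above)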

Because TiKV (TiDB's storage layer) is a distributed key-value store, both row data and index data are partitioned by range and stored across several storage nodes. The rows the example query needs may therefore fall into different ranges and live on different storage nodes.
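
If you want to see this partitioning for yourself, TiDB can list the ranges (Regions) that back a table and the stores that currently serve them. The exact columns in the output vary by version.

-- Show how the row data and index data of t1 are split into Regions
-- and which TiKV stores hold the leader of each Region:
SHOW TABLE t1 REGIONS;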

After getting the row keys, TiDB starts fetching the row data. Because these rows may be distributed across many data ranges, TiDB constructs a sub-request for each data range that contains wanted rows and routes the sub-requests to the corresponding storage nodes concurrently.

For one SQL query, at most tidb_distsql_scan_concurrency sub-requests are sent to the storage layer at the same time. The default value of tidb_distsql_scan_concurrency is 15. This system variable affects the concurrency of every SELECT statement sent by users. Its original purpose was to limit the concurrency of statements like SELECT COUNT(*) FROM t, so that such heavy queries cannot overwhelm the system.
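
For reference, the variable can be inspected and adjusted per session or globally. Tune it with care: higher values put more concurrent load on the storage nodes.

-- Check the current value (the default is 15):
SHOW VARIABLES LIKE 'tidb_distsql_scan_concurrency';

-- Raise it for the current session only:
SET SESSION tidb_distsql_scan_concurrency = 64;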

Let's come back to the index lookup query example. If the result contains 1,000 rows and they are distributed across 1,000 ranges, TiDB constructs 1,000 small sub-requests (each retrieving just one row from the storage layer), but it can only send 15 of them at a time. The latency of such a query increases because the sub-requests queue up.
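
To make the queuing effect concrete, here is a rough back-of-the-envelope estimate, assuming each sub-request takes about the same time t and the sub-requests are processed in sequential waves of 15:

\[
\text{waves} = \left\lceil \frac{1000}{15} \right\rceil = 67, \qquad \text{query latency} \approx 67\,t
\]

Even though every individual sub-request is cheap, the query as a whole pays for dozens of sequential round trips to the storage layer.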

Our storage team engineer Mu Tong reproduced the issue using this script and tested how the value of tidb_distsql_scan_concurrency affects the latency of such queries.

Intelligent concurrency for small sub-requests

From the test results we can see that as we increase the value of tidb_distsql_scan_concurrency, the latency decreases. However, the latency curve flattens out once the value reaches about 256, which is still far below 1,000, the maximum number of sub-requests. This is because, in the real world, some sub-requests are slower than others, and the latency of the query is determined by these slow sub-requests, not by the fast ones.

How can we set the concurrency for each query dynamically without sacrificing latency? From the test results above we can derive an approximate concurrency formula.

Assume the latency of the sub-requests follows a normal distribution with mean μ and standard deviation σ.

Then the concurrency required for a query that contains n small sub-requests can be estimated from the expected value of the maximum of n normally distributed sub-request latencies.
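
As a rough sketch, assuming the sub-request latencies L_1, ..., L_n are independent and identically distributed, a standard approximation for this expected maximum is:

\[
L_i \sim \mathcal{N}(\mu, \sigma^2), \qquad \mathbb{E}\Big[\max_{1 \le i \le n} L_i\Big] \approx \mu + \sigma \sqrt{2 \ln n}
\]

The slowest sub-request exceeds the mean μ by roughly σ√(2 ln n), so it sets a floor on the query latency; raising the concurrency past the point where fast sub-requests stop queuing brings little extra benefit, which matches the test results above.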

Mu Tong ran experiments in different environments to determine proper values for μ and σ. You can check the implementation details of the intelligent concurrency here: https://github.com/pingcap/tidb/pull/37725

Finally, we tested the improvement from intelligent concurrency. It reduced the latency of such queries from 70.2 ms to 33.2 ms, roughly a 110% improvement in query speed (the latency was cut by more than half).

By the way, this optimization will be released in TiDB v6.4.

Written by Jinpeng Zhang

Director of Engineering @ TiDB, focused on building large-scale distributed systems and high-performance engineering teams.
