In a recent project, we had a REST API with a GET /profile route that returned profile data. The profile data had to be pulled from the users table as well as several related tables: universities, identities, sectors, companies, question_answers and can_help. Each of those tables is linked to the user by a foreign key, and initially there was a separate network request per table query, so fetching a profile took O(N) database round trips, linear in the number of tables. After optimisation this dropped to O(1), a single round trip: the queries are sent in bulk (multiple SELECT statements in one network request) and each table's rows are extracted from the DB response afterwards, like so:
const [ query_tables ] = await db.read.raw(`
SELECT
un.id AS id,
un.name AS name
FROM universities un
INNER JOIN user_universities us
ON un.id = us.university_id
WHERE us.user_id = ?
ORDER BY us.id DESC;
SELECT
i.id AS id,
i.name AS name,
i.emoji AS emoji,
i.is_active AS is_active
FROM user_identities ui
INNER JOIN identities i
ON i.id = ui.identity_id
WHERE ui.user_id = ?
ORDER BY ui.id DESC;
SELECT
s.id AS id,
s.name AS name,
s.emoji AS emoji
FROM user_sectors us
INNER JOIN sectors s
ON s.id = us.sector_id
WHERE us.user_id = ?
ORDER BY us.id DESC;
SELECT
c.id AS id,
c.company AS company,
c.is_active AS is_active
FROM user_companies uc
INNER JOIN companies c
ON c.id = uc.company_id
WHERE uc.user_id = ?
ORDER BY uc.id DESC;
SELECT
aq.id AS question_id,
aq.question AS question,
aq.placeholder AS placeholder,
aq.is_required AS is_required,
aqa.id AS answer_id,
aqa.answer AS answer,
aqa.hashtags AS hashtags
FROM ama_question_answer aqa
INNER JOIN ama_questions aq
ON aq.id = aqa.question_id
WHERE aqa.user_id = ?
ORDER BY aq.is_required DESC, aq.id ASC;
SELECT
ch.id AS id,
ch.name AS name,
ch.group_title AS group_title,
ch.created_at AS created_at
FROM can_help_user chu
INNER JOIN can_help ch
ON ch.id = chu.can_help_id
WHERE chu.user_id = ?
`, [id, id, id, id, id, id])
const [ universities, identities, sectors, companies, question_answers, can_help ] = query_tables
user['universities'] = universities
user['identities'] = identities
user['sectors'] = sectors
user['companies'] = companies
user['question_answers'] = question_answers
user['can_help'] = can_help

knex.js was used as the query builder.
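One detail worth noting: for several statements to be sent in a single raw() call like this, multiple statements generally have to be enabled on the database connection. A minimal sketch of what that might look like with knex and the mysql2 client (the credentials and the read/write split here are placeholders, not the project's actual config):

```js
const knex = require('knex')

// Hypothetical connection setup; multipleStatements is the relevant flag,
// everything else is purely illustrative.
const db = {
  read: knex({
    client: 'mysql2',
    connection: {
      host: '127.0.0.1',
      user: 'reader',
      password: 'secret',
      database: 'app',
      multipleStatements: true, // lets one raw() call carry several SELECTs
    },
  }),
  // db.write would be configured the same way against the primary.
}
```

Without that flag, the semicolon-separated query above would simply fail.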
Another SQL query optimisation I remember was in the hashtag system, where there was a hashtags table with the following structure:
| id | hashtag | times_used |
|---|---|---|
| pk | varchar(50), unique | int |
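In knex's schema builder the table might be defined roughly like this (a sketch inferred from the structure above; the default value for times_used is an assumption):

```js
// Hypothetical migration for the hashtags table described above.
await db.write.schema.createTable('hashtags', (table) => {
  table.increments('id')                                  // pk
  table.string('hashtag', 50).notNullable().unique()      // varchar(50), unique
  table.integer('times_used').notNullable().defaultTo(1)  // int
})
```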
Each time a post is tagged, its hashtags are extracted and inserted into this table. The catch is that the unique hashtag column raises an error on duplicate insertions, so on those ON DUPLICATE KEY errors I could record how often a hashtag is used by incrementing the times_used column instead. The original query issued multiple requests for each hashtag, making redundant network calls to the database, so I came up with the following optimisation: all the hashtags are validated and converted into a raw SQL insertable string, like so:
const data_table = answers.map(a => ({
user_id: user.id,
question_id: a.question_id,
answer: a.answer.trim(),
hashtags: a.hashtags || null,
}))
const flattened_hashtags = data_table
.filter(x => !!x.hashtags && typeof x.hashtags === 'string')
.map(({ hashtags }) => hashtags)
// ['founders', 'fundraising', 'nofilter', 'tag', 'tag']
const unique_hashtags = [...new Set(flattened_hashtags)]
// ['founders', 'fundraising', 'nofilter', 'tag']
if(unique_hashtags.length > 0){
const sqlized_values = unique_hashtags.map(hashtag => `('${hashtag}')`).join(', ')
// ('founders'), ('fundraising'), ('nofilter'), ('tag')
// one statement: new hashtags are inserted, existing ones get times_used incremented
await db.write.raw(`INSERT INTO hashtags (hashtag) VALUES ${sqlized_values} ON DUPLICATE KEY UPDATE hashtag = VALUES(hashtag), times_used = times_used + 1`)
}
data_table comes from a well-validated request.body, so the hashtag values interpolated into the raw SQL above have already been sanitised.
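For contrast, the unoptimised flow described above presumably did something like this per hashtag: try the insert, and fall back to an update when the unique constraint fires. This is a sketch of that shape rather than the original code (the ER_DUP_ENTRY check is an assumption about how the duplicate-key error was detected):

```js
// Up to two round trips per hashtag; the batched
// INSERT ... ON DUPLICATE KEY UPDATE above collapses all of this
// into a single statement and a single network call.
for (const hashtag of unique_hashtags) {
  try {
    await db.write.raw('INSERT INTO hashtags (hashtag) VALUES (?)', [hashtag])
  } catch (err) {
    if (err.code !== 'ER_DUP_ENTRY') throw err
    await db.write.raw(
      'UPDATE hashtags SET times_used = times_used + 1 WHERE hashtag = ?',
      [hashtag]
    )
  }
}
```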