SlideShare a Scribd company logo
1 of 31
Download to read offline
Наш	ответ	Uber’у	
Александр	Коротков	
Postgres	Professional
Russian developers of PostgreSQL:
Alexander Korotkov, Teodor Sigaev, Oleg Bartunov
▶ Speakers at PGCon, PGConf: 20+ talks
▶ GSoC mentors
▶ PostgreSQL commi ers (1+1 in progress)
▶ Conference organizers
▶ 50+ years of expertship: development, audit, consul ng
▶ Postgres Professional co-founders
PostgreSQL CORE
▶ Locale support
▶ PostgreSQL extendability:
GiST(KNN), GIN, SP-GiST
▶ Full Text Search (FTS)
▶ NoSQL (hstore, jsonb)
▶ Indexed regexp search
▶ Create AM & Generic WAL
▶ Table engines (WIP)
Extensions
▶ intarray
▶ pg_trgm
▶ ltree
▶ hstore
▶ plantuner
▶ jsquery
▶ RUM
▶ imgsmlr
Alexander Korotkov Наш ответ Uber’у 2 / 31
Disclaimer
▶ I’m NOT a MySQL expert. I didn’t even touch MySQL since 2011...
▶ This talk express my own opinion, not PostgreSQL community
posi on, not even Postgres Professional official posi on.
▶ Uber’s guys knows be er which database they should use.
Alexander Korotkov Наш ответ Uber’у 3 / 31
What happened?
▶ Uber migrated from MySQL to PostgreSQL in 2012.
▶ Uber migrated from PostgreSQL to MySQL in 2016.
▶ PostgreSQL to MySQL migra on made a log of buzz in PostgreSQL
community.
Alexander Korotkov Наш ответ Uber’у 4 / 31
Why did it happen?
▶ Uber migrated from MySQL to PostgreSQL for “a bunch of reasons,
but one of the most important was availability of PostGIS” ¹
▶ Uber migrated from PostgreSQL to MySQL “some of the drawbacks
they found with Postgres” ²
¹https://www.yumpu.com/en/document/view/53683323/migrating-uber-from-mysql-to-postgresql
²https://eng.uber.com/mysql-migration/
Alexander Korotkov Наш ответ Uber’у 5 / 31
Uber’s complaints to PostgreSQL
Uber claims following “PostgreSQL limita ons”:
▶ Inefficient architecture for writes
▶ Inefficient data replica on
▶ Issues with table corrup on
▶ Poor replica MVCC support
▶ Difficulty upgrading to newer releases
Alexander Korotkov Наш ответ Uber’у 6 / 31
PostgreSQL’s vs. InnoDB’s storage formats
Alexander Korotkov Наш ответ Uber’у 7 / 31
PostgreSQL storage format
▶ Both primary and secondary indexes point to loca on (blkno, offset) of tuple (row
version) in the heap.
▶ When tuple is moved to another loca on, all corresponding index tuples should be
inserted to the indexes.
▶ Heap contains both live and dead tuples.
▶ VACUUM cleans up dead tuples and corresponding index tuples in a bulk manner.
Alexander Korotkov Наш ответ Uber’у 8 / 31
Update in PostgreSQL
▶ New tuple is inserted to the heap, previous tuple is marked as
deleted.
▶ Index tuples poin ng to new tuple are inserted to all indexes.
Alexander Korotkov Наш ответ Uber’у 9 / 31
Heap-Only-Tuple (HOT) PostgreSQL
▶ When no indexed columns are updated and new version of row can
fit the same page, then HOT is used and only heap is updated.
▶ Microvacuum can be used to free required space in the page for HOT.
Alexander Korotkov Наш ответ Uber’у 10 / 31
Update in MySQL
▶ Table rows are placed in the primary index itself. Updates are performed in-place.
Old version of rows are placed to special segment (undo log).
▶ When secondary indexed column is updated, then new index tuple is inserted while
previous index tuple is marked as deleted.
Alexander Korotkov Наш ответ Uber’у 11 / 31
Updates: InnoDB in comparison with PostgreSQL
Pro:
▶ Update of few indexed columns is cheaper.
▶ Update, which don’t touch indexed columns, doesn’t depend on page
free space in the page
Cons:
▶ Update of majority of indexed columns is more expensive.
▶ Secondary index scan is slower.
▶ Primary key update is disaster.
Alexander Korotkov Наш ответ Uber’у 12 / 31
Uber example for write-amplifica on in PostgreSQL
CREATE TABLE users (id SERIAL PRIMARY KEY,
first TEXT,
last TEXT,
birth_year INTEGER);
CREATE INDEX ix_users_first_last ON users (first, last);
CREATE INDEX ix_users_birth_year ON users (birth_year);
UPDATE users SET birth_year = 1986 WHERE id = 1;
1. Write the new row tuple to the tablespace
2. Update the primary key index to add a record for the new tuple
3. Update the (first, last) index to add a record for the new tuple
4. Update the birth_year index to add a record for the new tuple
5. Previous ac ons are protected by WAL log.
Alexander Korotkov Наш ответ Uber’у 13 / 31
Uber example for write-amplifica on: MySQL vs. PostgreSQL
Alexander Korotkov Наш ответ Uber’у 14 / 31
Uber example for write-amplifica on: MySQL vs. PostgreSQL
PostgreSQL
1. Write the new row tuple to the
tablespace
2. Insert new tuple to primary key index
3. Insert new tuple to (first, last) index
4. Insert new tuple to birth_year index
5. Previous ac ons are protected by WAL
log.
MySQL
1. Update row in-place
2. Write old version of row to the rollback
segment
3. Insert new tuple to birth_year index
4. Mark old tuple of birth_year index as
obsolete
5. Previous ac ons are protected by
innodb log
6. Write update record to binary log
Assuming we have replica on turned on
Alexander Korotkov Наш ответ Uber’у 15 / 31
Pending patches: WARM (write-amplifica on reduc on
method)
▶ Behaves like HOT, but works also when some of index columns are
updated.
▶ New index tuples are inserted only for updated index columns.
https://www.postgresql.org/message-id/flat/20170110192442.ocws4pu5wjxcf45b%40alvherre.pgsql
Alexander Korotkov Наш ответ Uber’у 16 / 31
Pending patches: indirect indexes
▶ Indirect indexes are indexes which points to primary key value instead of pointer to
heap.
▶ Indirect index is not updates un l corresponding column is updated.
https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru@alvherre.pgsql
Alexander Korotkov Наш ответ Uber’у 17 / 31
Ideas: RDS (recently dead store)
▶ Recently dead tuples (deleted but visible for some transac ons) are
displaced into special storage: RDS.
▶ Heap tuple headers are le in the heap.
Alexander Korotkov Наш ответ Uber’у 18 / 31
Idea: undo log
▶ Displace old version of rows to undo log.
▶ New index tuples are inserted only for updated index columns. Old index tuples are
marked as expired.
▶ Move row to another page if new version doesn’t fit the page.
https://www.postgresql.org/message-id/flat/CA%2BTgmoZS4_CvkaseW8dUcXwJuZmPhdcGBoE_
GNZXWWn6xgKh9A%40mail.gmail.com
Alexander Korotkov Наш ответ Uber’у 19 / 31
Idea: pluggable table engines
Owns
▶ Ways to scan and modify tables.
▶ Access methods implementa ons.
Shares
▶ Transac ons, snapshots.
▶ WAL.
https://www.pgcon.org/2016/schedule/events/920.en.html
Alexander Korotkov Наш ответ Uber’у 20 / 31
Types of replica on
▶ Statement-level – stream wri ng queries to the slave.
▶ Row-level – stream updated rows to the slave.
▶ Block-level – stream blocks and/or block deltas to the slave.
Alexander Korotkov Наш ответ Uber’у 21 / 31
Replica on types in PostgreSQL vs. MySQL
Replica on Type MySQL PostgreSQL
Statement-level buil n pgPool-II
Row-level buil n pgLogical
Londiste
Slony ...
Block-level N/A buil n
Alexander Korotkov Наш ответ Uber’у 22 / 31
Uber’s replica on comparison
▶ Uber compares MySQL replica on versus PostgreSQL replica on.
▶ Actually, Uber compares MySQL row-level replica on versus
PostgreSQL block-level replica on.
▶ That happened because that me PostgreSQL had buil n block-level
replica on, but didn’t have buil n row-level replica on.
Simultaneously, MySQL had buil n row-level replica on, but didn’t
have buil n block-level replica on.
Alexander Korotkov Наш ответ Uber’у 23 / 31
Uber’s complaints to PostgreSQL block-level replica on
▶ Replica on stream transfers all the changes at block-level including
“write-amplifica on”. Thus, it requires very high-bandwidth channel.
In turn, that makes geo-distributed replica on harder.
▶ There are MVCC limita ons for read-only requires on replica. Apply of
VACUUM changes conflicts with read-only queries which could see
the data VACUUM is going to delete.
Alexander Korotkov Наш ответ Uber’у 24 / 31
Is row-level replica on superior over block-level replica on?
Alibaba works on adding block-level replica on to InnoDB. Zhai Weixiang,
database developer from Alibaba considers following advantages of
block-level replica on: ³
▶ Be er performance: higher throughput and lower response me
▶ Write less data (turn off binary log and g d), and only one fsync to make
transac on durable
▶ Less recovery me
▶ Replica on
▶ Less replica on latency
▶ Ensure data consistency (most important for some sensi ve clients)
³https://www.percona.com/live/data-performance-conference-2016/sessions/
physical-replication-based-innodb
Alexander Korotkov Наш ответ Uber’у 25 / 31
Replica read-only query MVCC conflict with VACUUM
Possible op ons:
▶ Delay the replica on,
▶ Cancel read-only query on replica,
▶ Provide a feedback to master about row versions which could be demanded.
Undo log would do be er, we wouldn’t have to choose...
Alexander Korotkov Наш ответ Uber’у 26 / 31
More about replica on and write-amplifica on
MySQL row-level replica on
PostgreSQL row-level replica on (pgLogical)
Alexander Korotkov Наш ответ Uber’у 27 / 31
Major version upgrade with pg_upgrade
Alexander Korotkov Наш ответ Uber’у 28 / 31
Major version upgrade with pgLogical
https://www.depesz.com/2016/11/08/major-version-upgrading-with-minimal-downtime/
Alexander Korotkov Наш ответ Uber’у 29 / 31
Other Uber notes
▶ PostgreSQL 9.2 had data corrup on bug. It was fixed long me ago. Since that me
PostgreSQL automated tests system was significantly improved to evade such bugs in future.
▶ pread is faster than seek + read. Thats really gives 1.5% accelera on on read-only
benchmark. ⁴
▶ PostgreSQL advises to setup rela vely small shared_buffers and rely on OS cache, while
“InnoDB storage engine implements its own LRU in something it calls the InnoDB buffer
pool”. PostgreSQL also implements its own LRU in something it calls the shared buffers. And
you can setup any shared buffers size.
▶ PostgreSQL uses mul process model. So, connec on is more expensive since unless you
use pgBouncer or other external connec on pool.
⁴https://www.postgresql.org/message-id/flat/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi
Alexander Korotkov Наш ответ Uber’у 30 / 31
Thank you for a en on!
Alexander Korotkov Наш ответ Uber’у 31 / 31

More Related Content

More from Ontico

Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Ontico
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Ontico
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)Ontico
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)Ontico
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Ontico
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Ontico
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Ontico
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Ontico
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)Ontico
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Ontico
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Ontico
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...Ontico
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Ontico
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Ontico
 
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)Ontico
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Ontico
 
Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...
Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...
Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...Ontico
 
Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...
Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...
Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...Ontico
 
libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)
libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)
libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)Ontico
 
Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)
Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)
Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)Ontico
 

More from Ontico (20)

Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
 
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
 
Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...
Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...
Как построить кластер для расчета сотен тысяч high-CPU/high-MEM-задач и не ра...
 
Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...
Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...
Отказоустойчивая архитектура фронтальной системы банка / Роман Шеховцов, Алек...
 
libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)
libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)
libfpta — обгоняя SQLite и Tarantool / Леонид Юрьев (Positive Technologies)
 
Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)
Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)
Синхронизация данных из PgSQL в Tarantool / Вениамин Гвоздиков (Calltouch)
 

Recently uploaded

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 

Наш ответ Uber'у / Александр Коротков (Postgres Professional)

  • 2. Russian developers of PostgreSQL: Alexander Korotkov, Teodor Sigaev, Oleg Bartunov ▶ Speakers at PGCon, PGConf: 20+ talks ▶ GSoC mentors ▶ PostgreSQL commi ers (1+1 in progress) ▶ Conference organizers ▶ 50+ years of expertship: development, audit, consul ng ▶ Postgres Professional co-founders PostgreSQL CORE ▶ Locale support ▶ PostgreSQL extendability: GiST(KNN), GIN, SP-GiST ▶ Full Text Search (FTS) ▶ NoSQL (hstore, jsonb) ▶ Indexed regexp search ▶ Create AM & Generic WAL ▶ Table engines (WIP) Extensions ▶ intarray ▶ pg_trgm ▶ ltree ▶ hstore ▶ plantuner ▶ jsquery ▶ RUM ▶ imgsmlr Alexander Korotkov Наш ответ Uber’у 2 / 31
  • 3. Disclaimer ▶ I’m NOT a MySQL expert. I didn’t even touch MySQL since 2011... ▶ This talk express my own opinion, not PostgreSQL community posi on, not even Postgres Professional official posi on. ▶ Uber’s guys knows be er which database they should use. Alexander Korotkov Наш ответ Uber’у 3 / 31
  • 4. What happened? ▶ Uber migrated from MySQL to PostgreSQL in 2012. ▶ Uber migrated from PostgreSQL to MySQL in 2016. ▶ PostgreSQL to MySQL migra on made a log of buzz in PostgreSQL community. Alexander Korotkov Наш ответ Uber’у 4 / 31
  • 5. Why did it happen? ▶ Uber migrated from MySQL to PostgreSQL for “a bunch of reasons, but one of the most important was availability of PostGIS” ¹ ▶ Uber migrated from PostgreSQL to MySQL “some of the drawbacks they found with Postgres” ² ¹https://www.yumpu.com/en/document/view/53683323/migrating-uber-from-mysql-to-postgresql ²https://eng.uber.com/mysql-migration/ Alexander Korotkov Наш ответ Uber’у 5 / 31
  • 6. Uber’s complaints to PostgreSQL Uber claims following “PostgreSQL limita ons”: ▶ Inefficient architecture for writes ▶ Inefficient data replica on ▶ Issues with table corrup on ▶ Poor replica MVCC support ▶ Difficulty upgrading to newer releases Alexander Korotkov Наш ответ Uber’у 6 / 31
  • 7. PostgreSQL’s vs. InnoDB’s storage formats Alexander Korotkov Наш ответ Uber’у 7 / 31
  • 8. PostgreSQL storage format ▶ Both primary and secondary indexes point to loca on (blkno, offset) of tuple (row version) in the heap. ▶ When tuple is moved to another loca on, all corresponding index tuples should be inserted to the indexes. ▶ Heap contains both live and dead tuples. ▶ VACUUM cleans up dead tuples and corresponding index tuples in a bulk manner. Alexander Korotkov Наш ответ Uber’у 8 / 31
  • 9. Update in PostgreSQL ▶ New tuple is inserted to the heap, previous tuple is marked as deleted. ▶ Index tuples poin ng to new tuple are inserted to all indexes. Alexander Korotkov Наш ответ Uber’у 9 / 31
  • 10. Heap-Only-Tuple (HOT) PostgreSQL ▶ When no indexed columns are updated and new version of row can fit the same page, then HOT is used and only heap is updated. ▶ Microvacuum can be used to free required space in the page for HOT. Alexander Korotkov Наш ответ Uber’у 10 / 31
  • 11. Update in MySQL ▶ Table rows are placed in the primary index itself. Updates are performed in-place. Old version of rows are placed to special segment (undo log). ▶ When secondary indexed column is updated, then new index tuple is inserted while previous index tuple is marked as deleted. Alexander Korotkov Наш ответ Uber’у 11 / 31
  • 12. Updates: InnoDB in comparison with PostgreSQL Pro: ▶ Update of few indexed columns is cheaper. ▶ Update, which don’t touch indexed columns, doesn’t depend on page free space in the page Cons: ▶ Update of majority of indexed columns is more expensive. ▶ Secondary index scan is slower. ▶ Primary key update is disaster. Alexander Korotkov Наш ответ Uber’у 12 / 31
  • 13. Uber example for write-amplifica on in PostgreSQL CREATE TABLE users (id SERIAL PRIMARY KEY, first TEXT, last TEXT, birth_year INTEGER); CREATE INDEX ix_users_first_last ON users (first, last); CREATE INDEX ix_users_birth_year ON users (birth_year); UPDATE users SET birth_year = 1986 WHERE id = 1; 1. Write the new row tuple to the tablespace 2. Update the primary key index to add a record for the new tuple 3. Update the (first, last) index to add a record for the new tuple 4. Update the birth_year index to add a record for the new tuple 5. Previous ac ons are protected by WAL log. Alexander Korotkov Наш ответ Uber’у 13 / 31
  • 14. Uber example for write-amplifica on: MySQL vs. PostgreSQL Alexander Korotkov Наш ответ Uber’у 14 / 31
  • 15. Uber example for write-amplifica on: MySQL vs. PostgreSQL PostgreSQL 1. Write the new row tuple to the tablespace 2. Insert new tuple to primary key index 3. Insert new tuple to (first, last) index 4. Insert new tuple to birth_year index 5. Previous ac ons are protected by WAL log. MySQL 1. Update row in-place 2. Write old version of row to the rollback segment 3. Insert new tuple to birth_year index 4. Mark old tuple of birth_year index as obsolete 5. Previous ac ons are protected by innodb log 6. Write update record to binary log Assuming we have replica on turned on Alexander Korotkov Наш ответ Uber’у 15 / 31
  • 16. Pending patches: WARM (write-amplifica on reduc on method) ▶ Behaves like HOT, but works also when some of index columns are updated. ▶ New index tuples are inserted only for updated index columns. https://www.postgresql.org/message-id/flat/20170110192442.ocws4pu5wjxcf45b%40alvherre.pgsql Alexander Korotkov Наш ответ Uber’у 16 / 31
  • 17. Pending patches: indirect indexes ▶ Indirect indexes are indexes which points to primary key value instead of pointer to heap. ▶ Indirect index is not updates un l corresponding column is updated. https://www.postgresql.org/message-id/20161018182843.xczrxsa2yd47pnru@alvherre.pgsql Alexander Korotkov Наш ответ Uber’у 17 / 31
  • 18. Ideas: RDS (recently dead store) ▶ Recently dead tuples (deleted but visible for some transac ons) are displaced into special storage: RDS. ▶ Heap tuple headers are le in the heap. Alexander Korotkov Наш ответ Uber’у 18 / 31
  • 19. Idea: undo log ▶ Displace old version of rows to undo log. ▶ New index tuples are inserted only for updated index columns. Old index tuples are marked as expired. ▶ Move row to another page if new version doesn’t fit the page. https://www.postgresql.org/message-id/flat/CA%2BTgmoZS4_CvkaseW8dUcXwJuZmPhdcGBoE_ GNZXWWn6xgKh9A%40mail.gmail.com Alexander Korotkov Наш ответ Uber’у 19 / 31
  • 20. Idea: pluggable table engines Owns ▶ Ways to scan and modify tables. ▶ Access methods implementa ons. Shares ▶ Transac ons, snapshots. ▶ WAL. https://www.pgcon.org/2016/schedule/events/920.en.html Alexander Korotkov Наш ответ Uber’у 20 / 31
  • 21. Types of replica on ▶ Statement-level – stream wri ng queries to the slave. ▶ Row-level – stream updated rows to the slave. ▶ Block-level – stream blocks and/or block deltas to the slave. Alexander Korotkov Наш ответ Uber’у 21 / 31
  • 22. Replica on types in PostgreSQL vs. MySQL Replica on Type MySQL PostgreSQL Statement-level buil n pgPool-II Row-level buil n pgLogical Londiste Slony ... Block-level N/A buil n Alexander Korotkov Наш ответ Uber’у 22 / 31
  • 23. Uber’s replica on comparison ▶ Uber compares MySQL replica on versus PostgreSQL replica on. ▶ Actually, Uber compares MySQL row-level replica on versus PostgreSQL block-level replica on. ▶ That happened because that me PostgreSQL had buil n block-level replica on, but didn’t have buil n row-level replica on. Simultaneously, MySQL had buil n row-level replica on, but didn’t have buil n block-level replica on. Alexander Korotkov Наш ответ Uber’у 23 / 31
  • 24. Uber’s complaints to PostgreSQL block-level replica on ▶ Replica on stream transfers all the changes at block-level including “write-amplifica on”. Thus, it requires very high-bandwidth channel. In turn, that makes geo-distributed replica on harder. ▶ There are MVCC limita ons for read-only requires on replica. Apply of VACUUM changes conflicts with read-only queries which could see the data VACUUM is going to delete. Alexander Korotkov Наш ответ Uber’у 24 / 31
  • 25. Is row-level replica on superior over block-level replica on? Alibaba works on adding block-level replica on to InnoDB. Zhai Weixiang, database developer from Alibaba considers following advantages of block-level replica on: ³ ▶ Be er performance: higher throughput and lower response me ▶ Write less data (turn off binary log and g d), and only one fsync to make transac on durable ▶ Less recovery me ▶ Replica on ▶ Less replica on latency ▶ Ensure data consistency (most important for some sensi ve clients) ³https://www.percona.com/live/data-performance-conference-2016/sessions/ physical-replication-based-innodb Alexander Korotkov Наш ответ Uber’у 25 / 31
  • 26. Replica read-only query MVCC conflict with VACUUM Possible op ons: ▶ Delay the replica on, ▶ Cancel read-only query on replica, ▶ Provide a feedback to master about row versions which could be demanded. Undo log would do be er, we wouldn’t have to choose... Alexander Korotkov Наш ответ Uber’у 26 / 31
  • 27. More about replica on and write-amplifica on MySQL row-level replica on PostgreSQL row-level replica on (pgLogical) Alexander Korotkov Наш ответ Uber’у 27 / 31
  • 28. Major version upgrade with pg_upgrade Alexander Korotkov Наш ответ Uber’у 28 / 31
  • 29. Major version upgrade with pgLogical https://www.depesz.com/2016/11/08/major-version-upgrading-with-minimal-downtime/ Alexander Korotkov Наш ответ Uber’у 29 / 31
  • 30. Other Uber notes ▶ PostgreSQL 9.2 had data corrup on bug. It was fixed long me ago. Since that me PostgreSQL automated tests system was significantly improved to evade such bugs in future. ▶ pread is faster than seek + read. Thats really gives 1.5% accelera on on read-only benchmark. ⁴ ▶ PostgreSQL advises to setup rela vely small shared_buffers and rely on OS cache, while “InnoDB storage engine implements its own LRU in something it calls the InnoDB buffer pool”. PostgreSQL also implements its own LRU in something it calls the shared buffers. And you can setup any shared buffers size. ▶ PostgreSQL uses mul process model. So, connec on is more expensive since unless you use pgBouncer or other external connec on pool. ⁴https://www.postgresql.org/message-id/flat/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi Alexander Korotkov Наш ответ Uber’у 30 / 31
  • 31. Thank you for a en on! Alexander Korotkov Наш ответ Uber’у 31 / 31