Errata
Despite thorough reviews and our best effort to make the book flawless, errors can slip through from time to time. This page contents errata, grouped by chapter to facilitate navigation.
You can submit new errata through O’Reilly portal.
Acknowledgements, p.xix, penultimate paragraph: “reviewd” should be “reviewed”. “Denis Rytsov” should be “Denis Rystsov”. “Edward Ribiero” should be “Edward Ribeiro”.
Preface
Chapter 1
DBMS Architecture, p. 10. Not a typo per se, just a poorly worded sentence: “The execution plan is handled by the execution engine, which collects the results of the execution of local and remote operations”, should be: “Plan is carried out by the execution engine, which aggregates the results of local and remote operations”.
Index Files, p. 19. Text before the Figure 1-5 should be:
a) An index-organized table, where data records are stored directly in the index file.
b) An index file storing the offsets and a separate file storing data records.
Column-Oriented Data Layout, p. 10. “Values belonging to the same row are stored closely together” should read as “Values belonging to the same column are stored closely together:”
Primary Index as an Indirection, p. 21, Figure 1-6. Assuming primary and secondary index nodes in example b) are 0-indexed, entry 1 of secondary index should point to entry 3 of primary index, and entry 2 of secondary index should point to entry 0 of primary index. In other words, entries 1 and 2 in a secondary index were interchanged.
On p. 35, “B-Trees are characterized by their fanout: the number of keys stored in each node”, shoud read “B-Trees are characterized by their fanout: the maximum number of children each nonleaf node can have”.
Figure 2-12 on p. 40 should not contain 13 in the right node after split.
B-Tree Lookup Complexity, p. 37. All variables named “K” should be “N”. A whole paragraph should read as follows:
In terms of number of transfers, the logarithm base is N (number of keys per node). There are N times more nodes on each new level, and following a child pointer reduces the search space by the factor of N. During lookup, at most logN M (where M is a total number of items in the B-Tree) pages are addressed to find a searched key. The number of child pointers that have to be followed on the root-to-leaf pass is also equal to the number of levels, in other words, the height h of the tree.
B-Tree Node Splits, p. 39. Sentence on elements that should be transferred during split contains a typo, and should read as (correction is highlighted): “All elements after the split point (including split point in the case of leaf node split) are transferred to the newly created sibling node.”
Chapter 2
Chapter 5
Steal and force policies, p. 91, “Undo rolls back updates to forced pages for committed transactions” should be “Undo rolls back updates to forced pages for uncommitted transactions”.
Read and Write Anomalies, p. 96, third paragraph. “commts” should be “commits”.
Pessimistic Concurrency Control, p.99, last paragraph. “earlier” should be “later”. Timestamp ordering allows transactions to be committed in their timestamp order, so any transaction that attempts to get committed out of order is aborted.
In section “Page Repacement”, in reference to paper on Bélády’s anomaly, namely [BEDALY69], while this does not have any influence on correctness of content, it would’ve been nice to have reference spell author’s name as [BELADY69].
Fractional Cascading, p.118, Figure 6-5. Third element of the first array should be “25” not “26”, and arrow from it should point to “25” in the second array, not to “16”. Correct image is presented below.
Chapter 6
Logarithmic Runs, p. 120, Figure 6-6. Some bridge elements on the L1 run were not marked grey. One of the pointers from L1 run was pointing from “21” to “14” instead of “12”. Full correct image is presented below.
Updates and Deletes, p. 136. “Instead, redundant records are reconciled during the read”. While the sentence is not wrong, I think it is not elaborate enough. What I meant to say was that subsequent insert, update, and delete operations to the same key are represented as separate records, which are merged and reconciled during the read.
Open-Channel SSDs, p. 161. Reference to [ZHANG13] should have been [WANG13].
Part 1 Conclusion, Figure I-1, p. 166: Bw-Trees have “No” listed in “Buffered” and “Mutable” columns. After several discussions I thought it is wroth elaborating why. Buffering is shifted to the layer below Bw-Tree itself, which is LLAMA, which is what makes Bw-Trees so elegant. At the same time, there’s no mutability in Bw-Trees or LLAMA, since records are written in blocks and eventually reconciled.
Chapter 7
Introduction and Overview, p. 171. In code snippet, variable “i” should be “x”. Full code snippet should be as follows:
int x = 1;
x += 2;
x *= 2;
System Synchrony, p. 190. “designing an efficient synchronous algorithm is not always achievable” should read as “designing an efficient asynchronous algorithm is not always achievable”.
Chapter 8
Phi-Accural Failure Detector, p. 199, should have been “Phi-Accrual”.
Chapter 9
Linearizability, p. 224. In description to Figure 11-2, wording of the point c) can be more precise.
c) The third read can return 1 or 2, depending on how operations got ordered, and where their linearization points are. This order is determined by the values returned by the operations described in a) and b). In other words, every client should writes operations in the same order: 1, then 2; 2, then 1; only 2 (implies that 1 was ordered before 2, and before both reads); or only 1 (implies that 2 was ordered before 1, and before both reads).
In section “Infamous CAP”, there is a typo in “PACELEC conjecture”. It should have been “PACELC”.
Chapter 11
Distributed Transactions with Percolator, p. 274, Figure 13-9. In section c) of the figure, “Primary at Account 1” should not be visible.
Alternatively, under assumption that an immutable store is used, “Primary” mark should be added to TS3 under Account 1. Both implementations would be correct given the resolution algorithm.
Chapter 13
Failure Scenarios, p. 290, Figure 14-4, 14-5, and 14-6, ballot number for the second Promise operation is incorrect: “Promise(1,V2)
” should be “Promise(2,V2)
”
Egalitarian Paxos, p. 301.
After collecting this information, it forwards a Pre-Accept message to a fast quorum of replicas. A fast quorum is
⌈3f/4⌉
replicas, where f is the number of tolerated failures.
Should read:
After collecting this information, it forwards a Pre-Accept message to a fast quorum of replicas. A fast quorum in basic EPaxos algorithm contains
⌈3N/4⌉
replicas, where N is the number of tolerated failures, or2f
out of the total ofN=2f+1
replicas. Fully optimized EPaxos reduces this quorum to onlyf+⌊f +1⌋
.
Chapter 14
Some of the references appear to be not sorted alphabetically. Even though all of them are present, they are hard to find in a printed version. All references that were present in the wrong order are added below in alphabetical order:
[ALHOUMAILY10] Yousef J. Al-Houmaily. 2010. Atomic commit protocols, their integration, and their optimisations in distributed database systems. Int. J. Intell. Inf. Database Syst. 4, 4 (September 2010), 373-412.
[BAUDET19] "State Machine Replication in the Libra Blockchain". Mathieu Baudet, Avery Ching, Andrey Chursin, George Danezis, François Garillot, Zekun Li, Dahlia Malkhi, Oded Naor, Dmitri Perelman, Alberto Sonnino. 2019
[BUCHMAN18] "The latest gossip on BFT consensus". Ethan Buchman, Jae Kwon, Zarko Milosevic. 2018.
[CESATI05] "Understanding the Linux Kernel, 3rd Edition". Marco Cesati, Daniel P. Bovet. 2005.
[CLOCK06] "Implementing a Better Cache Replacement Algorithm in Apache Derby". Gokul Soundararajan. 2006.
[CORMODE11] Cormode, Graham and S. Muthukrishnan. “Approximating Data with the Count-Min Data Structure.” (2011).
[DECHEV10] Damian Dechev, Peter Pirkelbauer, and Bjarne Stroustrup. 2010. Understanding and Effectively Preventing the ABA Problem in Descriptor-Based Lock-Free Designs. In Proceedings of the 2010 13th IEEE International Symposium on Objec/Component/Service-Oriented Real-Time Distributed Computing (ISORC '10). IEEE Computer Society, Washington, DC, USA, 185-192.
[DOWNEY12] "Be Careful with Sloppy Quorums". Jim Downey. 2012.
[FOWLER11] "The LMAX Architecture". Martin Fowler. 2011
[FREILING11] Felix C. Freiling, Rachid Guerraoui, and Petr Kuznetsov. 2011. The failure detector abstraction. ACM Comput. Surv. 43, 2, Article 9 (February 2011), 40 pages.
[GIAMPAOLO98] Dominic Giampaolo. 1998. Practical File System Design with the be File System (1st ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
[GILAD17] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. 2017. Algorand: Scaling Byzantine Agreements for Cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai , China, October 28-31, 2017. 51–68.
[GRAY05] Gray, Jim; van Ingen, Catharine (December 2005). "Empirical Measurements of Disk Failure Rates and Error Rates" (PDF). Microsoft Research Technical Report MSR-TR-2005-166. Retrieved 4 March 2013.
[HYPARVIEW07] "HyParView: a membership protocol for reliable gossip-based broadcast". Joao Leitao, Jose Pereira, and Luis Rodrigues. 2007.
[KERRISK10] "The Linux Programming Interface". Michael Kerrisk. 2010. No Starch Press.
[KINGSBURY18_2] "Strong consistency models". Kyle Kingsbury. 2018.
[KLEPPMANN15] "Please stop calling databases CP or AP". Martin Kleppmann. 2015.
[KOOPMAN15] "Selection of Cyclic Redundancy Code and Checksum Algorithms to Ensure Critical Data Integrity". Philip Koopman, Kevin R. Driscoll, Brendan Hall. 2015.
[KORDAFSHARI05] M. S. Kordafshari, M. Gholipour, M. Mosakhani, A. T. Haghighat, and M. Dehghan. 2005. Modified bully election algorithm in distributed systems. 2005
[MERKLE87] Ralph C. Merkle. 1987. A Digital Signature Based on a Conventional Encryption Function. In A Conference on the Theory and Applications of Cryptographic Techniques on Advances in Cryptology (CRYPTO '87), Carl Pomerance (Ed.). Springer-Verlag, London, UK, UK, 369-378.
[MOLINA92] H. Garcia-Molina and K. Salem. 1992. Main Memory Database Systems: An Overview. IEEE Trans. on Knowl. and Data Eng. 4, 6 (December 1992), 509-516.
[NICHOLS66] The Past Participle of "Overflow": "Overflowed" or "Overflown". Ann Eljenholm Nichols. American Speech Vol. 41, No. 1 (Feb., 1966), pp. 52-55
[STONE98] "Performance of checksums and CRCs over real data". J. Stone, M. Greenwald, C. Partridge and J. Hughes, in IEEE/ACM Transactions on Networking, vol. 6, no. 5, pp. 529-543, Oct. 1998.
[SYSTEMR81] Donald D. Chamberlin, Morton M. Astrahan, Michael W. Blasgen, James N. Gray, W. Frank King, Bruce G. Lindsay, Raymond Lorie, James W. Mehl, Thomas G. Price, Franco Putzolu, Patricia Griffiths Selinger, Mario Schkolnick, Donald R. Slutz, Irving L. Traiger, Bradford W. Wade, and Robert A. Yost. 1981. A history and evaluation of System R. Commun. ACM 24, 10 (October 1981), 632-646.
[VINOSKI08] S. Vinoski, "Convenience Over Correctness" in IEEE Internet Computing, vol. 12, no. 04, pp. 89-92, 2008.
[ZHAO15] "Fast Paxos Made Easy: Theory and Implementation". Wenbing Zhao. 2015.