RCFile (Record Columnar File) is a data placement structure designed for MapReduce-based data warehouse systems. Hive added the RCFile format in version 0.6.0.

RCFile stores table data in a flat file consisting of binary key/value pairs. It first partitions rows horizontally into row splits, and then it vertically partitions each row split in a columnar way. RCFile stores the metadata of a row split as the key part of a record, and all the data of a row split as the value part.
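The horizontal-then-vertical partitioning described above can be illustrated with a small in-memory sketch. This is not the on-disk RCFile layout, which is binary and carries more metadata. The function name `to_row_splits`, the `rows_per_split` parameter, and the fields in the key dictionary are all hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

# A split is modelled as (key, value): key holds metadata, value holds the columns.
RowSplit = Tuple[Dict[str, int], List[List[Any]]]


def to_row_splits(rows: Sequence[Sequence[Any]],
                  num_columns: int,
                  rows_per_split: int) -> List[RowSplit]:
    """Partition rows horizontally, then lay out each split column by column."""
    splits: List[RowSplit] = []
    for start in range(0, len(rows), rows_per_split):
        # Horizontal partitioning: take a contiguous run of rows.
        chunk = rows[start:start + rows_per_split]
        # Vertical partitioning: gather the values of each column together.
        columns = [[row[c] for row in chunk] for c in range(num_columns)]
        # Hypothetical metadata standing in for the key part of the record.
        key = {"num_rows": len(chunk), "num_columns": num_columns}
        splits.append((key, columns))
    return splits


rows = [(1, "a", 1.0), (2, "b", 2.0), (3, "c", 3.0), (4, "d", 4.0), (5, "e", 5.0)]
splits = to_row_splits(rows, num_columns=3, rows_per_split=2)
```

Each element of `splits` pairs the metadata of one row split (the key) with that split's data arranged by column (the value), mirroring the key/value structure described above.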

RCFile combines the advantages of both row-store and column-store to satisfy the need for fast data loading and query processing, efficient use of storage space, and adaptability to highly dynamic workload patterns.

  • As a row store, RCFile guarantees that all data in the same row is located on the same node.
  • As a column store, RCFile can exploit column-wise data compression and skip reads of columns that a query does not need.
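The column-store benefits listed above can be sketched as follows, assuming one row split whose columns hold string values without embedded newlines. The use of `zlib` and the helper names `encode_split` and `read_columns` are assumptions made for this example; they do not reflect the codecs or APIs that Hive actually uses for RCFile.

```python
import zlib
from typing import Dict, List, Sequence


def encode_split(columns: List[List[str]]) -> List[bytes]:
    """Compress each column of a row split independently."""
    # Values of one column are stored together, so similar data compresses well.
    return [zlib.compress("\n".join(col).encode("utf-8")) for col in columns]


def read_columns(encoded: List[bytes], wanted: Sequence[int]) -> Dict[int, List[str]]:
    """Decompress only the requested columns; the others are never touched."""
    return {
        i: zlib.decompress(encoded[i]).decode("utf-8").split("\n")
        for i in wanted
    }


# One row split with three columns (hypothetical data).
columns = [["1", "2", "3"], ["us", "us", "us"], ["x", "y", "z"]]
encoded = encode_split(columns)

# A query that needs only column 1 skips columns 0 and 2 entirely.
result = read_columns(encoded, wanted=[1])
```

Because each column is compressed on its own, a reader can decompress just the columns a query references and leave the rest as opaque bytes.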

A shell utility is available for reading RCFile data and metadata: see RCFileCat.

For details about the RCFile format, see:
