HBase Interview Questions


Below is a list of the best HBase interview questions and answers.

HBase is a Hadoop database used for real-time read and write access to large amounts of data. It is a Java-based, non-relational, distributed, column-oriented database.

HBase provides real-time read and write access and stores big data. Compaction in HBase combines HFiles into larger HFiles; a major compaction merges all the HFiles of a store into a single HFile. This reduces the number of disk seeks needed for a read, which makes real-time reads faster. Compaction is also how HBase cleans up after itself: deleted cells are physically removed when the files are rewritten during a major compaction.
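The merge step can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: each "HFile" is modeled as a sorted map from row key to value, the class and method names are hypothetical, and none of this is HBase's actual implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of compaction: each "HFile" is a map of row key -> value,
// listed from oldest to newest. Illustration only, not HBase code.
class CompactionSketch {
    // Merge all files into one sorted file; entries in newer files
    // overwrite entries with the same row key in older files.
    static TreeMap<String, String> compact(List<Map<String, String>> hfilesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Map<String, String> hfile : hfilesOldestFirst) {
            merged.putAll(hfile);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> older = Map.of("row1", "a", "row3", "c");
        Map<String, String> newer = Map.of("row1", "a2", "row2", "b");
        // After compaction a read consults one file instead of two.
        System.out.println(compact(List.of(older, newer)));
    }
}
```

With a single merged file, a lookup no longer has to check every small file in turn, which is where the reduction in disk seeks comes from.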

The main components of HBase are:

  1. HMaster: assigns regions to region servers, manages and monitors the cluster, and handles table operations such as creating, altering, and deleting tables.
  2. Region Server: runs on the nodes of the cluster and serves the read, write, and delete requests that clients send for the regions it hosts.
  3. ZooKeeper: coordinates the cluster. It tracks which servers are alive and supports region assignment and recovery when a region server crashes.
  4. Catalog table: the hbase:meta table, which records where each region is located.
  5. HBase Master: another name for the HMaster process described in item 1.

The following operational commands are available in HBase:

  • Delete: removes a row, or specific cells of a row, from the table.
  • Get: returns the attributes of a specified row.
  • Increment: increases the value of a counter attribute in the table.
  • Put: adds a new row to the table or updates an existing row.
  • Scan: iterates over multiple rows of the table and returns the requested attributes.

TTL is the acronym for time to live. HBase automatically deletes cells once they reach their expiration time. A TTL set on a column family is specified in seconds, while a TTL set on an individual cell is specified in milliseconds.
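The expiry rule itself is simple arithmetic on timestamps. The following is a minimal sketch, assuming all values have already been converted to milliseconds; the class and method names are hypothetical and are not part of the HBase API.

```java
// Toy model of TTL expiry. Names are hypothetical, not HBase API.
class TtlSketch {
    // A cell is expired once its age (now minus write time) exceeds the TTL.
    static boolean isExpired(long cellTimestampMs, long ttlMs, long nowMs) {
        return nowMs - cellTimestampMs > ttlMs;
    }

    public static void main(String[] args) {
        long writtenAt = 1_000L; // write timestamp of the cell
        long ttlMs = 5_000L;     // time to live
        System.out.println(isExpired(writtenAt, ttlMs, 4_000L)); // age 3000 ms, still live
        System.out.println(isExpired(writtenAt, ttlMs, 7_000L)); // age 6000 ms, expired
    }
}
```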

HBase stores large amounts of data, but a client often needs only a small part of it. Filters are used in HBase to return only the required data.

Some of the filters available in HBase are:

  • ColumnCountGetFilter
  • InclusiveStopFilter
  • KeyOnlyFilter
  • PrefixFilter
  • PageFilter
  • QualifierFilter
  • SingleColumnValueFilter
  • ValueFilter

There are two ways to read data from an HBase table: Get and Scan. Get retrieves a single row, while Scan reads a range of rows, up to the entire table.

Example (HBase shell):

Syntax:
get '<table_name>', '<row_key>', {<additional_parameters>}
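The difference between the two read paths can be modeled with a sorted map. This is a sketch only: the table is a plain TreeMap keyed by row key, the names are hypothetical, and the start-inclusive, stop-exclusive range mirrors the default behavior of an HBase Scan.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model: a table is a map sorted by row key. Not HBase API.
class GetScanSketch {
    // Get: look up exactly one row by its key.
    static String get(TreeMap<String, String> table, String rowKey) {
        return table.get(rowKey);
    }

    // Scan: iterate over a key range, start inclusive and stop exclusive.
    static SortedMap<String, String> scan(TreeMap<String, String> table,
                                          String startRow, String stopRow) {
        return table.subMap(startRow, stopRow);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row3", "c");
        table.put("row4", "d");
        System.out.println(get(table, "row2"));          // one row
        System.out.println(scan(table, "row2", "row4")); // rows row2 and row3
    }
}
```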

When a user issues a delete for a cell, HBase does not remove the cell immediately. Instead it writes a tombstone marker, and the cell becomes invisible: it is filtered out of the results when the user reads or scans the data. The cell is physically removed during a major compaction.
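The delete lifecycle described above can be sketched as a small in-memory store. This is a simplified model with hypothetical names: timestamps and versions are ignored, and real HBase tombstones are themselves cells written to the store rather than entries in a separate set.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Toy model of tombstones. Ignores timestamps and versions; not HBase code.
class TombstoneSketch {
    private final TreeMap<String, String> cells = new TreeMap<>();
    private final Set<String> tombstones = new HashSet<>();

    void put(String key, String value) {
        cells.put(key, value);
    }

    // Delete only records a marker; the data itself stays on "disk".
    void delete(String key) {
        tombstones.add(key);
    }

    // Reads filter out any cell that is masked by a tombstone.
    String get(String key) {
        return tombstones.contains(key) ? null : cells.get(key);
    }

    // Major compaction drops the masked cells and the markers themselves.
    void majorCompact() {
        cells.keySet().removeAll(tombstones);
        tombstones.clear();
    }

    int storedCells() {
        return cells.size();
    }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch();
        store.put("row1", "a");
        store.delete("row1");
        System.out.println(store.get("row1") + " / stored=" + store.storedCells());
        store.majorCompact();
        System.out.println("stored after compaction=" + store.storedCells());
    }
}
```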

We can retrieve data from HBase in Java using the following steps:

  1. Instantiate the Configuration class: Configuration config = HBaseConfiguration.create();
  2. Instantiate the HTable class: HTable table = new HTable(config, "table_name");
  3. Instantiate the Get class: Get get = new Get(Bytes.toBytes("table_row"));
  4. Read the data: Result result = table.get(get);
  5. Read values from the Result object: byte[] value = result.getValue(Bytes.toBytes("table_columnfamily"), Bytes.toBytes("table_column"));

Note that the HTable constructor is deprecated in HBase 1.0 and later, where a Table is instead obtained from a Connection created by ConnectionFactory.createConnection(config).

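Add the HBase Maven dependency to the pom.xml file. To connect to HBase from Java, the HBase master must be running; the command to start it is hbase master start. Then provide the connection properties in an XML configuration file (typically hbase-site.xml), which HBaseConfiguration.create() loads from the classpath.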

The following steps load data into HBase from HDFS.

  1. Create a table in HBase. Command: create 'table_name','cf'
  2. Upload simple_file.txt to HDFS. Command: bin/hadoop fs -copyFromLocal simple_file.txt /user/hadoop/simple_file.txt
  3. Use ImportTsv to load the text file into HBase. Command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf table_name /user/hadoop/simple_file.txt

A Bloom filter in HBase is used to test whether an HFile may contain a specific row, or a specific row and column. It is a probabilistic structure: it can report that a key is definitely absent or possibly present, which lets HBase skip HFiles that cannot contain the requested data.
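The "definitely absent or possibly present" behavior can be seen in a minimal Bloom filter. This is a sketch under stated assumptions: the two hash functions are simple ones derived from String.hashCode() for illustration, the names are hypothetical, and HBase's own Bloom filter implementation differs.

```java
import java.util.BitSet;

// Minimal Bloom filter over row keys. Illustration only, not HBase code.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;

    BloomFilterSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Two simple hash functions; real filters use stronger, independent hashes.
    private int h1(String key) {
        return Math.floorMod(key.hashCode(), size);
    }

    private int h2(String key) {
        return Math.floorMod(key.hashCode() * 31 + 17, size);
    }

    void add(String key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // false means the key was never added; true means it may have been added.
    boolean mightContain(String key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024);
        filter.add("row1");
        System.out.println(filter.mightContain("row1")); // true: added keys are always found
    }
}
```

A false answer lets a reader skip the file entirely; a true answer still requires reading the file, because false positives are possible.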

HBase comes with a tool called hbck, which is implemented by the HBaseFsck class. It provides various command-line options for checking and repairing region consistency and table integrity problems.

The truncate command in HBase disables, drops, and then recreates the specified table.

The tombstone markers available in HBase are:

  • Column Delete Marker
  • Family Delete Marker
  • Version Delete Marker

MSLAB is an acronym for MemStore-Local Allocation Buffer in HBase. It allocates MemStore data in fixed-size chunks to reduce heap fragmentation.

HBaseFsck is the class behind the hbck tool, which is used for checking and repairing table integrity and region consistency.

The RowKey is the unique identifier of a row and is used to retrieve that row from a table. HBase stores rows sorted by row key in lexicographical order.
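Lexicographical ordering has a practical consequence for numeric keys, which the following sketch illustrates. It uses Java's natural String ordering, which matches byte order for ASCII keys; the helper names and the five-digit zero padding are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Shows how lexicographic ordering treats numeric suffixes in row keys.
class RowKeyOrderSketch {
    // Sort keys the way a lexicographically ordered store would.
    static List<String> sorted(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    // Zero-pad the numeric part so lexicographic and numeric order agree.
    static String padded(int id) {
        return String.format("row%05d", id);
    }

    public static void main(String[] args) {
        // "row10" sorts before "row2" because '1' < '2' at the first differing character.
        System.out.println(sorted(List.of("row2", "row10", "row1")));
        System.out.println(sorted(List.of(padded(2), padded(10), padded(1))));
    }
}
```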

The MemStore is the write buffer in HBase. It accumulates data temporarily in memory before the data is written permanently to disk.

HRegionServer is the RegionServer implementation in HBase. It is responsible for managing and serving regions.

Hotspotting occurs in HBase when a large amount of client traffic is directed at one node, or only a few nodes, of the cluster. It is usually the result of poor RowKey design.
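One common mitigation, not described in the answer above, is salting: a prefix derived from the key is added so that keys which would otherwise sort next to each other are spread across several buckets. The following is a minimal sketch; the bucket count, the "-" separator, and the use of String.hashCode() are all assumptions made for illustration.

```java
// Sketch of row key salting to spread sequential keys across buckets.
class SaltedKeySketch {
    // Prefix the key with a bucket number derived from its hash.
    // The same key always maps to the same bucket, so it can be found again.
    static String salt(String rowKey, int buckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return bucket + "-" + rowKey;
    }

    public static void main(String[] args) {
        // Sequential keys no longer share a common prefix after salting.
        for (int i = 0; i < 4; i++) {
            System.out.println(salt("event-2024-000" + i, 4));
        }
    }
}
```

The trade-off is that a range scan over the original key order now has to query every bucket.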

A namespace is a logical grouping of tables, analogous to a database in a relational database system.

It helps with resource management, security, and isolation. The syntax for referring to a table in a namespace is shown below.

<table namespace>:<table qualifier>
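A small sketch of splitting such a name into its two parts follows. The class and method names are hypothetical; the only HBase-specific assumption is that a table name without a namespace prefix belongs to the namespace called default.

```java
// Splits "<table namespace>:<table qualifier>" into its two parts.
class TableNameSketch {
    // Returns {namespace, qualifier}; names without ':' use the "default" namespace.
    static String[] parse(String fullName) {
        int idx = fullName.indexOf(':');
        if (idx < 0) {
            return new String[] {"default", fullName};
        }
        return new String[] {fullName.substring(0, idx), fullName.substring(idx + 1)};
    }

    public static void main(String[] args) {
        String[] parts = parse("sales:orders");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```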