Background
Logical view of data
Design fundamentals
Physical implementation
Performance
Values within a cell have multiple versions
Versions are identified by their version number, which by default is a timestamp
If a timestamp is not specified during a read, the latest version is returned
The maximum number of versions kept for a cell value is configured per column family
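The three rules above (numbered versions, latest version on an unqualified read, a per-family version limit) can be sketched with a toy in-memory model. This is an illustration in plain Python and not the HBase API. The class name and the `max_versions` mapping are hypothetical. For brevity, the sketch trims excess versions immediately on write. Real HBase instead hides excess versions on read and removes them physically during compactions.

```python
import time
from collections import defaultdict


class VersionedTable:
    """Toy model of HBase cell versioning (hypothetical, not the HBase API)."""

    def __init__(self, max_versions):
        # Column family -> maximum number of versions kept per cell
        self.max_versions = max_versions
        # (row, family, qualifier) -> {version: value}
        self.cells = defaultdict(dict)

    def put(self, row, family, qualifier, value, version=None):
        # By default the version number is the write time in milliseconds
        if version is None:
            version = int(time.time() * 1000)
        versions = self.cells[(row, family, qualifier)]
        versions[version] = value
        # Keep only the newest versions allowed for this family
        # (the family must have been declared in max_versions)
        limit = self.max_versions[family]
        for old in sorted(versions, reverse=True)[limit:]:
            del versions[old]

    def get(self, row, family, qualifier, version=None):
        versions = self.cells.get((row, family, qualifier), {})
        if not versions:
            return None
        if version is None:
            # No version requested: return the latest one
            return versions[max(versions)]
        return versions.get(version)


t = VersionedTable({"Home": 2})
t.put("Row-0001", "Home", "Phones", "2 42 214339", version=1)
t.put("Row-0001", "Home", "Phones", "2 42 213456", version=2)
t.put("Row-0001", "Home", "Phones", "+61 2 4567890", version=3)
print(t.get("Row-0001", "Home", "Phones"))
```

With a limit of two versions for the family, version 1 is discarded after the third write. An unqualified read then returns the value written as version 3.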
HBase Table:
{"Row-0001":
   {"Home":
      {"Name":   {"timestamp-1": "James"},
       "Phones": {"timestamp-1": "2 42 214339",
                  "timestamp-2": "2 42 213456",
                  "timestamp-3": "+61 2 4567890"} },
    "Office":
      {"Phone":   {"timestamp-4": "+64 345678"},
       "Address": {"timestamp-5": "10 Ellenborough Pl"} }
   },
 "Row-0002":
   {"Home":
      {"Name":   {"timestamp-6": "Harry"},
       "Phones": {"timestamp-7": "2 42 214234"} },
    "Office":
      {"Phone":   {"timestamp-8": "+64 345678"},
       "Address": {"timestamp-9": "10 Bong Bong Rd",
                   "timestamp-10": "23 Victoria Rd"} }
   }
}
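Logically, the nested structure above is a sorted map. Each key is a (row key, column family, qualifier, version) tuple and maps to one value. The sketch below is plain Python, not the HBase client API. It flattens a fragment of the example, using integer versions in place of "timestamp-1", "timestamp-2", and so on. It then sorts the cells the way HBase orders them: by row, family, and qualifier ascending, with the newest version first. The function name `flatten` is hypothetical.

```python
def flatten(table):
    """Turn row -> family -> qualifier -> {version: value} into a sorted
    list of ((row, family, qualifier, version), value) pairs."""
    cells = []
    for row, families in table.items():
        for family, qualifiers in families.items():
            for qualifier, versions in qualifiers.items():
                for version, value in versions.items():
                    cells.append(((row, family, qualifier, version), value))
    # Row, family, qualifier ascending; within a cell, newest version first
    cells.sort(key=lambda kv: (kv[0][0], kv[0][1], kv[0][2], -kv[0][3]))
    return cells


# Fragment of the example, deliberately inserted out of order
table = {
    "Row-0002": {"Home": {"Name": {6: "Harry"}}},
    "Row-0001": {"Home": {"Name": {1: "James"},
                          "Phones": {1: "2 42 214339",
                                     2: "2 42 213456",
                                     3: "+61 2 4567890"}}},
}

for key, value in flatten(table):
    print(key, value)
```

The output starts with ('Row-0001', 'Home', 'Name', 1). The three Phones versions follow from newest to oldest, and Row-0002 comes last.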
Key -> value:
"Row-0001", "Home" -> {"Name":   {"timestamp-1": "James"},
                       "Phones": {"timestamp-2": "2 42 214339",
                                  "timestamp-3": "2 42 213456",
                                  "timestamp-4": "+61 2 4567890"} }
A key can consist of a row key, column family, qualifier, and timestamp, depending on what is to be retrieved
If all the cells in a row are of interest, the key is just the row key
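Here is a minimal sketch of how a more precise key narrows what is retrieved. It uses a nested Python dict as a stand-in for the table. The `lookup` function is hypothetical and not part of HBase. Key parts must be supplied left to right: row, then family, then qualifier, then version.

```python
def lookup(table, row, family=None, qualifier=None, version=None):
    """Row -> whole row; + family -> one family; + qualifier -> all
    versions of a cell; + version -> a single value."""
    result = table.get(row)
    for part in (family, qualifier, version):
        # Stop at the first missing key part or missing entry
        if part is None or result is None:
            break
        result = result.get(part)
    return result


table = {
    "Row-0001": {"Home": {"Name": {1: "James"},
                          "Phones": {2: "2 42 214339",
                                     3: "2 42 213456",
                                     4: "+61 2 4567890"}}},
}

print(lookup(table, "Row-0001", "Home", "Phones", 3))
```

A row key alone returns every family of the row. Adding the family, qualifier, and version narrows the result down to the single value "2 42 213456".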
- How many column families should a table have?
- What columns (qualifiers) should be included in each column family?
Column families consist of columns, and columns consist of versions
If cells contain the row keys of other rows, an HBase table becomes a network (graph)
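When cell values hold row keys, each such cell acts as an edge to another row, and the table can be traversed like a graph. The sketch below makes several hypothetical assumptions: a "links" family, single-letter row keys, and only the latest value of each cell. It runs a breadth-first traversal that follows those edges.

```python
from collections import deque

# Hypothetical rows: cells in the "links" family store row keys of other rows
table = {
    "A": {"links": {"next": "B", "alt": "C"}},
    "B": {"links": {"next": "D"}},
    "C": {"links": {}},
    "D": {"links": {"next": "A"}},
}


def reachable(table, start):
    """Breadth-first traversal following cell values that are row keys."""
    seen, queue = {start}, deque([start])
    while queue:
        row = queue.popleft()
        for target in table.get(row, {}).get("links", {}).values():
            # Follow the edge only if it names an existing, unvisited row
            if target in table and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(reachable(table, "B")))
```

Each hop is a separate lookup by row key. Row-key lookups are the only indexed access in HBase.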
- Indexing is performed only on the row key
- HBase tables are stored sorted by row key
- Column qualifiers are dynamic and can be defined at write time
- Column qualifiers are stored as sequences of bytes such that they can
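Because the row key is the only index and rows are kept sorted by it, a range or prefix scan over row keys can use binary search. A search on a qualifier's value, by contrast, has to examine every row. The sketch below is a simplified illustration under stated assumptions. The row keys such as "user#0007" and both function names are hypothetical. Python's `bisect` module stands in for HBase's sorted storage.

```python
import bisect

# Hypothetical rows; the table is kept sorted by row key
rows = {
    "user#0007": {"info": {"name": "James"}},
    "user#0042": {"info": {"name": "Harry"}},
    "user#0100": {"info": {"name": "Ron"}},
    "order#0001": {"info": {"buyer": "user#0007"}},
}
sorted_keys = sorted(rows)


def scan(start, stop):
    """Row keys with start <= key < stop, located by binary search."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def find_by_value(family, qualifier, value):
    """No index on qualifiers or values: every row must be examined."""
    return [k for k in sorted_keys
            if rows[k].get(family, {}).get(qualifier) == value]


print(scan("user#", "user$"))  # every key with the prefix "user#"
print(find_by_value("info", "name", "Harry"))
```

The stop key "user$" works as a prefix bound because "$" sorts immediately after "#".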
HBase Table:
{"007":
   {"CUSTOMER":
      {"first-name": {"timestamp-1": "James"},
       "last-name":  {"timestamp-2": "Bond"},
       "phone":      {"timestamp-1": "007-007"},
       "email":      {"timestamp-1": "jb@mi6.com"} }
   }
}
{"DEPARTMENT":
   {"dname":  {"timestamp-1": "Sales"},
    "budget": {"timestamp-1": "1000"} },
 "MANAGER":
   {"enumber":    {"timestamp-2": "007"},
    "first-name": {"timestamp-3": "James"},
    "last-name":  {"timestamp-4": "Bond"} }
}
Design Fundamentals
Note that it is possible to group rows of different types in one HBase table
HBase Table:
{"1234567":
   {"MEASURE":
      {"amount": {"timestamp-1": "1000000"} },
    "BUYER":
      {"phone":      {"timestamp-1": "242214339"},
       "first-name": {"timestamp-1": "James"},
       "last-name":  {"timestamp-1": "Bond"} },
    "SELLER":
      {"phone":      {"timestamp-1": "242215612"},
       "first-name": {"timestamp-1": "Harry"},
       "last-name":  {"timestamp-1": "Potter"} }
   }
}
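In HBase, column families are declared for the whole table. A row simply stores no cells in the families it does not use, so rows of different types can share one table. The sketch below assumes that rows are told apart by which families they populate. It stores a few cells from the customer row "007" and the transaction row "1234567" in one dict-based table, with versions omitted. The `rows_with_family` helper is hypothetical.

```python
# Cells copied from the examples above (latest values only)
table = {
    "007": {"CUSTOMER": {"first-name": "James", "last-name": "Bond",
                         "phone": "007-007", "email": "jb@mi6.com"}},
    "1234567": {"MEASURE": {"amount": "1000000"},
                "BUYER": {"first-name": "James", "last-name": "Bond"},
                "SELLER": {"first-name": "Harry", "last-name": "Potter"}},
}


def rows_with_family(table, family):
    """Row keys of the rows that store cells in the given column family."""
    return sorted(key for key, families in table.items() if family in families)


print(rows_with_family(table, "MEASURE"))
print(rows_with_family(table, "CUSTOMER"))
```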
HBase Data Model
Physical implementation
Two special HBase tables, -ROOT- and .META., keep information about where the regions of the tables are hosted
The entry point for an HBase system is provided by another system called ZooKeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
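The lookup chain described above can be modelled as follows. ZooKeeper tells the client where -ROOT- is hosted. -ROOT- records where .META. is hosted. .META. maps each region's start key to the server that hosts that region. This is a simplified sketch: all server names, the table name, and the start keys are hypothetical. Each catalog is modelled as a single dict, and real clients also cache region locations.

```python
import bisect

# Hypothetical cluster state; every name and key here is made up
zookeeper = {"-ROOT-": "rs1"}   # ZooKeeper: where -ROOT- is hosted
root = {".META.": "rs2"}        # -ROOT-: where .META. is hosted
meta = {                        # .META.: region start keys -> hosting servers
    "customers": (["", "g", "p"], ["rs3", "rs4", "rs5"]),
}


def locate(table, row_key):
    """Return (servers hosting the catalog tables read, server for row_key)."""
    hops = [zookeeper["-ROOT-"], root[".META."]]
    start_keys, servers = meta[table]
    # The region for row_key is the one with the greatest start key <= row_key
    i = bisect.bisect_right(start_keys, row_key) - 1
    return hops, servers[i]


print(locate("customers", "harry"))
```

The key "harry" sorts between start keys "g" and "p", so it falls in the region served by "rs4".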
Performance
Too many regions affect performance
More regions mean smaller memory flushes to persistent storage and smaller HFiles stored in HDFS
This forces HBase to run many compactions to keep the number of HFiles low
HBase can handle regions from 10 to 40 GB
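A rough calculation shows the trade-off. The sketch rests on simplifying assumptions: a hypothetical 1 TB table, a hypothetical 4 GB of region-server memory for memory stores, and that memory shared evenly among the regions being written. Under those assumptions, the region size sets how many regions exist and roughly how large each flush can be. Actual HBase flush behaviour depends on its configuration; this is only arithmetic.

```python
import math


def region_count(table_gb, region_gb):
    """Regions needed for a table whose regions grow to region_gb each."""
    return math.ceil(table_gb / region_gb)


def flush_size_mb(memstore_budget_mb, active_regions):
    """Rough per-region flush size if memory is shared evenly (assumption)."""
    return memstore_budget_mb / active_regions


# Hypothetical figures: 1 TB table, 4 GB of memory for memory stores
for region_gb in (10, 40):
    n = region_count(1024, region_gb)
    print(region_gb, n, round(flush_size_mb(4096, n), 1))
```

With 10 GB regions the table needs 103 regions and each flush is about 39.8 MB. With 40 GB regions it needs 26 regions and each flush is about 157.5 MB. Fewer, larger regions therefore mean fewer, larger HFiles.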
Too many regions may occur due to:
- Over-splitting with HBase's split feature
- Improper presplitting with HBase's presplit feature
Offline merging of regions can be used to reduce the number of regions
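Regions cover contiguous ranges of row keys, so only neighbouring regions can be merged into a single range. The sketch below plans merges greedily. It assumes regions are given as hypothetical (start key, size in GB) pairs sorted by start key, and it merges neighbours while the combined size stays within a chosen limit. It plans the merges only; it does not call any HBase tool.

```python
def plan_merges(regions, max_gb):
    """Greedily merge adjacent regions while the merged size <= max_gb.
    regions: list of (start_key, size_gb) sorted by start_key."""
    merged = []
    for start, size in regions:
        if merged and merged[-1][1] + size <= max_gb:
            # Extend the previous key range to cover this region
            prev_start, prev_size = merged[-1]
            merged[-1] = (prev_start, prev_size + size)
        else:
            merged.append((start, size))
    return merged


# Hypothetical result of over-splitting: six small regions
regions = [("", 2), ("c", 3), ("f", 1), ("k", 9), ("p", 4), ("t", 4)]
print(plan_merges(regions, 10))
```

In this example, six regions become three, starting at "", "k", and "p" with sizes 6, 9, and 8 GB.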
Column families are designed to group data with a similar format or a similar access pattern
Consequences of too many column families
Performance
- Monotonically incrementing keys: only the last bits or bytes change, and slowly, e.g. when a timestamp is used as a key; all writes go to a consecutive area of keys, i.e. to the same region, served by the same region server
- Application issues: e.g. writes always performed on the same region
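The effect of monotonically incrementing keys can be simulated with a few regions. The sketch assumes three hypothetical split points that divide a 13-digit millisecond-timestamp key space into four regions. It then writes 1000 rows keyed by consecutive timestamps and counts how many land in each region.

```python
import bisect
from collections import Counter

# Hypothetical split points dividing the key space into four regions
split_points = ["1700000000000", "1710000000000", "1720000000000"]


def region_of(row_key):
    """Index (0-3) of the region whose key range contains row_key."""
    return bisect.bisect_right(split_points, row_key)


# Row keys are consecutive millisecond timestamps (same digit count,
# so string order matches numeric order)
start = 1730000000000
keys = [str(start + i) for i in range(1000)]
print(Counter(region_of(k) for k in keys))
```

All 1000 writes fall into the last region (index 3), so a single region server handles the whole write load while the other regions stay idle.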