First Experiences with HBase
2011-05-18, 叔宝 @ Search Center
Agenda
• QCon NoSQL overview
• HBase introduction
• HBase @ Search Center
• Problems encountered
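NoSQL @ QCon
• NoSQLs: Key/Value
– Facebook, Twitter: HBase; 25 TB of messages per month in production; HBase + Haystack
– Sina: MySQL -> MySQL + Redis; 50,000 QPS
– Douban: BeansDB; records range from a few KB to a few MB
– Taobao: Tair
– Baidu: bailingDB; storage for hundreds of billions of web pages
– Renren: Nuclear; Dynamo
– QQMail: SimpleDB; a cache supporting business services
– Visual China (视觉中国): MongoDB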
HBase Introduction
• Background
• Data Model
• Architecture
• Features
• HBase API
Background
• BigTable
• Hadoop
Background
What is HBase
• Column-oriented semi-structured data store
• Distributed
• Layered over HDFS
• Tolerant of machine failure
• Strong consistency
From Facebook
Data Model
• Basic concepts
– Table
– Row (globally sorted)
– Column = Family + qualifier (qualifiers are not fixed)
– Timestamp (version)
– Cell
– Region
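The concepts above can be pictured as a sorted, sparse, versioned map. The following is a minimal sketch in Python, assuming a toy in-memory model; `ToyTable` and its methods are hypothetical names for illustration and are not the HBase API. The second `Info:Age` version is made-up sample data.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of row / column / timestamp / cell; not the HBase API."""

    def __init__(self):
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        # A cell is addressed by (row, column, timestamp).
        self.rows[row][column][ts] = value

    def get(self, row, column, ts=None):
        """Return the newest version at or before ts (latest if ts is None)."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def row_keys(self):
        # Rows are ordered by comparing keys as bytes, not as numbers.
        return sorted(self.rows)


table = ToyTable()
table.put(b"12", "Info:Name", "John Smith", ts=1)
table.put(b"12", "Info:Age", "23", ts=1)
table.put(b"12", "Info:Age", "24", ts=2)   # hypothetical newer version
table.put(b"3", "Info:Name", "Lily", ts=1)
table.put(b"4", "Info:Name", "Lucy", ts=1)

print(table.get(b"12", "Info:Age"))        # latest version
print(table.get(b"12", "Info:Age", ts=1))  # version as of ts=1
print(table.row_keys())                    # b"12" sorts before b"3"
```

Note that the row key `12` sorts before `3` because keys are compared byte by byte; numeric ids would need zero-padding to sort numerically.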
Data Model (cont.)
Table: User-Friends
Row (Uid)   Info:Name    Info:Sex   Info:Age   Info:…   Friends:1, Friends:3, Friends:4, …
12          John Smith   M          23                  Bf
3           Lily         F          22                  Gf, Sister, Sister (rows 3 and 4)
4           Lucy         F          22
Data Model (cont.)
• Physical Storage
[Figure: physical storage of the User-Friends table, showing the same Info and Friends cells as the table above.]
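As a rough illustration of physical storage, the sketch below flattens a logical table into one sorted cell list per column family, which is how HBase is generally described as laying data out on disk. It is a toy model under stated assumptions: `to_store_files` is a hypothetical helper, and the sample cells (including which row holds the `Friends:3` value and the second `Info:Age` version) are illustrative rather than taken from the slide.

```python
def to_store_files(rows):
    """Flatten {row: {"family:qualifier": {ts: value}}} into per-family cell lists."""
    stores = {}
    for row, columns in rows.items():
        for column, versions in columns.items():
            family, qualifier = column.split(":", 1)
            for ts, value in versions.items():
                stores.setdefault(family, []).append((row, qualifier, ts, value))
    for cells in stores.values():
        # Sort by row, then qualifier, then newest timestamp first.
        cells.sort(key=lambda cell: (cell[0], cell[1], -cell[2]))
    return stores


# Illustrative sample data (cell placement is an assumption).
rows = {
    "12": {
        "Info:Name": {1: "John Smith"},
        "Info:Age": {1: "23", 2: "24"},
        "Friends:3": {1: "Gf"},
    },
    "3": {"Info:Name": {1: "Lily"}},
}

stores = to_store_files(rows)
for family, cells in stores.items():
    print(family, cells)
```

Row `3` has no Friends cells, so nothing is stored for it in the Friends family; empty cells cost no space in this layout.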
Architecture
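Architecture (cont.)
• Client
– Read/write data (RegionServer)
– Schema management (Master)
• ZooKeeper
– Master election and recovery
– Locating the root region
– Detecting RegionServers coming online and going offline
• Master
– Region assignment (balancer)
– Metadata operations
• RegionServer
– Serving user I/O requests
– Splitting and compacting regions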
Where is ipad @ query table?
• Three-level lookup
[Figure: lookup path for the row "ipad" in the query table, with boxes labeled zookeeper, -ROOT-, .META1., META2, and query.]
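The three-level lookup can be sketched as three successive searches over sorted start keys. This is a toy model, assuming ZooKeeper points at the root region, the root region points at meta regions, and meta regions point at user regions; all server names, region names, and split points below are hypothetical, and `find_region` and `locate` are illustrative helpers rather than HBase client calls.

```python
import bisect


def find_region(regions, key):
    """regions: list of (start_key, location) sorted by start_key.
    Return the location of the last region whose start_key <= key."""
    start_keys = [start for start, _ in regions]
    index = bisect.bisect_right(start_keys, key) - 1
    return regions[index][1]


# Hypothetical catalog contents, for illustration only.
zookeeper = {"root-region-server": "rs0"}
root_table = {"rs0": [("", "meta1@rs1"), ("query,m", "meta2@rs2")]}
meta_tables = {
    "meta1@rs1": [("query,", "query-region-A@rs3"), ("query,h", "query-region-B@rs4")],
    "meta2@rs2": [("query,m", "query-region-C@rs5"), ("query,t", "query-region-D@rs6")],
}


def locate(table, row_key):
    root_location = zookeeper["root-region-server"]                   # step 1: ZooKeeper
    meta_key = f"{table},{row_key}"
    meta_location = find_region(root_table[root_location], meta_key)  # step 2: root region
    return find_region(meta_tables[meta_location], meta_key)          # step 3: meta region


print(locate("query", "ipad"))
```

A real client would cache the result, so the full walk only happens on the first request or after a region moves.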
Write
[Figure: write path. Labels: writer, MemStore, HLog (entries "Seq#1, Table13, Region11, …" and "Seq#2, Table5, Region2, …"), three StoreFiles, Small Compaction.]
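The write path in the figure can be approximated as follows: each edit is appended to a log, buffered in memory, flushed to an immutable sorted file when the buffer fills, and small files are later merged. This is a minimal sketch under those assumptions; `ToyRegion`, `flush_threshold`, and the method names are hypothetical, and real HBase flushes by size rather than by entry count.

```python
class ToyRegion:
    """Toy write path: HLog append, MemStore buffer, flush, small compaction."""

    def __init__(self, flush_threshold=3):
        self.hlog = []            # append-only write-ahead log
        self.memstore = {}        # in-memory buffer of recent edits
        self.store_files = []     # immutable sorted lists of (key, value)
        self.flush_threshold = flush_threshold
        self.seq = 0

    def put(self, key, value):
        self.seq += 1
        self.hlog.append((self.seq, key, value))  # 1. log the edit first
        self.memstore[key] = value                # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable file.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}

    def small_compaction(self):
        # Merge all files; later (newer) files overwrite older values.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [sorted(merged.items())]


region = ToyRegion(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    region.put(key, value)
region.small_compaction()
print(region.store_files)
```

Because the log is written before the buffer, edits that were only in memory can be replayed from the log after a crash.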
Features
• Scalability
• High performance
• Reliability
• HBase API
Scalability
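• Capacity expansion
– Traditional approach: shard databases and tables
– HBase: simply add machines
– RegionServer
    • Split
    • Load balance
– HDFS
• Schema changes
– Traditional approach: take the system down for maintenance
– HBase: add and remove columns (column families) dynamically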
High Performance
• Random reads
– Key/Value
– Cache (client-side cache + MemStore + BlockCache)
– Column-oriented storage
– Split/balance
– Bloom filter
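To illustrate why a Bloom filter helps random reads, the sketch below shows the general technique: a compact bit set that can answer "definitely not present" without opening a file. This is a generic Bloom filter and not HBase's implementation; the sizes, the SHA-256 based hashing, and the class name are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch; parameters are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[position] for position in self._positions(key))


store_file_filter = BloomFilter()
store_file_filter.add("ipad")
print(store_file_filter.might_contain("ipad"))
```

A reader would consult one such filter per store file and skip any file whose filter returns False, at the cost of occasional false positives.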
High Performance (cont.)
• Random writes
– WAL
– Cache (MemStore)
– Compact/split/balance
High Performance (cont.)
• Range queries (Scan)
– Rows are globally sorted
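Because rows are globally sorted, a range query reduces to two binary searches followed by a sequential read. The sketch below shows this on an in-memory list, assuming a half-open range (start included, stop excluded); `scan` and the sample rows are hypothetical and stand in for a real Scan.

```python
import bisect


def scan(sorted_rows, start_row, stop_row):
    """sorted_rows: list of (row_key, value) sorted by row_key.
    Return the rows with start_row <= row_key < stop_row."""
    keys = [key for key, _ in sorted_rows]
    low = bisect.bisect_left(keys, start_row)
    high = bisect.bisect_left(keys, stop_row)
    return sorted_rows[low:high]


# Illustrative sample rows.
rows = sorted([
    ("kindle", 5), ("ipad", 1), ("android", 4), ("iphone", 2), ("ipod", 3),
])

print(scan(rows, "ip", "iq"))  # every row whose key starts with "ip"
```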
Reliability (Fault Tolerance)
• Layered on HDFS
• WAL
• Automatic failover
– RegionServer
– Master
HBase API
• HBase shell (like mysql/hive)
• Java API
• Thrift
• REST
• Jython, Scala, Groovy DSL, Cascading, Pig, Hive…
Java API
• Get
• Put
• Delete
• Scan
• HBaseAdmin
• MapReduce
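HBase @ Search Center
• Etao (一淘)
– Web pages (text content + B2C products from across the web)
– Images (thumbnails + price images)
– Data from external partner merchants
• Main operations:
– Page selection
– Link extraction
– Image processing
– Index build (full + incremental)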
HBase @ Search Center (cont.)
• Data platform
– Seller data storage
    • Basic information
    • Daily traffic sources
    • Anti-cheating
    • Relevance
– Query data storage
HBase @ Search Center (cont.)
• Dump center
– Item data
– User data
– …
• Solves the incremental-update problem
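Problems Encountered / Lessons Learned
• Table schema design
– Key: look up the data for a given query under a given category id
– Column family
– One table per day, or a single table overall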
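One way to support "a given query under a given category id" is a composite row key, so that all categories of one query sit next to each other. The sketch below is an assumption about how such a key could be built, not the design the team actually chose; the separator, the padding width, and both function names are hypothetical.

```python
SEPARATOR = "\x00"  # assumed never to occur inside a query string


def make_row_key(query, category_id):
    # Zero-pad the id so that byte order matches numeric order.
    return f"{query}{SEPARATOR}{category_id:010d}".encode("utf-8")


def parse_row_key(row_key):
    query, category = row_key.decode("utf-8").split(SEPARATOR)
    return query, int(category)


key = make_row_key("ipad", 50)
print(key)
print(parse_row_key(key))
```

With this layout a single Get fetches one (query, category) pair, and a Scan over the prefix `query + separator` returns every category for that query.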
• Compression
• Region pre-sharding
• Data migration between data centers (bulk loader)
• Assessing the impact of the WAL
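Region pre-sharding means creating a table with its regions already split, so that the initial load spreads across servers. A minimal sketch of computing evenly spaced split keys is shown below, assuming row keys are uniformly distributed fixed-width lowercase hex strings (for example a hash prefix); `hex_split_keys` and its parameters are hypothetical, and the resulting keys would still have to be passed to whatever table-creation call is used.

```python
def hex_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 evenly spaced split keys over a hex key space."""
    if num_regions < 2:
        return []
    key_space = 16 ** key_width
    return [
        format(key_space * i // num_regions, f"0{key_width}x")
        for i in range(1, num_regions)
    ]


print(hex_split_keys(4, key_width=2))
```

If the real keys are not uniformly distributed, evenly spaced split points will produce uneven regions, so the split keys should then be derived from a sample of the actual data.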
Learning Resources
• Source code
• BigTable paper
• Website
• Book
• Wiki
• "菜鸟看HBase" (A Newbie's Look at HBase) by 毕玄
• HBase @ Hadoop Day Seattle
• Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day
• QCon Beijing 2011 summary