Elasticsearch Relations
research keywords (searched as "Elasticsearch" + keyword)
- relationship
- one to many
- many to many
- denormalize
- deep nested
- data modeling
review
- parent/child join
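- Design
- Index all content types into a single mapping, wrapping each type's data in its own object (talent, company, experience) to avoid field-name collisions between types
- The mapping has a contentType field that identifies which kind of data a document holds
- The mapping declares a joinField that represents the relationships between the different content types
- Sample data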
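A minimal sketch of possible sample data for this design, assuming talent is the parent of experience. The field names contentType, joinField, talent and experience follow the notes; the sub-fields (name, title, companyName) and the id "t1" are hypothetical. A child carries its parent's id inside joinField and has to be indexed with routing set to that id so that it lands on the parent's shard.

```python
import json

# Hypothetical mapping for the parent/child design (sub-fields are illustrative).
MAPPING = {
    "mappings": {
        "properties": {
            # Identifies which kind of content a document holds.
            "contentType": {"type": "keyword"},
            # Only one join field per index; talent is the parent of experience.
            "joinField": {"type": "join", "relations": {"talent": "experience"}},
            # Each content type is wrapped in its own object to avoid field-name clashes.
            "talent": {"properties": {"name": {"type": "text"}}},
            "experience": {
                "properties": {
                    "title": {"type": "text"},
                    "companyName": {"type": "keyword"},
                }
            },
        }
    }
}


def talent_doc(name):
    """Body for a parent (talent) document."""
    return {
        "contentType": "talent",
        "talent": {"name": name},
        "joinField": {"name": "talent"},
    }


def experience_request(parent_id, title, company_name):
    """Return (routing, body) for a child (experience) document.

    The routing value is the parent id, so parent and child share a shard.
    """
    body = {
        "contentType": "experience",
        "experience": {"title": title, "companyName": company_name},
        "joinField": {"name": "experience", "parent": parent_id},
    }
    return parent_id, body


if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
    print(json.dumps(talent_doc("Alice"), indent=2))
    routing, body = experience_request("t1", "Backend Engineer", "Acme")
    print("routing =", routing)
    print(json.dumps(body, indent=2))
```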
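- Pros
- Each content type is indexed independently as its own document, so indexing performance is high
- Cons
- Search performance is weaker than with nested, and multiple levels of parent/child join are not recommended, https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html#_multiple_levels_of_parent_join
- Each type can have only one parent, so in a model like talent -> experience -> company, experience relates to both talent and company and would need two parents; that relationship cannot be expressed, https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html#_parent_join_restrictions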
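The one-parent restriction can be checked mechanically before writing a mapping. A small sketch, assuming the relations are written in the same shape as the join field's "relations" setting (parent name mapped to a child name or a list of child names); the helper name is hypothetical and is not part of any Elasticsearch client.

```python
from collections import defaultdict


def children_with_multiple_parents(relations):
    """Return the child names that appear under more than one parent.

    `relations` maps a parent name to a child name or a list of child names,
    like the "relations" setting of a join field.
    """
    parents_of = defaultdict(set)
    for parent, children in relations.items():
        if isinstance(children, str):
            children = [children]
        for child in children:
            parents_of[child].add(parent)
    return {child for child, parents in parents_of.items() if len(parents) > 1}


if __name__ == "__main__":
    # experience would need both talent and company as parents: not allowed.
    print(children_with_multiple_parents({"talent": "experience", "company": "experience"}))
    # A single-parent chain is expressible, but then each company belongs to
    # exactly one experience, and multi-level joins are discouraged anyway.
    print(children_with_multiple_parents({"talent": "experience", "experience": "company"}))
```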
- nested
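- Design
- Take one content type as the root and embed the content related to it as sub-fields of the root, forming one JSON document
- 1:1, contentA and contentB: with contentA as the root, store contentB's data as an object field of contentA
- 1:n, talent and experience: with talent as the root, store the experience data as an array-of-objects field of talent
- n:1, experience and company: with experience as the root, store the company data as an object field of experience
- n:n, company and industry: with company as the root, store the industry data as an array-of-objects field of company
- n:n:n..., same as above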
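The rules above can be sketched as a denormalization step that turns normalized rows into one talent-rooted document. This is a minimal illustration, assuming the row shapes described in the docstring (ids, talent_id, company_id and a company/industry link table are hypothetical names): experience becomes an array of objects (1:n), company an embedded object (n:1), and industry an array of objects inside company (n:n).

```python
import json


def build_talent_doc(talent, experiences, companies, industries, company_industries):
    """Denormalize relational rows into one talent document.

    talent: {"id", "name"}
    experiences: list of {"id", "talent_id", "company_id", "title"}
    companies: {company_id: {"id", "name"}}
    industries: {industry_id: {"id", "name"}}
    company_industries: list of (company_id, industry_id) pairs (n:n link table)
    """

    def company_obj(company_id):
        company = dict(companies[company_id])  # copy so the source row is untouched
        # n:n company <-> industry: embed the industries as an array of objects.
        company["industry"] = [
            dict(industries[ind_id])
            for comp_id, ind_id in company_industries
            if comp_id == company_id
        ]
        return company

    doc = dict(talent)
    # 1:n talent -> experience: an array of objects on the root document.
    doc["experience"] = [
        {
            "id": exp["id"],
            "title": exp["title"],
            # n:1 experience -> company: a single embedded object.
            "company": company_obj(exp["company_id"]),
        }
        for exp in experiences
        if exp["talent_id"] == talent["id"]
    ]
    return doc


if __name__ == "__main__":
    companies = {"c1": {"id": "c1", "name": "Acme"}}
    industries = {"i1": {"id": "i1", "name": "Software"}, "i2": {"id": "i2", "name": "Retail"}}
    links = [("c1", "i1"), ("c1", "i2")]
    talent = {"id": "t1", "name": "Alice"}
    exps = [{"id": "e1", "talent_id": "t1", "company_id": "c1", "title": "Engineer"}]
    print(json.dumps(build_talent_doc(talent, exps, companies, industries, links), indent=2))
```

Every company and industry is copied into each document that references it, which is exactly where the redundancy and update cost listed under the cons come from.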
- Sample data
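A possible mapping for such a talent-rooted document, sketched under the assumption that experience and industry are mapped as nested (so conditions on one array element are matched together) while the single company object stays a plain object; the field names are hypothetical. The helper lists the nested paths, which is useful because index.mapping.nested_fields.limit (default 50) caps the number of distinct nested mappings per index.

```python
# Hypothetical nested mapping for a talent document.
TALENT_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "experience": {
                "type": "nested",
                "properties": {
                    "title": {"type": "text"},
                    # n:1 company: a plain object inside each experience.
                    "company": {
                        "properties": {
                            "name": {"type": "keyword"},
                            "industry": {
                                "type": "nested",
                                "properties": {"name": {"type": "keyword"}},
                            },
                        }
                    },
                },
            },
        }
    }
}


def nested_paths(properties, prefix=""):
    """Return the dotted paths of every nested field in a mapping."""
    paths = []
    for field, spec in properties.items():
        path = prefix + field
        if spec.get("type") == "nested":
            paths.append(path)
        if "properties" in spec:
            paths.extend(nested_paths(spec["properties"], path + "."))
    return paths


if __name__ == "__main__":
    print(nested_paths(TALENT_MAPPING["mappings"]["properties"]))
    # ['experience', 'experience.company.industry']
```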
- Performance test
- 16,000 documents with nested data
- size: 178Mi (178Mi)
- docs: 1,265,472 (1,293,084)
- Search latency stays under 50 ms; updating 50 docs takes about 200 ms, and updating 1,000 docs about 600 ms
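The notes do not say how the batch updates were sent. Assuming they went through the _bulk API, a minimal sketch of building the NDJSON request body is below; the index name and partial documents are hypothetical. Even a partial update rewrites the whole root document together with all of its nested objects.

```python
import json


def bulk_update_body(index, updates):
    """Build an NDJSON body for the _bulk API from {doc_id: partial_doc}.

    Each update is an action line followed by a {"doc": ...} line,
    and the whole body must end with a newline.
    """
    lines = []
    for doc_id, partial in updates.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": partial}))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical: rename two talents in one request.
    print(bulk_update_body("talent", {"t1": {"name": "Alice"}, "t2": {"name": "Bob"}}), end="")
```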
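- titanhouse tp/cp uses the same approach: titan is indexed as the root, with company and experience fields under it plus some extra computed fields
- Using nested to represent relationships in ES should work well enough; when the scenario gets complex or nesting gets deep, the data structure needs extra optimization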
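- Pros
- Simple to use: an object or object[] sub-field naturally represents the relationship
- Cons
- More data redundancy, so more storage space is used
- If nesting is too deep, reindexing (when a document is updated) takes longer, and indexing performance is worse than with parent/child join, https://www.elastic.co/blog/managing-relations-inside-elasticsearch
- When data at a deep nested level is updated, every root document that embeds it must be updated as well
- talent -> experience -> company -> industry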
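The fan-out on updates can be made concrete with an in-memory sketch, assuming talent documents shaped like the talent -> experience -> company chain above and a hypothetical company "id" field: changing one company touches every talent that embeds it, and each touched talent would then have to be reindexed in full.

```python
def propagate_company_update(talent_docs, company_id, changes):
    """Apply `changes` to every embedded copy of one company.

    Returns the ids of the root (talent) documents that were modified.
    """
    touched = []
    for doc in talent_docs:
        modified = False
        for exp in doc.get("experience", []):
            company = exp.get("company")
            if company and company.get("id") == company_id:
                company.update(changes)
                modified = True
        if modified:
            touched.append(doc["id"])
    return touched


if __name__ == "__main__":
    docs = [
        {"id": "t1", "experience": [{"company": {"id": "c1", "name": "Acme"}}]},
        {"id": "t2", "experience": [{"company": {"id": "c1", "name": "Acme"}}]},
        {"id": "t3", "experience": [{"company": {"id": "c2", "name": "Other"}}]},
    ]
    # Renaming one company forces two root documents to be rewritten.
    print(propagate_company_update(docs, "c1", {"name": "Acme Inc"}))  # ['t1', 't2']
```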
reference
relation
- https://www.elastic.co/blog/managing-relations-inside-elasticsearch
- The four officially recommended ways to handle relationships; recommended reading
- https://www.elastic.co/guide/en/elasticsearch/guide/current/relations.html
- Also official; an earlier take on handling relationships
- https://blog.trifork.com/2016/12/22/handling-a-massive-amount-of-product-variations-with-elasticsearch/
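- Tests the official approaches; its conclusion is that data modeling depends on how the business uses the data, and there is no universal solution. The article ends up combining two of the approaches, but the result is tied to its specific business and does not generalize well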
- https://elastic.blog.csdn.net/article/details/88784748
- https://elastic.blog.csdn.net/article/details/82287045
- https://medium.com/@mena.meseha/briefly-describe-how-to-store-complex-relational-data-in-elasticsearch-f30277317b1
- https://elasticsearch.cn/question/4655
- There is no need to design for queries that join three generations at once; that just overcomplicates the requirements
- https://stackoverflow.com/questions/49446401/how-to-make-relationship-in-elasticsearch
denormalize/denormalization
- https://www.geeksforgeeks.org/difference-between-normalization-and-denormalization/
- Explains the definitions with some RDBMS examples; quite practical
- https://medium.com/wolox-driving-innovation/why-and-how-denormalize-indexing-with-elasticsearch-rails-6c3c12f03c7c
- Basically, you denormalize only the queries that need to be fast, not the whole database. This is important because in a relational database you model the entire business with entities and relations among them, but in this case, you model queries. The article's key point: don't index whole tables; index only what the queries actually need.
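"Model the queries, not the database" can be sketched as a projection step: given a full record and the dotted field paths that the fast queries actually search or display, keep only those fields in the search document. The helper and the sample record are hypothetical illustrations, not taken from the linked article.

```python
def project_for_queries(record, query_fields):
    """Keep only the dotted field paths that the queries need; skip missing ones."""
    out = {}
    for path in query_fields:
        keys = path.split(".")
        value = record
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break
            value = value[key]
        else:
            # The full path exists: copy it into the output, creating parents.
            target = out
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = value
    return out


if __name__ == "__main__":
    record = {"id": 1, "name": "Alice", "salary": 100, "profile": {"city": "Berlin", "bio": "..."}}
    print(project_for_queries(record, ["name", "profile.city"]))
    # {'name': 'Alice', 'profile': {'city': 'Berlin'}}
```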
nested
- https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings
- https://blog.csdn.net/laoyang360/article/details/82950393
- https://www.elastic.co/blog/found-elasticsearch-as-nosql#relations-and-constraints
- In a relational design with proper normalization, you would simply update the product and be done. That's what they are really good at. With a denormalized document database, every order with the product would have to be updated.
- As mentioned in the introduction, Elasticsearch has a concept of "query time" joining with parent/child-relations, and "index time" joining with nested types. We'll probably cover this in more depth in a future article. In the meantime, we can recommend Martijn van Groningen's presentation "Document relations with Elasticsearch".
- This is how a document-oriented system is meant to be used: index related data together, trading storage space for query convenience
- https://discuss.elastic.co/t/deep-nesting-and-recommendation-for-its-usage/185321
- Using parent/child or nested to imitate relational models is not reasonable; denormalize to flatten the data instead
- https://blog.gojekengineering.com/elasticsearch-the-trouble-with-nested-documents-e97b33b46194
- Covers nested query performance problems; their fix was to encode fields, turning object[] into string[]. It does not deal with the relation problem
- https://qbox.io/blog/handling-relationships-using-nested-objects-elasticsearch
- Here, the title clause operates on the root document. The nested clause “steps down” into the nested comments field. It no longer has access to fields in the root document, nor fields in any other nested document. The comments.name and comments.age clauses operate on the same nested document. A nested field can contain other nested fields. Similarly, a nested query can contain other nested queries.
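The query described in the quote can be written as a request body. A minimal sketch, assuming a blog index whose comments field is mapped as nested; the parameter values are placeholders.

```python
def blog_comment_query(title, commenter, age):
    """Root-level match on title plus a nested clause on comments."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Operates on the root document.
                    {"match": {"title": title}},
                    {
                        "nested": {
                            # "Steps down" into the nested comments field.
                            "path": "comments",
                            "query": {
                                "bool": {
                                    "must": [
                                        # Both clauses must match the same comment.
                                        {"match": {"comments.name": commenter}},
                                        {"match": {"comments.age": age}},
                                    ]
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(blog_comment_query("eggs", "john", 28), indent=2))
```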
join/parent+children
- https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html
- Only one join field mapping is allowed per index.
- Using multiple levels of relations to replicate a relational model is not recommended.
- https://blog.mimacom.com/parent-child-elasticsearch/
- Denormalizing: Flatten your data
- Application-side joins: Run multiple queries on normalised data
- Nested objects: Store arrays of objects
- Parent-child relationships: Store multiple documents through joins
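Of the four options, application-side joins are the only one that keeps the data normalized. A minimal sketch, assuming separate company and experience indices where each experience stores a company_id; the index and field names are hypothetical, and `search` is any callable returning hit sources (here an in-memory stand-in instead of a real client). Very long id lists in the second step run into the terms query limit (index.max_terms_count).

```python
def app_side_join(search, industry):
    """Two-step application-side join over normalized indices."""
    # Step 1: find the companies in the requested industry.
    companies = search("company", {"query": {"term": {"industry": industry}}})
    company_ids = [c["id"] for c in companies]
    if not company_ids:
        return []
    # Step 2: fetch the experiences that point at those companies.
    return search("experience", {"query": {"terms": {"company_id": company_ids}}})


def make_in_memory_search(data):
    """Hypothetical stand-in for a client; supports only term/terms queries."""

    def search(index, body):
        query = body["query"]
        if "term" in query:
            field, value = next(iter(query["term"].items()))
            return [doc for doc in data[index] if doc.get(field) == value]
        field, values = next(iter(query["terms"].items()))
        return [doc for doc in data[index] if doc.get(field) in values]

    return search


if __name__ == "__main__":
    data = {
        "company": [{"id": "c1", "industry": "software"}, {"id": "c2", "industry": "retail"}],
        "experience": [
            {"id": "e1", "company_id": "c1"},
            {"id": "e2", "company_id": "c2"},
            {"id": "e3", "company_id": "c1"},
        ],
    }
    search = make_in_memory_search(data)
    print([e["id"] for e in app_side_join(search, "software")])  # ['e1', 'e3']
```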
- The only case where the parent-child relationship makes sense is if your data contains a one-to-many relationship where one entity significantly outnumbers the other entity. The song (parent) won't change, but the likes (children) for that song may grow steadily.
- If you would use a nested object for the above use case, the update is expensive. Updating a nested object requires a complete reindexing of the root object and a complete reindexing of all its nested objects!
- https://medium.com/@mena.meseha/understand-the-parent-child-relationship-in-elasticsearch-3c9a5a57f202
- Describes the old parent/child approach based on types; types are now deprecated
type
- https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html
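- Mapping types have been removed; the official docs suggest two ways to migrate away from them
- Add a custom type field that identifies the kind of document
- Index each former type into its own index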
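The first migration option can be sketched as follows, in the spirit of the removal-of-types page: a keyword field named "type" is stored on every document and each search filters on it. The helper names and sample values are hypothetical.

```python
# Hypothetical mapping: one index, with a keyword field replacing the old type.
MAPPING = {"mappings": {"properties": {"type": {"type": "keyword"}}}}


def typed_doc(doc_type, body):
    """Add the custom 'type' field to a document body."""
    return {"type": doc_type, **body}


def search_by_type(doc_type, query):
    """Restrict a query to one former type; the filter does not affect scoring."""
    return {
        "query": {
            "bool": {
                "must": [query],
                "filter": [{"term": {"type": doc_type}}],
            }
        }
    }


if __name__ == "__main__":
    print(typed_doc("user", {"name": "kimchy"}))
    print(search_by_type("tweet", {"match": {"content": "elasticsearch"}}))
```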
es vs solr join
- https://stackoverflow.com/questions/50343864/how-to-use-join-at-a-same-type-or-index-in-elasticsearch-like-solr
- https://sematext.com/blog/solr-vs-elasticsearch-differences/
- https://logz.io/blog/solr-vs-elasticsearch/
- https://stackoverflow.com/questions/31900285/solr-vs-elasticsearch-for-nested-documents
- Both Solr and ES support joins, but ES's has_child/has_parent APIs differ from Solr's
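For comparison with Solr, the two ES join queries look roughly like this. A sketch, assuming a join field with relations {"talent": "experience"} and the wrapped sub-fields talent.name and experience.companyName, all of which are hypothetical. has_child names the child relation with "type", while has_parent uses "parent_type".

```python
def talents_with_experience_at(company_name):
    """Parents (talent) that have at least one matching child (experience)."""
    return {
        "query": {
            "has_child": {
                "type": "experience",
                "query": {"term": {"experience.companyName": company_name}},
            }
        }
    }


def experiences_of_talent_named(name):
    """Children (experience) whose parent (talent) matches."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "talent",
                "query": {"match": {"talent.name": name}},
            }
        }
    }


if __name__ == "__main__":
    print(talents_with_experience_at("Acme"))
    print(experiences_of_talent_named("Alice"))
```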
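- ES's nested type is easier to use; Solr has no directly equivalent field type (its closest feature is nested child documents queried with block join)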