lazyrabbit's blog

I. Installing Elasticsearch

Official installation guide: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/getting-started-install.html

1. CentOS

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.1-x86_64.rpm
sudo rpm -i elasticsearch-7.9.1-x86_64.rpm
sudo service elasticsearch start

2. Windows

1) Download the Elasticsearch 7.9.1 Windows zip file from the Elasticsearch download page (https://www.elastic.co/cn/downloads/elasticsearch).

2) Extract the contents of the zip file to a directory on your computer, for example, C:\Program Files.

3) Open a command prompt as an Administrator and navigate to the directory that contains the extracted files, for example:

cd C:\Program Files\elasticsearch-7.9.1

4) Start Elasticsearch:

bin\elasticsearch.bat

5) Make sure Elasticsearch is up and running:

curl http://127.0.0.1:9200
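
The same check can be scripted. The following is a minimal sketch, assuming a default local install without security enabled; the function name is_elasticsearch_up and the timeout value are illustrative choices, not part of Elasticsearch.

```python
import json
import urllib.error
import urllib.request


def is_elasticsearch_up(base_url="http://127.0.0.1:9200", timeout=3.0):
    """Return True if a GET on the root endpoint answers 200 with a JSON body."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            # A running node answers the root endpoint with a JSON document.
            json.load(response)
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or non-JSON answer.
        return False
```

Calling is_elasticsearch_up() right after starting the service may return False for a few seconds while the node is still booting, so a retry loop around it is usually worthwhile.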

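II. Elasticsearch Structure

1. Node and Cluster

Elasticsearch is essentially a distributed database that lets multiple servers work together; each server can run multiple Elasticsearch instances.

A single Elasticsearch instance is called a node. A group of nodes forms a cluster.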

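2. Index

By default Elasticsearch indexes every field and, after processing, writes it into an inverted index. Searches look the data up directly in that index.

This is why the top-level unit of data management in Elasticsearch is called an Index. It is roughly equivalent to a single database. The name of every Index (that is, database) must be lowercase.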
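
To make the idea of an inverted index concrete, here is a toy sketch in Python. It is not how Lucene stores data; the tokenizer (lowercase, split on non-alphanumerics) and the sample documents are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical documents, keyed by ID.
docs = {
    1: "Quick brown fox",
    2: "Lazy brown dog",
    3: "Quick rabbit",
}
index = build_inverted_index(docs)
print(index["brown"])  # [1, 2]
print(index["quick"])  # [1, 3]
```

Looking up a term is a single dictionary access, which is why a search does not need to scan every document.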

3. Document

A single record in an Index is called a Document. Many Documents make up an Index.

Documents are represented in JSON.

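4. Type

Documents can be grouped. In a weather Index, for example, they could be grouped by city (Beijing and Shanghai) or by climate (sunny and rainy). Such a group is called a Type: a virtual, logical grouping used to filter Documents.

Types are deprecated in 7.x, where _doc serves as a placeholder type. The reasons for removing them, and the benefits of keeping each kind of entity in its own index instead:

In an Elasticsearch index, fields that have the same name in different mapping types are backed by the same Lucene field internally.
Storing different entities that share few or no fields in the same index leads to sparse data and reduces the efficiency of Lucene's document compression.
With one entity per index, the term statistics used for scoring in full-text search are more likely to be accurate.
Each index can be given a number of shards suited to the amount of data it holds.

For details, see: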

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html

III. Elasticsearch Data Types

Commonly used types:

binary encoded as a Base64 string
boolean true and false values
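keyword structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags
constant_keyword keyword fields that always contain the same value
long a signed 64-bit integer
double a double-precision 64-bit IEEE 754 floating-point number
date a date
date_nanos a date stored with nanosecond resolution
object A JSON object
text fields indexed for full-text search
Arrays Elasticsearch has no dedicated array data type. By default any field can contain zero or more values, but all values in an array must have the same data type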

For other types, see: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-types.html

Metadata fields: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping-fields.html
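
As an illustration of how these type names are used, the sketch below builds a mappings body in Python and prints it as JSON. The field names (status_code, message, and so on) are hypothetical; only the type names come from the list above.

```python
import json

# Hypothetical field names; only the type names come from the list above.
mappings = {
    "properties": {
        "status_code": {"type": "keyword"},
        "message": {"type": "text"},
        "bytes_sent": {"type": "long"},
        "response_time": {"type": "double"},
        "is_error": {"type": "boolean"},
        "created_at": {"type": "date"},
        "payload": {"type": "binary"},
        # An array needs no special type: "tags" can hold one keyword or many.
        "tags": {"type": "keyword"},
    }
}

body = json.dumps({"mappings": mappings}, indent=2)
print(body)
```

The printed JSON has the shape expected in the body of an index creation request.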

IV. REST APIs

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/rest-apis.html

1. Index

1) Create a new index

PUT /<index>

The request body can contain settings, field mappings, and aliases, for example:

PUT /test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text" }
    }
  }
}

For mapping configuration, see:

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/mapping.html
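
Outside of the Kibana console, the create-index call is an ordinary HTTP PUT. The following sketch builds such a request with Python's standard library without sending it. The helper name build_create_index_request and the base URL are assumptions for a default local node.

```python
import json
import urllib.request


def build_create_index_request(index, settings=None, mappings=None,
                               base_url="http://127.0.0.1:9200"):
    """Build (but do not send) a PUT /<index> request."""
    if index != index.lower():
        raise ValueError("index names must be lowercase")
    body = {}
    if settings:
        body["settings"] = settings
    if mappings:
        body["mappings"] = mappings
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(
    "test",
    settings={"number_of_shards": 1},
    mappings={"properties": {"field1": {"type": "text"}}},
)
print(request.get_method(), request.full_url)
# Sending it requires a running node: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before anything touches the cluster.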

2) Other operations

# Check whether an index exists
HEAD /<index>

# Delete an index
DELETE /<index>

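# Get index information
# <target> (required, string) Comma-separated list of data streams, indices, and index aliases used to limit the request. Wildcard expressions (*) are supported.
# To target all data streams and indices in the cluster, omit this parameter or use _all or *.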
GET /<target>

# Close an index
POST /<index>/_close
# Open an index
POST /<target>/_open

# Clone an index
# To be cloned, the index must be marked read-only and the cluster health status must be green
POST /<index>/_clone/<target-index>
PUT /<index>/_clone/<target-index>

# Block write operations on an index
PUT /<index>/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

# Freeze / unfreeze an index
POST /<index>/_freeze
POST /<index>/_unfreeze

# Update the index mapping
PUT /<target>/_mapping
{
  "properties": {
    "field1":  { "type": "text"}
  }
}
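# properties is a required parameter in the request body.
# Example: a field cannot be renamed in place; instead, add an alias field
# that gives it an alternate name (here user_id points to user_identifier)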
PUT /my-index-000001/_mapping
{
  "properties": {
    "user_id": {
      "type": "alias",
      "path": "user_identifier"
    }
  }
}

# Get index mapping information
# Get the whole mapping
GET /<target>/_mapping
# Get the mapping of a specific field
GET /<target>/_mapping/field/<field>

# Check whether a mapping type exists (deprecated in 7.x together with types)
HEAD /<index>/_mapping/<type>


# Update index settings
PUT /<index>/_settings
{
  "index" : {
    "number_of_replicas" : 2
  }
}

# Get index settings
GET /<target>/_settings
GET /<target>/_settings/<setting>

# Run an analyzer on a piece of text
GET /_analyze
{
  "analyzer" : "standard",
  "text" : "Quick Brown Foxes!"
}

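# Define an index template
# An index template defines settings, mappings, and aliases that are applied automatically to new indices whose names match index_patterns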
PUT /_index_template/template_1
{
  "index_patterns" : ["te*"],
  "priority" : 1,
  "template": {
    "settings" : {
      "number_of_shards" : 2
    }
  }
}

# State management
# Clear the cache
POST /<target>/_cache/clear

# Refresh
POST /<target>/_refresh
GET /<target>/_refresh

# Flush
POST /<target>/_flush
GET /<target>/_flush
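
The index template request earlier in this listing relies on two ideas: index_patterns decides which new indices a template applies to, and priority decides which template wins when several match. The sketch below imitates that selection in Python. It is only an approximation: fnmatchcase also understands ? and [...], which are not part of the Elasticsearch pattern syntax, and every template other than template_1 is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical templates; template_1 mirrors the example in the listing.
templates = {
    "template_1": {"index_patterns": ["te*"], "priority": 1},
    "template_logs": {"index_patterns": ["logs-*"], "priority": 5},
    "template_test": {"index_patterns": ["test*"], "priority": 10},
}


def pick_template(index_name, templates):
    """Return the name of the highest-priority template matching index_name."""
    matches = [
        (spec["priority"], name)
        for name, spec in templates.items()
        if any(fnmatchcase(index_name, p) for p in spec["index_patterns"])
    ]
    if not matches:
        return None
    return max(matches)[1]


print(pick_template("test-000001", templates))  # template_test
print(pick_template("temp", templates))         # template_1
print(pick_template("metrics", templates))      # None
```

Here test-000001 matches both te* and test*, and the template with the higher priority is chosen.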

2. Document

1) Add

# ID required
# Fails if a document with this ID already exists
POST /<target>/_create/<_id>
PUT /<target>/_create/<_id>
# ID is generated automatically
POST /<target>/_doc/


# ID required
# Adds the document if the ID does not exist; otherwise replaces the existing document
PUT /<target>/_doc/<_id>
POST /<target>/_doc/<_id>
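
The difference between the _create and _doc endpoints can be modelled with a small in-memory class. This is a toy stand-in written for illustration, not Elasticsearch client code: the class ToyIndex and its methods are hypothetical, and a real cluster reports a duplicate _create as an HTTP error rather than a Python exception.

```python
class ToyIndex:
    """In-memory stand-in for one index, keyed by document ID."""

    def __init__(self):
        self._docs = {}

    def create(self, doc_id, source):
        """Like _create: refuse to overwrite an existing ID."""
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self._docs[doc_id] = {"_version": 1, "_source": source}
        return "created"

    def index(self, doc_id, source):
        """Like _doc with an ID: add the document or replace it entirely."""
        if doc_id in self._docs:
            version = self._docs[doc_id]["_version"] + 1
            self._docs[doc_id] = {"_version": version, "_source": source}
            return "updated"
        self._docs[doc_id] = {"_version": 1, "_source": source}
        return "created"

    def get(self, doc_id):
        """Return the stored entry for doc_id."""
        return self._docs[doc_id]


toy = ToyIndex()
print(toy.create("1", {"field1": "a"}))  # created
print(toy.index("1", {"field2": "b"}))   # updated
print(toy.get("1"))  # {'_version': 2, '_source': {'field2': 'b'}}
```

Note that indexing over an existing ID replaces the whole source: field1 is gone after the second call. Changing only some fields is what the _update endpoint is for.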

2) Get

# The whole document
GET /<index>/_doc/<_id>
# Only the _source
GET /<index>/_source/<_id>

# Check existence
HEAD /<index>/_doc/<_id>
HEAD /<index>/_source/<_id>

3) Get multiple

Retrieves multiple JSON documents by ID.

GET /_mget
GET /<index>/_mget

Example: fetch documents by ID

GET /_mget
{
  "docs": [
    {
      "_index": "my-index-000001",
      "_id": "1"
    },
    {
      "_index": "my-index-000001",
      "_id": "2"
    }
  ]
}
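
When the IDs come from application code, the _mget body is easy to generate. The sketch below shows two body shapes: the docs form used above, and the shorter ids form that can be used when the index is already named in the URL (GET /<index>/_mget). The function names are illustrative.

```python
import json


def build_mget_docs_body(index, ids):
    """Body for GET /_mget: every entry names its index and ID."""
    return {"docs": [{"_index": index, "_id": str(doc_id)} for doc_id in ids]}


def build_mget_ids_body(ids):
    """Body for GET /<index>/_mget: the index comes from the URL."""
    return {"ids": [str(doc_id) for doc_id in ids]}


docs_body = build_mget_docs_body("my-index-000001", [1, 2])
ids_body = build_mget_ids_body([1, 2])
print(json.dumps(docs_body))
print(json.dumps(ids_body))
```

IDs are converted to strings because document IDs are strings in Elasticsearch, even when they look numeric.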

4) Delete

DELETE /<index>/_doc/<_id>

5) Delete by query

POST /<target>/_delete_by_query

Example:

POST /my-index-000001/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}
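
The field name user.id in this query is a dotted path: it refers to the id field inside the user object. The toy sketch below resolves such a path in nested Python dictionaries and deletes the matching documents. It uses plain equality, which is a simplification of what a match query does, and the sample documents are hypothetical.

```python
def get_field(doc, dotted_path):
    """Resolve a dotted field name such as "user.id" inside nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def delete_by_field(docs, dotted_path, expected):
    """Remove every document whose field equals expected; return the count."""
    doomed = [doc_id for doc_id, doc in docs.items()
              if get_field(doc, dotted_path) == expected]
    for doc_id in doomed:
        del docs[doc_id]
    return len(doomed)


# Hypothetical documents, keyed by ID.
docs = {
    "1": {"user": {"id": "elkbee"}, "message": "first"},
    "2": {"user": {"id": "kimchy"}, "message": "second"},
    "3": {"user": {"id": "elkbee"}, "message": "third"},
}
deleted = delete_by_field(docs, "user.id", "elkbee")
print(deleted, sorted(docs))  # 2 ['2']
```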

6) Update

POST /<index>/_update/<_id>

7) Update by query

POST /<target>/_update_by_query

Example:

POST my-index-000001/_update_by_query?conflicts=proceed
{
  "query": { 
    "term": {
      "user.id": "kimchy"
    }
  }
}
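
The conflicts=proceed query parameter in this example tells the operation to continue when it hits a version conflict instead of aborting, which is the default. When such requests are built in code, the parameter is just part of the query string. A minimal sketch follows, with a hypothetical helper name:

```python
from urllib.parse import urlencode


def by_query_path(target, operation, **params):
    """Return the request path for _update_by_query or _delete_by_query."""
    if operation not in ("_update_by_query", "_delete_by_query"):
        raise ValueError(f"unsupported operation: {operation}")
    path = f"/{target}/{operation}"
    if params:
        path += "?" + urlencode(params)
    return path


print(by_query_path("my-index-000001", "_update_by_query", conflicts="proceed"))
# /my-index-000001/_update_by_query?conflicts=proceed
```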

8) Bulk

Performs multiple index or delete operations in a single API call. This reduces overhead and can greatly increase indexing speed.

POST /_bulk
POST /<target>/_bulk

Specify the operations in the request body as newline-delimited JSON (NDJSON):

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

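action can be index, create, delete, or update.

Both index and create add a document. The difference is that create fails if a document with the same ID already exists, whereas index adds the document or replaces the existing one.

delete does not take a source line after it. update expects a partial document, upsert, or script on the next line.

The final line of data must end with a newline character \n. Each newline character may be preceded by a carriage return \r. When sending requests to the _bulk endpoint, the Content-Type header should be set to application/x-ndjson.

Example: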

POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
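
Building the NDJSON body by hand is error-prone, mostly because of the trailing newline and the missing source line after delete. The sketch below generates the same body as the example above from a list of tuples. The function build_bulk_body and its tuple layout are assumptions made for illustration, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples as newline-delimited JSON."""
    lines = []
    for action, metadata, source in operations:
        if action not in ("index", "create", "delete", "update"):
            raise ValueError(f"unknown bulk action: {action}")
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue  # delete carries no source line
        if source is None:
            raise ValueError(f"{action} requires a source line")
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with "\n".
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body, end="")
```

The resulting string would be sent as the body of POST /_bulk with the Content-Type header set to application/x-ndjson.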

9) Reindex

Copies documents from a source index to a destination index.

POST /_reindex

Example:

POST _reindex
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

Elasticsearch installation and basic operations