探究 logstash grok 文本解析
4 min read · April 7th 2017
试错环境
文本的解析规则的编写相对来说还是挺复杂的,我们需要搭建一个调试环境,以便我们试错。
配置 logstash 从标准终端输入中读取数据,grok 解析后,rubydebug 编码后打印到标准终端。
对应配置文件 logstash.conf 如下
input {
stdin {
}
}
filter {
grok {
match => {
"message" => '%{DATA}'
}
}
}
output {
stdout { codec => rubydebug }
}
Logstash 进行了分层设计,input 模块进行日志接受,filters 模块进行日志处理,output 模块进行日志转发,此外还提供了 codecs 模块可以对输入输出信息进行编码解码。各层在配置中也存在着对应。
rubydebug 用来输出结构化数据。
启动 logstash
logstash -f logstash.conf
logstash 成功启动后,终端中输入
hello
logstash 将打印
{
"@timestamp" => 2017-04-08T03:08:10.999Z,
"@version" => "1",
"host" => "612edb7645b3",
"message" => "hello",
"tags" => []
}
整个试错环境运行成功
解析规则
正则规则
grok 使用的是 Oniguruma 正则 来解析文本
命名正则
命名正则顾名思义就是定义了名字的正则规则
logstash 内置了一些命名正则
定义
DATA .*?
给规则.*?
命名 DATA。
命名正则用法
%{SYNTAX:SEMANTIC:TYPE}
# 例子
# %{INT:count:int}
# %{Number:rate:float}
# %{DATA:upstream}
# %{DATA}
- SYNTAX: 命名正则
- SEMANTIC: 通过正则解析出来的结构化数据对应的键值,省略时表示仅匹配但不做解析
- TYPE: 通过正则解析出来的结构化数据对应的类型,省略时表示类型为字符串,当前支持的可选值为 int 和 float
实例
修改logstash.conf
中的 message 字段
"message" => '%{INT:count:int}'
重启 logstash 后,终端输入3
可以看到输出
{
"@timestamp" => 2017-04-08T03:39:40.205Z,
"@version" => "1",
"host" => "612edb7645b3",
"count" => 3,
"message" => "3",
"tags" => []
}
输出中多了一个字段count
, 其值为3
。grok 解析并添加字段 count, 其值为整数 3
直接使用正则
除了使用命名正则外,我们也可以直接使用正则
"message" => '[0-9]+'
如果需要提取数据到结构化对象中,可以使用如下规则
(?<SEMANTIC>RULE)
# e.g
(?<count>[0-9]+)
配置 logstash.conf
"message" => '(?<count>[0-9]+)'
重启后输入3
logstash 打印
{
"@timestamp" => 2017-04-08T04:17:33.670Z,
"@version" => "1",
"host" => "612edb7645b3",
"count" => "3",
"message" => "3",
"tags" => []
}
实战
下面进行 nginx 错误日志解析
参考日志数据
2017/04/08 00:48:07 [error] 24175#24175: *573179 open() "/www/README.html" failed (2: No such file or directory), client: 220.181.51.93, server: www.baicaiyun.cn, request: "GET /README.html HTTP/1.1", host: "www.example.com"
2017/04/07 18:09:33 [error] 24174#24174: *571806 connect() failed (111: Connection refused) while connecting to upstream, client: 113.99.120.208, server: www.example.com, request: "GET /api/user/3332432/info HTTP/1.1", upstream: "http://10.46.122.16:8081/v1/ordersheet/3332432/info", host: "www.example.com", referrer: "http://www.example.com/user"
分析上面数据
- 时间:
2017/04/07 18:09:33
, 需要结构化timestamp => 2017/04/07 18:09:33
- 错误级别:
[error]
, 需要结构化severity => error
- 进程信息:
24175#24175
, 不需要结构化 - 错误信息:
*573179 open() "/www/README.html" failed (2: No such file or directory),
需要结构化errormessage => *573179 open() "/www/README.html" failed (2: No such file or directory)
- 客户端 IP:
client: 220.181.51.93,
, 需要结构化clientip => 220.181.51.93
- 服务名:
server: www.example.com
, 需要结构化server => www.example.com
- 上行『可选』:
upstream: "http://10.46.122.16:8081/v1/ordersheet/3332432/info",
, 不需要结构化 - 访问域名:
host: "www.example.com"
, 需要结构化hostname => www.example.com
- Referrer『可选』:
, referrer: "http://www.example.com/user"
, 需要结构化referrer => http://www.example.com/use
编写正则规则
"message" => '(?<timestamp>%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT}#%{NUMBER}: %{GREEDYDATA:errormessage}, client: %{IP:clientip}, server: %{DATA}, request: \"%{WORD:verb} %{GREEDYDATA:request} HTTP/%{NUMBER:httpversion}\"(?:, upstream: \"%{DATA}\")?, host: \"%{HOSTNAME:hostname}\"(?:, referrer: \"%{DATA:referrer}\")?'
第一条数据解析结果
{
"severity" => "error",
"errormessage" => "*573179 open() \"/www/README.html\" failed (2: No such file or directory)",
"request" => "/README.html",
"verb" => "GET",
"message" => "2017/04/08 00:48:07 [error] 24175#24175: *573179 open() \"/www/README.html\" failed (2: No such file or directory), client: 220.181.51.93, server: www.baicaiyun.cn, request: \"GET /README.html HTTP/1.1\", host: \"www.example.com\"",
"tags" => [],
"hostname" => "www.example.com",
"@timestamp" => 2017-04-08T04:41:52.336Z,
"clientip" => "220.181.51.93",
"@version" => "1",
"host" => "612edb7645b3",
"httpversion" => "1.1",
"timestamp" => "2017/04/08 00:48:07"
}
第二条数据解析结果
{
"severity" => "error",
"errormessage" => "*571806 connect() failed (111: Connection refused) while connecting to upstream",
"request" => "/api/user/3332432/info",
"verb" => "GET",
"message" => "2017/04/07 18:09:33 [error] 24174#24174: *571806 connect() failed (111: Connection refused) while connecting to upstream, client: 113.99.120.208, server: www.example.com, request: \"GET /api/user/3332432/info HTTP/1.1\", upstream: \"http://10.46.122.16:8081/v1/ordersheet/3332432/info\", host: \"www.example.com\", referrer: \"http://www.example.com/user\"",
"tags" => [],
"referrer" => "http://www.example.com/user",
"hostname" => "www.example.com",
"@timestamp" => 2017-04-08T04:48:36.250Z,
"clientip" => "113.99.120.208",
"@version" => "1",
"host" => "612edb7645b3",
"httpversion" => "1.1",
"timestamp" => "2017/04/07 18:09:33"
}