Fluentd: A Big Data Collection Platform

Posted by Administrator on 17/10/2017 01:34:28 in Big Data

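Among big data collection platforms, one stands out because every part of it is customizable: with simple configuration, you can ship logs almost anywhere. It is currently very popular and is used by many companies, so this article introduces it: Fluentd.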


What is Fluentd?

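Fluentd is an open-source data collector designed for processing data streams. It is somewhat like syslogd, but it uses JSON as its data format. Its plugin-based architecture makes it highly extensible and highly available, and it also provides highly reliable message forwarding.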
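To make the JSON point concrete, here is a minimal Python sketch of a log entry as structured data. Fluentd treats each entry as an event with a tag, a time, and a record; the helper names `make_event` and `to_json_line`, and the sample field names, are illustrative assumptions and not part of Fluentd itself.

```python
import json
import time


def make_event(tag, record, timestamp=None):
    """Bundle a tag, a timestamp, and a record into one dict (hypothetical helper)."""
    return {
        "tag": tag,
        "time": int(time.time()) if timestamp is None else timestamp,
        "record": record,
    }


def to_json_line(event):
    """Serialize an event as a single JSON line with a stable key order."""
    return json.dumps(event, sort_keys=True)


# A web access log entry expressed as structured data instead of a raw text line.
event = make_event(
    "apache.access",
    {"method": "GET", "path": "/", "code": 200},
    timestamp=1508204068,
)
print(to_json_line(event))
```

Because the record is a set of named fields rather than free text, downstream consumers can filter on a field such as `code` without parsing the line again.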

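The name appears to come from Fluent + d, where the d signals that it runs as a daemon. The official site describes it as a data collector. In practice, you first send information from various sources to Fluentd, and Fluentd then uses the plugins named in its configuration to forward that information to different destinations, such as files, SaaS platforms, databases, or even another Fluentd instance.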
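The routing idea can be sketched in a few lines of Python. This is a simplified model, not Fluentd's implementation: it assumes events carry dot-separated tags, that `*` matches exactly one tag part, that `**` matches zero or more parts, and that the first matching rule wins. The `Router` class and the sample tags are hypothetical.

```python
def tag_matches(pattern, tag):
    """Return True if a dot-separated tag matches a pattern with * and ** wildcards."""
    return _match(pattern.split("."), tag.split("."))


def _match(pat, parts):
    if not pat:
        return not parts
    head, rest = pat[0], pat[1:]
    if head == "**":
        # ** may absorb zero or more tag parts.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    if head == "*" or head == parts[0]:
        return _match(rest, parts[1:])
    return False


class Router:
    """Send each event to the first output whose pattern matches its tag."""

    def __init__(self):
        self.rules = []

    def add_match(self, pattern, output):
        self.rules.append((pattern, output))

    def emit(self, tag, record):
        for pattern, output in self.rules:
            if tag_matches(pattern, tag):
                output(tag, record)
                return pattern
        return None  # no rule matched; the event is dropped in this sketch


# Two stand-in destinations: plain lists instead of a real database or file.
to_db = []
to_file = []

router = Router()
router.add_match("app.db.*", lambda tag, record: to_db.append((tag, record)))
router.add_match("app.**", lambda tag, record: to_file.append((tag, record)))

router.emit("app.db.query", {"ms": 12})
router.emit("app.web.access", {"code": 200})
```

Rule order matters here: the more specific `app.db.*` rule is registered before the catch-all `app.**`, otherwise the catch-all would take every event.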

Official site: http://docs.fluentd.org/articles/quickstart

What Fluentd does

Two diagrams from the official site give an intuitive picture of what Fluentd does.

The logging system before Fluentd:

[Figure: logging system before Fluentd]

The logging system after Fluentd:

[Figure: logging system after Fluentd]

How the mechanism works:

[Figure: Fluentd mechanism]

Features of Fluentd

1) Easy to install

2) Small footprint

3) Semi-structured data logging

4) Flexible plugin mechanism

5) Reliable buffering

6) Log forwarding

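Deployment and architecture of Fluentd

Fluentd is deployed in much the same way as Flume:

[Figure: Fluentd deployment]

Fluentd's architecture is also nearly identical to Flume's:

[Figure: Fluentd architecture]

Fluentd's Input/Buffer/Output correspond closely to Flume's Source/Channel/Sink.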

Input

Input receives data or actively pulls it. Supported sources include syslog, HTTP, and file tail.
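As an illustration of the file tail style of input, the following Python sketch reads only the lines appended to a file since the last remembered byte offset. It is a rough approximation under stated assumptions: the function name `read_new_lines` is hypothetical, the file is UTF-8 with `\n` line endings, and log rotation is not handled.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return the lines appended to `path` since byte `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume complete lines only; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")

    # The application has written two full lines and the start of a third.
    with open(log_path, "w", newline="\n") as f:
        f.write("first\nsecond\npart")
    lines, pos = read_new_lines(log_path, 0)

    # Later the third line is completed; only the new data is returned.
    with open(log_path, "a", newline="\n") as f:
        f.write("ial\n")
    more, pos = read_new_lines(log_path, pos)

print(lines, more)
```

Persisting the offset between runs is what lets a collector restart without re-reading or skipping lines.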

Buffer

Buffer governs the performance and reliability of data collection. Different buffer types, such as file or memory, can be configured.
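The role of a buffer can be sketched as follows: events are grouped into chunks, and a chunk is discarded only after the output has accepted it, so a temporary failure at the destination does not lose data. This is a minimal in-memory model, not Fluentd's buffer code; the `MemoryBuffer` class, its `chunk_limit` parameter, and the flaky output are assumptions made for illustration.

```python
class MemoryBuffer:
    """Group events into fixed-size chunks and keep them until they are written."""

    def __init__(self, chunk_limit):
        self.chunk_limit = chunk_limit
        self.staged = []  # events still being collected into the current chunk
        self.queue = []   # full chunks waiting to be written

    def append(self, event):
        self.staged.append(event)
        if len(self.staged) >= self.chunk_limit:
            self.queue.append(self.staged)
            self.staged = []

    def flush(self, write_chunk):
        """Write queued chunks in order; a chunk that fails stays queued for retry."""
        while self.queue:
            try:
                write_chunk(self.queue[0])
            except OSError:
                return False
            self.queue.pop(0)
        return True


delivered = []
attempts = {"count": 0}


def flaky_output(chunk):
    """Stand-in destination that is unavailable on the first attempt."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise OSError("destination unavailable")
    delivered.extend(chunk)


buf = MemoryBuffer(chunk_limit=2)
for i in range(4):
    buf.append({"n": i})

first_try = buf.flush(flaky_output)   # fails, nothing is lost
second_try = buf.flush(flaky_output)  # succeeds, both chunks are delivered
```

A memory buffer like this is fast but loses its contents if the process dies; a file-backed buffer trades some speed for surviving a restart.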

Output

Output writes data to a destination, such as a file, AWS S3, or another Fluentd instance.
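A file-style output can be sketched as a function that formats each event and writes it to a sink. The tab-separated layout of time, tag, and JSON record used here is an assumption chosen for readability, and `format_line` and `write_events` are hypothetical names; an in-memory `io.StringIO` stands in for a real file.

```python
import io
import json
from datetime import datetime, timezone


def format_line(tag, timestamp, record):
    """Render one event as: ISO time, tag, JSON record, separated by tabs."""
    ts = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
    return f"{ts}\t{tag}\t{json.dumps(record, sort_keys=True)}\n"


def write_events(events, sink):
    """Write every (tag, timestamp, record) event to any object with write()."""
    for tag, timestamp, record in events:
        sink.write(format_line(tag, timestamp, record))


sink = io.StringIO()
write_events([("app.web", 0, {"code": 200})], sink)
print(sink.getvalue(), end="")
```

Since `write_events` only needs a `write()` method, the same function works with an open file object, which keeps the formatting separate from the choice of destination.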

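Fluentd's technology stack

[Figure: Fluentd technology stack]

Structure of Fluentd

Thanks to its simple structure, the core of Fluentd consists of only 3,000 lines of Ruby. Fluentd collects events from various input sources and writes them to output sinks. For example:

Input sources: HTTP, Syslog, Apache Log
Output destinations: Files, Mail, RDBMS databases, NoSQL storages

The figure below shows the basic idea of inputs and outputs:

[Figure: Fluentd inputs and outputs]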

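Fluentd is highly extensible: users can write their own Input/Buffer/Output plugins in Ruby. In most respects Fluentd resembles Flume. The difference is that it is written in Ruby, so its footprint is somewhat smaller, but this also brought cross-platform problems: earlier releases did not support Windows (Windows support was added in v0.14). Using JSON as a uniform data and log format is another distinguishing feature. Its configuration is also somewhat simpler than Flume's.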
