Flume入门

摘要:　flume简单入门

配置文件：
修改命令：vi conf/flume.conf

### define agent  
a1.sources = r1  
a1.sinks = k1  
a1.channels = c1  
  
### define sources   
a1.sources.r1.type = netcat  
a1.sources.r1.bind = hadoop-senior.ibeifeng.com  
a1.sources.r1.port = 44444  
  
### define channels  
a1.channels.c1.type = memory  
al.channels.c1.capacity = 1000  
al.channels.c1.transactionCapacity = 1000  
  
### define sink  
al.sinks.k1.type = logger  
  
#### bind the source and sink to the channel  
a1.sources.r1.channels = c1  
al.sinks.k1.channel = c1  

启动flume agent：　　

bin/flume-ng agent -c conf -f conf/flume.conf --name a1 -Dflume.root.logger=INFO,console 

执行：

bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console

常见的source　　

avro source　　　 avro可以监听和收集指定端口的日志，使用avro的source需要说明被监听的主机ip和端口号，下面给出一个具体的例子：

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = avro

a1.sources.r1.channels = c1

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 4141

exec source exec可以通过指定的操作对日志进行读取，使用exec时需要指定shell命令，对日志进行读取，下面给出一个具体的例子：

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /var/log/secure

a1.sources.r1.channels = c1

spooling-directory source spo_dir可以读取文件夹里的日志，使用时指定一个文件夹，可以读取该文件夹中的所有文件，需要注意的是该文件夹中的文件在读取过程中不能修改，同时文件名也不能修改。下面给出一个具体的例子：

agent-1.channels = ch-1

agent-1.sources = src-1

agent-1.sources.src-1.type = spooldir

agent-1.sources.src-1.channels = ch-1

agent-1.sources.src-1.spoolDir = /var/log/apache/flumeSpool

agent-1.sources.src-1.fileHeader = true

syslog source

syslog可以通过syslog协议读取系统日志，分为tcp和udp两种，使用时需指定ip和端口，下面给出一个udp的例子：

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = syslogudp

a1.sources.r1.port = 5140

a1.sources.r1.host = localhost

a1.sources.r1.channels = c1

常见的channel

Flume的channel种类并不多，最常用的是memory channel，下面给出例子：

a1.channels = c1

a1.channels.c1.type = memory

a1.channels.c1.capacity = 10000

a1.channels.c1.transactionCapacity = 10000

a1.channels.c1.byteCapacityBufferPercentage = 20

a1.channels.c1.byteCapacity = 800000

常见的sink

logger sink

logger顾名思义，就是将收集到的日志写到flume的log中，是个十分简单但非常实用的sink

avro sink
avro可以将接受到的日志发送到指定端口，供级联agent的下一跳收集和接受日志，使用时需要指定目的ip和端口：例子如下：

a1.channels = c1

a1.sinks = k1

a1.sinks.k1.type = avro

a1.sinks.k1.channel = c1

a1.sinks.k1.hostname = 10.10.10.10

a1.sinks.k1.port = 4545

file roll sink
file_roll可以将一定时间内收集到的日志写到一个指定的文件中，具体过程为用户指定一个文件夹和一个周期，然后启动agent，这时该文件夹会产生一个文件将该周期内收集到的日志全部写进该文件内，直到下一个周期再次产生一个新文件继续写入，以此类推，周而复始。下面给出一个具体的例子：

a1.channels = c1

a1.sinks = k1

a1.sinks.k1.type = file_roll

a1.sinks.k1.channel = c1

a1.sinks.k1.sink.directory = /var/log/flume

hdfs sink

hdfs与file roll有些类似，都是将收集到的日志写入到新创建的文件中保存起来，但区别是file roll的文件存储路径为系统的本地路径，而hdfs的存储路径为分布式的文件系统hdfs的路径，同时hdfs创建新文件的周期可以是时间，也可以是文件的大小，还可以是采集日志的条数。具体实例如下：

a1.channels = c1

a1.sinks = k1

a1.sinks.k1.type = hdfs

a1.sinks.k1.channel = c1

a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S

a1.sinks.k1.hdfs.filePrefix = events-

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = minute

hbase sink hbase是一种数据库，可以储存日志，使用时需要指定存储日志的表名和列族名，然后agent就可以将收集到的日志逐条插入到数据库中。例子如下：

a1.channels = c1

a1.sinks = k1

a1.sinks.k1.type = hbase

a1.sinks.k1.table = foo_table

a1.sinks.k1.columnFamily = bar_cf

a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

a1.sinks.k1.channel = c1

JSONHandler 配置agent:

a1.sources = r1  
a1.sinks = k1  
a1.channels = c1  
# Describe/configure the source  
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource  
a1.sources.r1.port = 8888  
a1.sources.r1.channels = c1  
# Describe the sink  
a1.sinks.k1.type = logger  
# Use a channel which buffers events in memory  
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100  
# Bind the source and sink to the channel  
a1.sources.r1.channels = c1  
a1.sinks.k1.channel = c1 

Derek

Flume入门

常见的source

常见的channel

常见的sink

Search

Table of Contents