原创

Kafka集群部署与配置

开始安装配置Kafaka

1. 编辑server.properties文件


############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.

broker.id=1

############################# Socket Server Settings #############################




# The number of threads that the server uses for receiving requests from the network and sending responses to the network

num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O

num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server

socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server

socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/home/hadoop/log/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.

num.partitions=4

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings  #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=node3:2181,node4:2181,node5:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
port=9092
message.max.byte=5242880
default.replication.factor=2
replica.fetch.max.bytes=5242880
delete.topic.enable=true
  • server.properties增加的部分、备注:
port=9092
message.max.byte=5242880
default.replication.factor=2
replica.fetch.max.bytes=5242880
delete.topic.enable=true
1. 默认端口号为9092,如果是在一台服务器上部署3个kafka的话,可以配置该属性可以用于端口号修改,分服务器部署则不用加该配置
2. 1024*1024*5 = 5M ( (默认:1000000) – broker能接收消息的最大字节数,这个值应该比消费端的fetch.message.max.bytes更小才对,否则broker就会因为消费端无法使用这个消息而挂起)
3. kafka的replica指的是消息的备份,默认值为1,表示不对topic进行备份。如果配置为2,表示除了leader节点,对于topic里的每一个partition,都会有一个额外的备份。
4. (消费端的最大字节)
5. topic是否可以被删除
  • server.properties配置备注
//当前机器在集群中的唯一标识,和zookeeper的myid性质一样
broker.id=0
//当前kafka对外提供服务的端口默认是9092
port=9092
//这个参数默认是关闭的,在0.8.1有个bug,DNS解析问题,失败率的问题。
host.name=hadoop1
//这个是borker进行网络处理的线程数
num.network.threads=3
//这个是borker进行I/O处理的线程数
num.io.threads=8
//发送缓冲区buffer大小,数据不是一下子就发送的,先回存储到缓冲区了到达一定的大小后在发送,能提高性能
socket.send.buffer.bytes=102400
//kafka接收缓冲区大小,当数据到达一定大小后在序列化到磁盘
socket.receive.buffer.bytes=102400
//这个参数是向kafka请求消息或者向kafka发送消息的请请求的最大数,这个值不能超过java的堆栈大小
socket.request.max.bytes=104857600
//消息存放的目录,这个目录可以配置为“,”逗号分割的表达式,上面的num.io.threads要大于这个目录的个数这个目录,
//如果配置多个目录,新创建的topic他把消息持久化的地方是,当前以逗号分割的目录中,那个分区数最少就放那一个
log.dirs=/home/hadoop/log/kafka-logs
//默认的分区数,一个topic默认1个分区数
num.partitions=1
//每个数据目录用来日志恢复的线程数目
num.recovery.threads.per.data.dir=1
//默认消息的最大持久化时间,168小时,7天
log.retention.hours=168
//这个参数是:因为kafka的消息是以追加的形式落地到文件,当超过这个值的时候,kafka会新起一个文件
log.segment.bytes=1073741824
//每隔300000毫秒去检查上面配置的log失效时间
log.retention.check.interval.ms=300000
//是否启用log压缩,一般不用启用,启用的话可以提高性能
log.cleaner.enable=false
//设置zookeeper的连接端口
zookeeper.connect=192.168.123.102:2181,192.168.123.103:2181,192.168.123.104:2181
//设置zookeeper的连接超时时间
zookeeper.connection.timeout.ms=6000

2. 创建Kafaka日志目录

mkdir -p /home/hadoop/log/kafka-logs

3. 分发Kafaka到node3、node4、node5节点

scp -r kafka_2.11-2.1.0/ hadoop@node5:/home/hadoop/

4. 修改其余节点上的broker.id

node3->1 node4->2 node5->3

5. 配置kafaka的环境变量

export KAFKA_HOME=/home/hadoop/kafka_2.11-2.1.0
export PATH=$PATH:$KAFKA_HOME/bin

6. 启动kafka

cd /home/hadoop/kafka_2.11-2.1.0/

kafaka启动前需要保证各个节点的zookeeper是启动的

  • 启动成功的截图

kafka

7. 后台启动Kafaka

cd /home/hadoop/kafka_2.11-2.1.0/

bin/kafka-server-start.sh -daemon config/server.properties
  • jps查看kafka启动状态

kafka-daemon

8. 停止kafka服务

bin/kafka-server-stop.sh config/server.properties

9. 测试

  • 创建kafka主题和生产者写入消息
[hadoop@node3 ~]$ kafka-topics.sh --create --zookeeper node3:2181,node4:2181,node5:2181 --replication-factor 3 --partitions 4 --topic topic2         
Created topic "topic2".
[hadoop@node3 ~]$ kafka-topics.sh --describe --zookeeper node3:2181 --topic topic2   
Topic:topic2    PartitionCount:4        ReplicationFactor:3     Configs:
        Topic: topic2   Partition: 0    Leader: 1       Replicas: 1,2,3 Isr: 1,2,3
        Topic: topic2   Partition: 1    Leader: 2       Replicas: 2,3,1 Isr: 2,3,1
        Topic: topic2   Partition: 2    Leader: 3       Replicas: 3,1,2 Isr: 3,1,2
        Topic: topic2   Partition: 3    Leader: 1       Replicas: 1,3,2 Isr: 1,3,2
[hadoop@node3 ~]$ kafka-console-producer.sh --broker-list node3:9092 -topic topic2   
>hello java
>hello kafka
>
  • kafka消费者查看消息
[hadoop@node4 kafka_2.11-2.1.0]$ kafka-console-consumer.sh --bootstrap-server  node3:9092  --from-beginning --topic topic2   
hello java
hello kafka

小节

以上便是kafaka安装部署的一个过程,如果文章中有内容错误的,欢迎大家及时指正。

正文到此结束