[colug-432] Any kafka users on the list ?

Chris Embree cembree at ez-as.net
Fri Nov 21 21:42:32 EST 2014


8-o   So many questions.

List is fine, nothing secret here, just my opinions which are now
worth slightly less than you paid for them.... sorry.

We run a limited-size cluster due to physical limits.  That said, it's
anywhere from 10 to 14 Kafka nodes, each w/ 2 dedicated 10k disks.  GC
hasn't shown up as an issue so far, but it might be the culprit behind
a couple of anomalies we haven't explained yet.

Generally, as a cluster it seems somewhat immature.  It works well
when it works well, otherwise things get ugly.

It can use JMX for monitoring, but management tools are somewhat
limited.  One of the smart guys on my team built a tool, dubbed Kurator
(a play on ES tools), that uses some Python Kafka APIs to provide some
insight.  However, it relies heavily on Zookeeper status and on Kafka
telling the truth.  We've seen a few issues that raise doubt about
Kafka's agreement w/ ZK on what's real.
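
To illustrate the kind of cross-check I mean, here is a minimal Python
sketch.  It is not how Kurator works.  The function name, the data
layout, and the sample values are all hypothetical, and fetching the
two views from ZK and from the brokers is left out entirely.

```python
from typing import Dict, List, Set, Tuple

PartitionKey = Tuple[str, int]          # (topic, partition)
IsrView = Dict[PartitionKey, Set[int]]  # partition -> broker IDs believed in sync


def isr_disagreements(
    zk_view: IsrView, broker_view: IsrView
) -> List[Tuple[PartitionKey, Set[int], Set[int]]]:
    """Return every partition where the two views list different ISR sets.

    A partition missing from one view is treated as having an empty ISR there.
    """
    disagreements = []
    for key in sorted(set(zk_view) | set(broker_view)):
        zk_isr = zk_view.get(key, set())
        broker_isr = broker_view.get(key, set())
        if zk_isr != broker_isr:
            disagreements.append((key, zk_isr, broker_isr))
    return disagreements


# Hypothetical sample data: the topic name and broker IDs are made up.
zk = {("events", 0): {1, 2}, ("events", 1): {2, 3}}
brokers = {("events", 0): {1, 2}, ("events", 1): {2}}

for (topic, partition), zk_isr, broker_isr in isr_disagreements(zk, brokers):
    print(f"{topic}/{partition}: ZK says {sorted(zk_isr)}, brokers say {sorted(broker_isr)}")
```

Anything this prints is a partition worth looking at by hand before
trusting either source.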

HOWEVER:  Our use case is extremely abusive.  We're looking for 1.2M
transactions per second at 1K each.  If you are anywhere south of
100K/s, chances are extremely good you can construct a highly reliable
Kafka cluster.
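
To put rough numbers on that load, here is a back-of-the-envelope
sketch in Python.  It assumes "1K" means 1,000 bytes and that traffic
spreads perfectly evenly across nodes, which it never quite does.  The
replication_factor parameter is hypothetical and defaults to 1, so the
result is raw ingest only.

```python
# Figures from the message above; the byte size of "1K" is an assumption.
MESSAGES_PER_SEC = 1_200_000
MESSAGE_BYTES = 1_000  # assumption: "1K" = 1,000 bytes


def per_node_mb_per_sec(nodes: int, replication_factor: int = 1) -> float:
    """Write load per broker in MB/s, assuming a perfectly even spread."""
    total_bytes = MESSAGES_PER_SEC * MESSAGE_BYTES * replication_factor
    return total_bytes / nodes / 1_000_000


for nodes in (10, 14):
    print(f"{nodes} nodes: {per_node_mb_per_sec(nodes):.1f} MB/s per node")
# 10 nodes: 120.0 MB/s per node
# 14 nodes: 85.7 MB/s per node
```

That is about 1.2 GB/s across the cluster before any replication
traffic is counted.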

On the fence:  We've had little luck re-allocating partitions to
recover from a lost node.  Listing Kafka topics will show you the
number of In-Sync Replicas (ISR) and the nodes hosting them.  The
re-balance feature is somewhat new and lightly documented, at least as
of my last Google search.  I've had little luck re-syncing after a
node loss.
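
As a sketch of how to spot trouble in that topic listing, the Python
below picks out partitions whose ISR is smaller than the assigned
replica set.  The sample text is hypothetical (made-up topic name and
broker IDs), written in the style of the per-partition lines printed
by the topic describe tooling.  The exact layout may differ between
Kafka versions, so treat the regex as an assumption to adjust.

```python
import re

# Hypothetical sample; not captured from a real cluster.
SAMPLE = """\
Topic: events  Partition: 0  Leader: 1  Replicas: 1,2  Isr: 1,2
Topic: events  Partition: 1  Leader: 2  Replicas: 2,3  Isr: 2
"""

# Tolerates tabs or spaces between fields; summary lines do not match.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_output: str):
    """Return (topic, partition, missing_broker_ids) for each partition
    whose ISR is missing one or more of its assigned replicas."""
    results = []
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        replicas = {int(b) for b in match["replicas"].split(",") if b}
        isr = {int(b) for b in match["isr"].split(",") if b}
        missing = sorted(replicas - isr)
        if missing:
            results.append((match["topic"], int(match["partition"]), missing))
    return results


print(under_replicated(SAMPLE))
# [('events', 1, [3])]
```

Here broker 3 is assigned to partition 1 but has dropped out of the
ISR, which is the state we keep getting stuck in after losing a node.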

Kafka is a minor part of our solution in the grand scheme of things.
I feel ill-equipped to give a talk on the subject in any reasonable
depth.

That said, I will be speaking at Cisco Live Milan (EMEA).  I'd be
happy to re-present at a COLUG meeting if there is any value.  The talk
is a 4 hour tech session (I'm only 1 of 3 speakers) on the entire
OpenSOC project.  My focus will be on the platform side and it may not
be the best fit for a LUG.  It's more of a HUG topic.

FWIW, I absolutely hate the name Hadoop. ;)

I hope that helps.

Chris

On 11/21/14, Tom Hanlon <tom at functionalmedia.com> wrote:
> Chris,
>
> Can you talk about it on the list ? If not maybe we can send some
> private emails.
>
> How big is the kafka cluster ? How many events are handled?
>
> What are the details of the hiccups ? Java Garbage collection?
> Configuration changes ? General strangeness ?
>
> Does it provide any hooks for monitoring or managing? Nagios for
> monitoring ? Some api hooks for management ?
>
> Thanks,
> Tom
>
>
> On Fri, Nov 21, 2014 at 1:17 PM, Chris Embree <cembree at ez-as.net> wrote:
>> Sadly, yes.
>>
>> We're using Kafka as the buffering queue for OpenSOC (getopensoc.com)
>> and while it works well when things are fine, it has significant
>> difficulty recovering from hiccups.
>>
>> Also, there are few tools for managing it from an Admin point of view.
>> Deleting a topic is a non-trivial task, for example.
>>
>> Chris
>>
>> On 11/21/14, Tom Hanlon <tom at functionalmedia.com> wrote:
>>> Colug,
>>>
>>> Are there any Kafka users on this list?
>>>
>>> http://kafka.apache.org/
>>>
>>> I am looking to dive into kafka and some use-case, war-story,
>>> discussion with a user would be helpful.
>>>
>>> If there is broader interest perhaps we can make a meeting
>>> presentation out of it.
>>>
>>> Thanks,
>>>
>>> Tom
>>> _______________________________________________
>>> colug-432 mailing list
>>> colug-432 at colug.net
>>> http://lists.colug.net/mailman/listinfo/colug-432
>>>

