CommunityOverCode - The ASF Conference Asia 2023 阿帕奇软件基金会亚洲大会

免费报名中

27667 人关注

时间 2023-08-18 09:30 ~ 08-20 18:00

地址北京海淀区丽亭华苑酒店

活动由 The ASF 主办

免费报名中

27667 人关注

微信分享

使用微信扫一扫分享到朋友圈

活动分享

使用微信扫一扫进入小程序分享活动

活动详情

CommunityOverCode (原 ApacheCon) 是 Apache 软件基金会（ASF）的官方全球系列大会。自 1998 年以来--在 ASF 成立之前 -- ApacheCon 已经吸引了各个层次的参与者，在 300 多个 Apache 项目及其不同的社区中探索 "明天的技术"。CommunityOverCode 通过动手实作、主题演讲、实际案例研究、培训、黑客松活动等方式，展示 Apache 项目的最新发展和新兴创新。

CommunityOverCode 展示了无处不在的 Apache 项目的最新突破和 Apache 孵化器中即将到来的创新，以及开源开发和以 Apache 之道领导社区驱动的项目。与会者可以了解到独立于商业利益、企业偏见或推销话术之外的核心开源技术。

CommunityOverCode 项目是动态的，每次活动的内容都是由精选的 Apache 项目开发者和用户社区直接推动的。CommunityOverCode 提供了最先进的内容，在一个协作、厂商中立的环境中，展示了大数据、云计算、社区发展、金融科技、物联网、机器学习、消息中间件、编程、搜索、安全、服务器、流媒体、网络框架等方面的最新开源进展。

本次会议将以线下的方式于 2023 年 8 月 18 日至 8 月 20 日在北京丽亭华苑酒店举行，并进行线上直播，欢迎各位嘉宾的光临。

活动门票

活动筹备中

售票推广中

活动结束

选择票

门票名称

单价(￥)

截止时间

数量

单日普通票

￥299

2023-08-20 18:00

已售罄

门票包含：
8 月 18 日至 20 日任意一天的 Keynote 演讲
8 月 18 日至 20 日任意一天的技术分论坛
参会日当日的工作午餐
大会纪念礼包

学生/Committer 单日普通票

仅邀请

￥199

2023-10-08 18:00

已结束

此门票需要主办方审核

需审核，Committer 需在购票时提供 Apache Committer ID，学生需在报名时提供 edu 邮箱，并凭有效期内的学生证签到入场

门票只能通过使用邀请码获得

三日普通票

￥599

2023-08-20 18:00

已售罄

门票包含：
8 月 18 日至 20 日三日的 Keynote 演讲
8 月 18 日至 20 日三日的技术分论坛
8 月 18 日至 20 日三日的工作午餐
大会纪念礼包

学生/Committer 三日普通票

￥399

2023-08-20 18:00

已售罄

此门票需要主办方审核

需审核，Committer 需在购票时提供 Apache Committer ID，学生需在报名时提供 edu 邮箱，并凭有效期内的学生证签到入场

个人赞助门票（单日）

￥999

2023-08-20 18:00

已售罄

门票包含：
8 月 18 日至 20 日任意一天的 Keynote 演讲
8 月 18 日至 20 日任意一天的技术分论坛
参会日当日的工作午餐
大会纪念礼包
8 月 18 日讲师欢迎晚宴门票

个人赞助门票（三日）

￥1,299

2023-08-20 18:00

已售罄

门票包含：
8 月 18 日至 20 日三日的 Keynote 演讲
8 月 18 日至 20 日三日的技术分论坛
8 月 18 日至 20 日三日的工作午餐
大会纪念礼包
8 月 18 日讲师欢迎晚宴门票

Community Leadership Workshop 培训套票

￥6,999

2023-08-20 18:00

已售罄

此门票需要主办方审核

门票包含：
8 月 17 日会前的 Community Leadership Workshop 闭门培训
8 月 18 日至 20 日三日的 Keynote 演讲
8 月 18 日至 20 日三日的技术分论坛
8 月 18 日至 20 日三日的工作午餐
大会纪念礼包
讲师欢迎晚宴门票

SPEAKER

仅邀请

免费

2023-08-20 18:00

已售罄

此门票需要主办方审核

门票只能通过使用邀请码获得

SPONSOR

仅邀请

免费

2023-08-20 18:00

已售罄

此门票需要主办方审核

门票只能通过使用邀请码获得

赞助商工作人员

仅邀请

免费

2023-08-20 18:00

已售罄

此门票需要主办方审核

门票只能通过使用邀请码获得

优惠或邀请码

使用优惠或邀请码

票价

￥ 0

活动已结束

论坛主席

姜宁

Keynote / 性能工程

谭中意

Keynote / 人工智能 / 机器学习

活动日程

按时间

全部

2023-08-18

2023-08-19

2023-08-20

按会场

全部

鸿运厅1

鸿运厅2

鸿运厅3

金辉厅1

金辉厅2

金辉厅3

金辉厅 5

天鸿厅3

主会场

按专题

全部

主题演讲

人工智能 / 机器学习

API / 微服务

数据存储与计算

云原生

开源社区

性能优化

孵化器

物联网 / 工业物联网

消息系统

远程过程调用 / 服务网格

流处理

Web 服务器/ Tomcat

数据湖与数据仓库

DataOps

OLAP & Data Analysis

通用

重复演讲

2023-08-18

09:30 - 12:00

Keynote 主题演讲

主会场

2023-08-18

09:30 - 09:35

大会组委会欢迎致辞

2023-08-18

09:35 - 09:40

陆首群和David多位大咖为大会致辞

2023-08-18

09:40 - 10:10

The ASF: Past and Future

Since 1999 when The Foundation was established, the open source landscape has changed in many ways but the founding principles remain. The ASF continues to operate as a charity to serve the public interest. The projects, under the direction of the Project Management Committees, are the primary governing bodies, subject to oversight by the Board of Directors.

Over the past few years, both internal and external events have required changes to the way the ASF operates:
Governments have recognized that open source software presents new security challenges to the way the internet works;
Privacy concerns require changes to the ASF approach to transparency;
The ASF needs to recognize that new communications products and protocols change the way communities interact, both within and external to them.

嘉宾

Craig Russell

Apache Software Foundation Apache 软件基金会董事

2023-08-18

10:10 - 10:30

Introduction to Apache Doris 2.0

如果说在过去 Apache Doris 更多是服务于高性能在线实时分析场景的话， 2022年底发布的 1.2 版本无疑标志着 Apache Doris 能力边界得到进一步拓展，越来越多用户开始基于 Apache Doris 构建高效的实时数据分析服务，而最近发布的 2.0 版本更是全面强化了 Apache Doris 在半结构化数据分析、混合工作负载以及数据湖联邦分析等场景下等场景下的能力。在本次的分享中，我将会为大家揭秘 Apache Doris 2.0 版本的最新重磅特性。同时结合过去几年里社区研发方向的思考，将会分享后续社区的重要发展方向以及版本迭代的详细计划。

嘉宾

马如悦

飞轮科技 Apache Doris 项目创始人 & 飞轮科技 CEO

2023-08-18

10:30 - 11:00

人间清醒：开源的最深层次动机

通常而言，人们认为开源是一个利他行为，但我们知道，所有的利他都有着利己的基础，那么开源开发者的心理动机到底是怎样的？他们是如何考虑自身的利益的？是什么让他们在没有任何实际利益回报的情况下，仍然不断地输出和贡献？对于企业而言，又是怎么回事？企业难道不应该更注重实际利益吗。本演讲试图拨开这些迷雾，展示一个掩藏在利他表象之下的最真实也是最能让人理解的开源动机。

嘉宾

卫剑钒

N/A 《大教堂与集市》中文译者，国际信息系统安全认证专家（CISSP），中国金融学会金融科技专委会委员

2023-08-18

11:00 - 11:20

一路前行，阿里云大数据从拥抱开源走向引领开源

阿里巴巴自 2009 年开始采用 Apache Hadoop 技术进行大数据分析，2010 年第一次将 Apache HBase 技术在商品搜索中大规模投产，2016 年将处于萌芽状态的 Apache Flink 在双 11 实时推荐场景落地，并在同年阿里云上发布支持 Apache Hadoop/Hive/Spark/Kafka 等主流开源大数据技术的 E-MapReduce 云产品。在最近几年，阿里云开源大数据 Flink 团队作为 Apache Flink 最主要的贡献者推动 Flink 成为全球流计算事实标准，并向 ASF 捐赠了 Apache Celeborn 和 Apache Paimon 开源大数据项目，本议题将介绍阿里云大数据如何一步步从拥抱、贡献开源走向开源社区的引领者。

嘉宾

王峰

阿里巴巴阿里巴巴花名“莫问”，在阿里云任研究员职位，开源大数据平台负责人

2023-08-18

11:20 - 12:00

开源的未来：挑战与机遇

在我们的数字世界中，开源软件已经成为了像路桥一样的基础设施的一部分，发挥着越来越大的作用。然而，随着开源生态系统的发展，我们也面临着诸多挑战。开源软件供应链安全，开源的可持续发展，以及如何处理好开源与商业之间的关系，已成为开源世界急需需要解决的问题。在这次圆桌讨论中，我们将与 Apache软件基金会的资深人士一起，探讨开源世界面临的挑战以及可能的解决方案。

嘉宾

Craig Russell

Apache Software Foundation Apache 软件基金会董事

Rich Bowen

AWS Apache 软件基金会董事，AWS 开源战略师

Justin Mclean

Datastrato Apache 软件基金会董事

姜宁

大会议题评审组成员 | Apache 软件基金会董事

大花

Answer Answer 社区经理

2023-08-18

13:30 - 17:15

OLAP & Data Analysis

金辉厅 5

2023-08-18

13:30 - 14:00

Apache ECharts 的图表服务端渲染方案

With over 20 chart types, Apache ECharts' package size can be as much as 1MB, causing prolonged loading times on mobile devices without 4G connectivity. Discover how Server-Side Rendering (SSR) resolves this issue, while also exploring the implementation of SVG animation techniques that enable users to interact with rendered images. Learn innovative methods to optimize performance and create engaging, interactive visualizations that captivate your audience while keeping package sizes minimal. Don't miss this opportunity to unlock the full potential of Apache ECharts on mobile devices and revolutionize your data visualization practices.

嘉宾

Ovilia

Apache ECharts Apache ECharts PMC Chair

2023-08-18

14:00 - 14:30

日志存储分析的数仓化

从ElasticSearch到Apache Doris，构建十倍性价比的新一代日志系统 1. 日志系统的典型应用场景和特点 2. 基于ES的日志系统典型架构和面临的挑战，包括对schema free的支持不够、分析能力较弱、写入和存储成本很高 3. 基于Doris的日志系统典型架构和优势，包括对schema free的原生支持、基于SQL引擎的强大分析能力、支持倒排索引的基础上性价比提升十倍

嘉宾

肖康

SelectDB SelectDB 技术副总裁

2023-08-18

14:30 - 15:00

Apache Arrow DataFusion: 向量化查询引擎揭秘

Apache Arrow DataFusion是一个快速的、可扩展的、向量化执行框架，使用Arrow作为其内存数据格式，使用Rust语言进行实现。 DataFusion提供了多种层次的扩展接口：用户在实现数据库或者查询系统的时候，可以轻松地将DataFusion集成进去，从而利用其极致的性能，避免重复实现查询引擎的问题。本次介绍主要包含： 1. DataFusion是什么以及其发展历史 2. DataFusion的架构 3. DataFusion提供了哪些扩展能力（udf，logical plan，execution plan/node等) 4. DataFusion使用的场景是什么 5. DataFusion当前有哪些使用案例

嘉宾

刘昆

eBay eBay大数据工程师，Apache Arrow Committer & PMC Member

2023-08-18

15:00 - 15:30

Apache Impala 4.2 & 4.3 版本新特性一览

Apache Impala是一个基于MPP架构实现的分布式查询引擎。本次演讲将分享Impala社区的最新进展，包括已经发布的4.2的核心功能，以及将要在4.3中发布的新功能

嘉宾

黄权隆

Cloudera Impala PMC Member & Committer，Cloudera 研发工程师

2023-08-18

15:30 - 15:45

茶歇

2023-08-18

15:45 - 16:15

Apache Doris 在衔远科技的应用实践

随着实时数据分析需求的不断增长，实时数据仓库在衔远科技内部承担着越来越重要的分析诉求。经过多个版本的迭代，正式确定了以Apache Doris为核心的实时数据仓库架构。在本次的分享中，我将会为大家分享 Apache Doris在衔远科技的实践经验，以及在实际业务场景中如何为我们降本增效。

嘉宾

王永臣

北京衔远科技北京衔远科技大数据开发工程师

2023-08-18

16:15 - 16:45

字节跳动大数据 SQL 权限精细化管理实践

背景：近年来，数据安全问题逐步受到各国政府和企业的重视，随着国家数据安全法、个人信息保护法的相继颁布和实施，对于数据最小够用原则也提出了明确的要求。因此，如何更细粒度管控权限也变成了每个企业都必须解决的问题。当前问题：业界通常基于规则对SQL中的权限点进行提取，将这些权限点横向按照行维度，或者纵向按照列维度进行管控。这种单一维度的权限管控粒度过粗，无法支持多条权限间的组合关系。在字节跳动这种多业务线数据统一存储的中台大宽表场景下，现有方案难以满足业务侧对数据权限细粒度管控的需求。解决方案：基于上述问题，火山引擎LAS研发团队基于ByteQuery查询引擎及自研权限服务Gemini设计了一套行列混合权限的精细化管理方案。 - 基于SQL血缘的精准权限提取 - 基于血缘能力，精准定位SQL中真正使用到的权限点信息(表，行，列等)，进行精细化权限提取。 - 行列混合权限多维度精细管控 - 在传统的库权限，表权限，列权限之上，新增加了一种行限制权限，行权限可以作为一种特殊的资源附属在表权限/列权限上面。 - 每一个表权限/列权限可以同时捆绑多个行权限资源，不同表权限/列权限的行限制相互独立。 - 通过横向/纵向权限点的捆绑组合，将查询资源定位到行列重叠的'资源单元格'上，达到更细粒度的资源级别权限方案优势：在新的方案下，通过精准细粒度的权限点提取，以及多维度的行列混合权限支持，将资源管控由横向的某一行，或者纵向的某一列，细化到行列重叠的'资源单元格'上。进一步细化了权限管控范围，在保证用户正常使用的前提下，最小粒度的授予所需权限。具体典型案例和实现原理将会在演讲PPT中进行介绍。

嘉宾

朱江

字节跳动火山引擎 LAS 高级研发工程师

2023-08-18

16:45 - 17:15

基于 Apache Calcite 的多引擎指标管理最佳实践

数据分析中有着各种各样的指标，在维护海量指标的时候，常常有如下的痛点： - 重复片段无法得到复用。 - 不同引擎需要编写不同的SQL。 - 口径变更难以同步到所有下游。为了解决这些问题，字节跳动尝试过用已有的技术能力设计方案： - 将指标尽可能地存储到 Hive 表中：会极大增大存储成本和回溯成本，不太可行。 - 将指标封装到View中：不仅会在Hive产生额外的表信息导致表数量翻倍，而且对分区的支持不友好。查询使用体验较差，因此难以推广。因为目前已有的技术不足以解决上述问题，所以字节跳动基于Apache Calcite设计并实现了两套新的语法能力： - 虚拟列：列级别的视图，复用表列权限，推广简单。 - SQL Define Function：使用SQL直接定义函数，方便SQL片段的复用。这两项能力结合，可以有效降低指标管理的成本例如： - 指标仅需修改一次，无须下游再同步修改。 - MAP、JSON等集合类型中的字段可以定义成虚拟列，逻辑更加清晰、使用更加方便。具体典型案例和实现原理将会在演讲PPT中进行介绍。

嘉宾

谢佳君

字节跳动火山引擎 LAS 高级研发工程师，Calcite Committer

2023-08-18

13:30 - 17:15

数据存储与计算

金辉厅3

2023-08-18

13:30 - 14:00

What's new in the recent and upcoming HBase releases

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. The HBase community is preparing new major release 3.0.0 and new minor release 2.6.0, with some brand new features. In this presentation, we will introduce these new features, about how they benefit our users and how we implement them in HBase： 1. Tracing Improvements: OpenTelemetry integration 2. TLS Support: secure and encrypted rpc communication 3. Cloud Native Support: Better OSS support, k8s deployment, etc. 4. Other Notable Improvements: HBase on ozone, new region replication framework, etc. Additionally, we will delve into our plans for the future and discuss the exciting directions in which HBase is heading.

嘉宾

张铎

神策数据神策数据首席架构师，Apache HBase PMC Chair

2023-08-18

14:00 - 14:30

Deep dive into resource manageability in ozone storage

Organizations need to manage resources allocated and used by different entities within it. In the context of Apache Ozone, resources are storage space and namespace (count of files, keys and directories). Apache Ozone provides capability to define, and control resource usages by specifying quota. Ozone provides ways to manage resources different from the hadoop system. The paper will present the resource management capabilities, behavior with respect to multiple ozone features such as trash, snapshot, and comparison differences with the hadoop system.

嘉宾

Sumit Agrawal

Cloudera Senior Staff Engineer, Cloudera

2023-08-18

14:30 - 15:00

Spark SQL Shuffle Join Improvement at eBay

Join operation is one of the most important and widely used operations in data warehouse. The Join operator in Apache Spark is one of the most expensive operators, especially Shuffle Join. In this presentation, we will introduce a series of Shuffle Join optimizations recently added at eBay. Specifically, 1. Unwrap cast in join condition to use bucket join. 2. Enhance shuffle exchange reuse to reduce table scans. 3. Push down partial aggregation through Join.

嘉宾

王玉明

eBay eBay 软件工程师，Apache Spark PMC

2023-08-18

15:00 - 15:30

字节跳动千亿文件 HDFS 集群实践

随着大数据技术的深入发展，数据规模和使用复杂度越来越高，Apache HDFS 面临着新的挑战。在字节跳动，HDFS 既是传统 Hadoop 数仓业务的存储，也是存算分离架构计算引擎的底座，还是机器学习模型训练的存储底座。字节跳动大数据存储团队基于 HDFS 本身，搭建了服务于大规模计算资源调度跨多地区的存储调度能力提升计算任务稳定性；也提供了统合用户侧缓存、常规三副本、冷存的数据识别和冷热调度能力。本次分享介绍字节跳动如何认识新兴场景对传统大数据存储的新要求，并分享技术和运维体系演进来支持不同应用场景。

嘉宾

熊睦

字节跳动基础架构工程师

2023-08-18

15:30 - 15:45

茶歇

2023-08-18

15:45 - 16:15

Apache Kyuubi & Celeborn(Incubating): 助力 Spark 拥抱云原生

在过去数年中，网易在大数据云原生领域进行了长足的探索。本次演讲围绕如何基于 Apache Kyuubi & Celeborn 等开源技术，构建企业级 Spark on Kubernetes 云原生离线计算平台展开，包含技术选型、架构设计、经验教训、缺陷改进、降本增效等内容，深入剖析网易在该领域的探索成果。

嘉宾

潘成

网易，Apache Kyuubi & Celeborn 社区网易数帆软件工程师，Apache Kyuubi PMC，Apache Celeborn PPMC

2023-08-18

16:15 - 16:45

Resilient Data: Exploring Replication and Recovery in Apache Ozone

Data resilience is crucial in modern distributed systems to ensure data availability and durability. Apache Ozone, a scalable and distributed object store that has the capability to handle billions of objects, addresses the need for resilient data storage through its replication and recovery mechanisms. This talk delves into the concepts and techniques employed by Apache Ozone to achieve high data resilience. The first part of the talk explores data replication in Apache Ozone. It discusses how Ozone maintains strong consistency by keeping consistent copies of blocks across all nodes . The second part, which is the crux of the talk, deals with data backup and recovery. It will discuss how one can use effective backup strategies like cross-cluster replication, Ozone snapshots, etc. This talk serves as a comprehensive guide for exploring the resilience aspects of Apache Ozone, enabling practitioners to leverage its capabilities effectively and make informed decisions when designing data-intensive applications.

嘉宾

Sadanand Shenoy

Cloudera Software Engineer II , Cloudera

2023-08-18

16:45 - 17:15

Linkis 在理想汽车的应用实践

Apache Linkis 是在上层应用程序和底层引擎之间构建的一层计算中间件。本次分享的内容主要包括：为何我们选择 Linkis 作为理想汽车内部的中间件；在 Linkis 的落地实践过程中，我们添加和修复了哪些功能。以及这些功能如何让我们能够更好地满足开发需求，提高工作效率；我们在实践中遇到的一些挑战和问题以及我们所采取的解决方案和建议；计划添加的新功能和改进。希望通过本次分享为正在使用和计划使用Linkis作为中间件的团队提供一些经验。

嘉宾

郗世豪

理想汽车理想汽车高级大数据工程师

2023-08-18

13:30 - 16:45

开源社区

鸿运厅3

2023-08-18

13:30 - 14:00

OpenDAL 的开发者体验分享

本次分享主要围绕开源项目 OpenDAL 的开发者体验实践展开，分享 OpenDAL 项目的实践，并根据实际经验做出自己的总结。

嘉宾

丁皓（Xuanwo）

Databend Databend 研发工程师

2023-08-18

14:00 - 14:30

Apache Kvrocks 社区演进

- Apache 孵化器对社区的帮助和影响 - 社区发展现状 - 社区如何获得第一位社区 PPMC - 社区如何获得第一位海外 Committer - 孵化一年的总结

嘉宾

王源

百度百度资深工程师

2023-08-18

14:30 - 15:00

社区和贡献者如何找到彼此？

《社区和贡献者如何找到彼此？》 Where is the community? Who is contributor? 看起来是两个显而易见的问题，但却是在开源社区最常见的两个问题。在这个演讲中，我们将深入探讨开源社区的本质以及贡献者的角色。从三人成众，到志同道合。从来了就是朋友，到有效激励贡献者的长期贡献。总之，我们将展示社区和贡献者如何相互发现对方，以及他们如何建立并维持长久的关系。我们会探讨一些成功的策略，包括如何吸引新的贡献者，如何为他们提供支持，以及如何帮助他们发展成为项目的领导者。无论你是新手，还是经验丰富的开源贡献者，我们都希望你能从这个演讲中获得有价值的洞察和灵感。

嘉宾

庄表伟

开源社开源社理事

2023-08-18

15:00 - 15:30

自研分布式数据库的开源之路

OceanBase 历经10多年的发展, 从最早淘宝收藏夹使用的电商数据库, 一路成长, 开始慢慢成为蚂蚁集团所有核心系统的数据库, 再成长成为众多企业尤其是金融行业的核心系统的数据库数据库承担着数据的存储和管理职责, 是应用系统的核心保障, 牵一发以动全身, 而阿里和蚂蚁每年双11, 又会带来海量的流量, 这些流量, 让所有的系统变得错综复杂, 而数据库又承担着稳定性和数据准确性要求, 就让一个分布式数据库的稳定性和数据准确性成为一个世界级难题。面对这样一个世界级难题, OceanBase 沉淀了12年的架构, 研发和运维经验在2021年6月正式开源。本次分享将以开源原生分布式数据库 OceanBase 实践为例，分享从0到1搭建健康的开源社区，包括企业级开源项目从商业化到开源的决策思考，开源项目由内向外的生态探索和健康活跃开源社区的建设之路。

嘉宾

段少婷

OceanBase OceanBase社区经理

2023-08-18

15:30 - 15:45

茶歇

2023-08-18

15:45 - 16:15

AI时代与智能组织：从Apache得到的启示

1.Apache的故事与组织模式 2.人工智能时代与超级个体 3.新型组织模式：以Midjourney和LAION

嘉宾

张雅琪（Alphatu）

北京奇点汇科技有限公司 ChaosAI 创始人

2023-08-18

16:15 - 16:45

从0到“20000+”用户，Apache DolphinScheduler社区如何实现双向奔赴？

在过去的三年中，Apache DolphinScheduler 取得了惊人的30倍增长，这离不开其强大的技术优势和卓越的社区运营。本次演讲将探讨 Apache DolphinScheduler 是如何通过创新的社区运营、高效的技术支持和密切的用户互动来实现与用户的双向奔赴，以及这一成功经验对其他开源项目的启示。让我们共同揭开 Apache DolphinScheduler 快速发展背后的奥秘，探索其持续增长的动力来源。

嘉宾

曾辉

白鲸开源白鲸开源高级社区经理

2023-08-18

13:30 - 16:45

数据湖与数据仓库

金辉厅1

主持人

代立冬

2023-08-18

13:30 - 14:00

Challenges and Solutions on building Realtime Data warehousing with Apache Flink , Apache Hive and Apache Iceberg

There are many technologies that can be used to build an Enterprise level real-time data warehouse. In order to fully migrate the Batch ETL processing of your EDW towards Real Time ETL, there are challenges such as late events, dirty data routing, etc require extra attention to handle. The purpose of this speech is to provide the recent community works on Apache Flink, Apache Hive, and Apache Iceberg and architecture design related to migrating Batch Processing EDW to Real-time PRocessing EDW. 在当下，有很多的技术组合可以用于迁移基于批处理的数据仓库至实时处理的数据仓库。为了能够完整的迁移批处理的数据仓库，我们需要额外的处理在实时架构下遇到的迟到事件，脏数据路由以及由这些问题引起的结果集修复等问题。本演讲主要关注在Apache Flink ， Apache Hive 和 Apache Iceberg在围绕上述挑战下的社区工作总结以及如何使用Apache Flink， Apache Hive 和 Apache Iceberg 构建一个企业级的实时数据仓库。

嘉宾

Yan Liu 刘岩

Cloudera Cloudera Solution Eng

2023-08-18

14:00 - 14:30

基于 Flink 构建实时数据湖的实践

实时数据湖是现代数据架构的核心组成部分，它允许企业实时分析和查询大量数据。在这场分享中，我们将首先介绍实时数据湖目前存在的痛点，比如数据的高时效性，多样性，一致性和准确性等。然后介绍我们如何基于 Flink 和 Iceberg 构建实时数据湖，主要通过如下两部分展开：如何将数据实时入湖、如何使用 Flink 进行 OLAP 临时查询。最后介绍一下字节跳动在实时数据湖中的一些实践收益。

嘉宾

王正

字节跳动火山引擎云原生计算研发工程师

2023-08-18

14:30 - 15:00

OpenEuler and Bigtop with Ambari : Empower Data Lake in the real world

At present, there are no available free data lake platforms to streamline data management and analytics, as Cloudera Data Platform (CDP) is no longer offered free of charge. As more users show interest in similar platforms, Bigtop with Ambari provide free open-source solutions for the data lake stack as an alternative to CDP that can deliver faster and easier data management and analytics. The Bigtop team, along with people from Oracle, NTT DATA, Visa, Arm and some individual developers, re-incubated Apache Ambari last year. Roman Shaposhnik, the founder of Bigtop and Director of the ASF Board, spearheaded the effort to bring Ambari back. In this talk, we will provide an overview of the new Bigtop 3.2.0 release and an in-depth perspective on the Bigtop+Ambari solution as a data lake platform. Furthermore, Bigtop has started to work on supporting OpenEuler, which has over 300 organizational members and has seen significant enterprise deployments, particularly in China. We will introduce the current work status and roadmap of Bigtop on OpenEuler.

嘉宾

Yuqi Gu

Arm Staff Software Engineer, Arm

2023-08-18

15:00 - 15:30

Apache Linkis 在湖仓一体架构下的数据处理实践

主要分享邮储银行作为一家大型国有银行，在湖仓一体架构下，如何结合Linkis解决实际面临的问题，以及后续的发展建议。邮储银行大数据领域坚持技术自主可控、开放理念，以开源Hadoop+MPP的结合的湖仓一体技术架构，融合批量实时数据处理双链路，搭建开源Hadoop集群上千台，紧跟前沿技术，基于Iceberg数据湖格式，使用Spark（批量）、Flink（实时）计算引擎，基于Apache Linkis构建湖仓一体的技术平台底座，不断提升数据服务质效。在技术实施过程中，也遇到技术组件多，基础环境维护复杂、技术有难度，数据开发技术门槛高、流批一体架构下，组件版本升级快、不同算引擎的元数据存储不同，统一视图难等诸多问题，通过引入Apache Linkis，实现了计算组件的底层对接，对外提供统一接口调用；优先使用sql开发，降低了应用开发门槛；支持同一组件多版本共存，具备了灰度升级能力；以Hive Catalog为主，提供了元数据统一入口。在使用Apache Linkis过程中，邮储银行积极参与Apache linkis社区共建，主要贡献了数据访问层增加Postgresql支持、文件存储层S3支持等PR，未来也将持续加强社区互动，就Iceberg数据湖治理、容器化部署等方面贡献力量。

嘉宾

王华磊

中国邮政储蓄银行中国邮政储蓄银行副主任工程师

2023-08-18

15:30 - 15:45

茶歇

2023-08-18

15:45 - 16:15

数据湖 Iceberg 在小米的实践与优化

本次分享着重于介绍小米内部引入Iceberg的原因和现状，及利用Iceberg实现业务架构升级的实践，也包括了对Iceberg Parquet文件过滤能力的优化，和托管式表优化服务架构演进及落地等内容。

嘉宾

肖杰宝

小米小米软件研发工程师

2023-08-18

16:15 - 16:45

字节跳动基于 Parquet 格式的降本增效实践

字节跳动离线数仓默认使用 Parquet 格式进行数据存储，但是在业务使用过程中我们遇到了小文件过多，数据存储成本高等相关问题。针对小文件过多问题，现有技术方案一般是通过 Spark 读取多个 Parquet 小文件后，再将这些数据重新输出并合并到一个或多个大文件。对于存储成本过大问题目前离线数仓只有分区级的行级 TTL 方案，如果需要删除分区中不再使用且占比较大的明细字段数据（列级 TTL)，则需要通过 Spark 将数据读取出来并将需要删除的字段置为 NULL 的覆写方式来完成。无论是小文件合并，列级 TTL，都存在对 Parquet 数据文件的大量覆写操作。由于 Parquet 格式有特殊的编码规则，需要经过特殊的（反）序列化、（解）压缩、（反）编码等一系列操作，才能实现对 Parquet 中数据的读写。在这一过程中，编解码、解压缩之类的操作是 CPU 密集型计算，会消耗大量计算资源。为了提高 Parquet 格式文件覆写效率，我们深入研究了 Parquet 文件格式定义，采用了二进制 copy 的方法优化数据覆写操作，跳过了普通覆写中编解码之类的多余操作，相比于传统方法大幅提高了文件覆写效率，性能是普通覆写方式的 10+ 倍。为了提高易用性，我们同时提供了新的 SQL 语法来支持用户方便的完成小文件合并、列级 TTL 等操作。

嘉宾

徐庆

字节跳动火山引擎LAS高级研发工程师

王恩策

字节跳动火山引擎 LAS 高级研发工程师

2023-08-18

13:30 - 17:45

人工智能 / 机器学习

金辉厅2

2023-08-18

13:30 - 14:00

生成式AI的分布式缓存：优化云上的LLM数据管道

大型语言模型（LLM）训练是一个资源密集型的过程，需要大量的存储、CPU和GPU资源，以及众多小文件的频繁输入输出。随着LLMs越来越复杂，对高性能、可扩展的数据处理解决方案的需求也在增加，特别是在分布式云训练的背景下。传统的数据平台架构难以维持所需的I/O吞吐量，导致GPU利用不足和资源使用效率低下。在此背景下，专为优化云上LLM数据管道的Alluxio最新分布式缓存架构系统应运而生。 Alluxio与Spark是来自加州大学伯克利分校AMP实验室的姊妹项目。Spark+Alluxio的组合在AI场景下提供了高性能、可扩展和强大的数据处理和分析能力。它可以加速大规模数据处理和机器学习任务，提供快速的数据访问和共享机制，同时优化数据管道和保持数据一致性。这使得AI工作负载能够更高效地处理和分析大规模数据集，从而加速模型训练、推理和决策过程。 1、分布式缓存系统的设计和实现及如何解决LLM训练和推理的I/O挑战 2、探讨数据访问模式的独特要求，以及分享通过云上分布式缓存优化数据管道的最佳实践 3、基于Alluxio+Spark的实现提升效率打造现代化的数据平台 4、实践案例：微软、腾讯和知乎的Alluxio应用 5、探索如何利用可扩展、高效和强大的数据基础设施进行LLM训练和推理

嘉宾

傅正佳

Alluxio Alluxio 开源布道师

胡梦宇

知乎知乎大数据基础架构开发工程师

2023-08-18

14:00 - 14:30

字节跳动 Spark 支持万卡模型推理实践

随着云原生的发展，Kubernetes 由于其强大的生态构建能力和影响力，使得包括大数据、AI 在内越来越多类型的负载应用开始向 Kubernetes 迁移，字节内部探索 Spark 从 Hadoop 迁移到 Kubernetes，使得作业云原生化运行。同时搜索有大量GPU需求量极大的离线批处理任务，随着潮汐任务上量，发现一系列问题： GPU 算力供给(卡时数)仍有较大缺口、单机房资源池规模无法匹配业务单位任务计算量增长、在线资源池算力浪费问题、缺乏统一平台入口。Spark 和 AML(应用机器学习)合作，通过 GPU 共享技术、混部 GPU 调度、Spark引擎增强，平台及周边生态完善等途径，支持万张卡混部 GPU 模型推理离线计算，支持作业80亿多模态训练数据使用混部 GPU 7k卡 7.5h完成模型打分数据清洗，并且资源使用效率、稳定性均得到了显著提升。

嘉宾

刘畅

字节跳动字节跳动基础架构工程师

张永强

字节跳动字节跳动机器学习系统工程师

2023-08-18

14:30 - 15:00

为什么我们需要面向异构计算的编译器体系

随着计算任务的复杂性和数据量的增加，传统的通用计算平台已经无法满足高性能计算的需求。异构计算体系的加速到来，不同的计算平台具有不同的指令集和架构特点。面向异构计算的编译器体系可以提供更高的性能和效率，并且支持不同类型的计算单元和平台之间的无缝集成，从而推动计算技术的发展和创新。

嘉宾

王臣汉

OpenBayes贝式计算 OpenBayes 贝式计算创始人兼 CEO

2023-08-18

15:00 - 15:30

Flink ML 2.2.0 的新特性解析与应用

Flink ML是基于Apache Flink的流批一体的机器学习算法库，是Apache Flink的子项目。在此报告中，我们将介绍Flink ML 2.2.0的新特性与Flink ML在阿里巴巴的落地场景。 - 支持在线推理服务的基础设施 - 丰富的特征工程算法 - 在线学习算法的设计与在实时日志聚类场景的应用

嘉宾

洪帆

阿里巴巴阿里巴巴算法专家

张智鹏

阿里云智能阿里云智能算法专家

2023-08-18

15:30 - 15:45

茶歇

2023-08-18

15:45 - 16:15

Bringing LLM to Everywhere via Machine Learning Compilation

Significant progress has been made in the field of generative artificial intelligence and large language models (LLMs), which possess remarkable capabilities and the potential to fundamentally transform many domains. However, nowadays, LLMs require extensive computation and memory to run and usually run on servers with cloud GPUs. And we introduce MLC-LLM, an open-sourced project based on Apache TVM to run LLMs on PC, Mobile, and even WebGPU with GPU acceleration.

嘉宾

Siyuan Feng

Shanghai Jiao Tong University Ph.D. Student, Shanghai Jiao Tong University

2023-08-18

16:15 - 16:45

字节跳动深度学习批流一体训练实践

随着公司业务发展，算法复杂度不断提升，越来越多的算法模型在离线更新的基础上探索实时训练以提提升模型效果。为实现复杂的离线和实时训练灵活编排、自由切换，能在更大范围内调度在离线计算资源，机器学习模型训练逐渐趋于批流一体化。在字节跳动内部批式训练数据主要基于Apache Iceberg、Apache HDFS等，流式数据主要基于Apache Kafka。在此背景下，我们实践并开源了具有海量多阶段多源数据灵活编排能力，高效训练的批流一体机器学习训练框架。支撑了字节跳动日均1万+作业，500万核心CPU任务，1万卡GPU任务，单任务平均数据量500TB 训练规模。我们将分享包括字节跳动机器学习训练调度框架的架构演进、批流一体实践、异构弹性训练等部分内容。着重介绍在MFTC（批流一体协同训练）场景下，多阶段多数据源混合编排、流式样本全局Shuffle、全链路Native化，训练数据洞察等实践经验。全新的调度架构，实现了更有效的利用机器资源池，统一资源调度入口，更灵活的多角色调度，弹性扩缩容，提高资源利用率。批流一体的混合训练能力可以支持更高的数据消费吞吐，实现灵活的离线数据与实时数据混合编排，同时提供数据优先级保障，数据可视化能力。演讲提纲： 1. 现状与背景 1. 批流一体训练整体架构介绍：IceBerg 及 Kafka 选型优势 2. 字节跳动调度框架演进过程 3. Primus 开源项目介绍 2. 批流一体训练实践 1. 批流一体业务背景、问题与挑战 2. 字节跳动实战经验 1. IceBerg及Kafka多阶段多数据源编排 2. DataLoading技术演进与性能优化 3. All2Allshuffle、Batch/Stream 优先级调度 4. Insight训练数据洞察 3. Primus Flow实践：与Spark结合，实现具有预处理功能的训练

嘉宾

毛洪玥

字节跳动字节跳动基础架构工程师

2023-08-18

16:45 - 17:15

Unifying Real-time and Batch ML Inference using BentoML and Apache Spark

BentoML provides tooling for packaging, deploying, and serving machine learning models at scale. Apache Spark is an open-source cluster computing framework for large-scale data processing. This talk will highlight how BentoML can unify real-time and batch inference workloads by integrating with Apache Spark. BentoML has rapidly gained popularity among its user base owing to its seamless open standards for constructing online AI applications as distributed services through simple Python code. In this regard, we present the novel integration of BentoML with Spark, which allows users to employ the Bento service, originally designed for real-time inference, within a Spark cluster for offline batch inference without altering any code. This functionality is enabled by the run_in_spark API, which automatically propagates the models and inference logic across all Spark worker nodes during batch inference. This integration offers an optimal solution for teams to manage both their real-time and batch inference logic under the same standards, facilitated with version control, and ensuring consistent library dependencies. As a result, this eliminates the concerns regarding divergence in the inference logic over time between real-time and batch inferences. The unified approach ensures consistent model application, fostering efficient AI service development and deployment. Attendees will learn how to:
1. Package models with BentoML;2. Deploy BentoServices to production;3. Invoke BentoServices from Spark for batch inference at scale;4. Leverage the same models for both real-time and batch predictions.

嘉宾

Bo Jiang

BentoML Product Engineer, BentoML

2023-08-18

17:15 - 17:45

Boost ML networks on specific HW platform with Apache TVM on the example of Qualcomm Adreno™ GPU

In our presentation, we will introduce Apache TVM - tensor compiler for boosting execution of ML networks. Apache TVM is a powerful tool for optimizing DL models. On the example of Adreno GPU we will demonstrate how user can significantly improve performance of his model with TVM, by utilizing HW specific of a target platform. In 2021, initial research to enable Adreno textures in Apache TVM was prototyped based on Mobilenet v1 workload and demonstrated promising performance results, meanwhile other networks still worked slowly on Qualcomm Adreno GPU. In 2022, a big amount of optimization work was done, and significant performance boost was achieved. In our talk, we will demonstrate how users can use TVM to run models on their target platforms and what should they do to achieve the best performance.

嘉宾

Egor Churaev

Deelvin Solutions Sr. Software Engineer, Deelvin Solutions

2023-08-18

13:30 - 16:45

流处理

鸿运厅2

主持人

李钰

王鑫

2023-08-18

13:30 - 14:00

Apache Flink 流批自适应 Shuffle

在 2022 年的 Flink Forward Asia 上，我们首次提出了以云原生、流批融合、自适应为核心的 Flink Shuffle 3.0 架构。新的 Shuffle 架构具有以下优势： ‒ 更加适应云原生环境的资源编排与隔离特点 ‒ 兼具传统流式与批式 Shuffle 技术的优势 ‒ 能够根据运行时的资源与负载情况做出自适应调节，更加易用本次分享，我们将介绍 Flink 1.18 版本在这方面取得的最新进展与未来规划。

嘉宾

宋辛童

阿里云阿里云高级技术专家，阿里云 Flink Shuffle & SDK 团队负责人

谭玉新

阿里云阿里云高级开发工程师

2023-08-18

14:00 - 14:30

基于Apache Calcite/Gremlin构建流式图处理系统

典型的流计算主要针对表模型的处理场景，而针对图模型如何进行流式的处理和分析，目前通用流计算还难以支持。本次分享主要介绍蚂蚁自研的流式图引擎GeaFlow,以及GeaFlow如何围绕Apache Calcite和Apache Gremlin构建流式图查询语言的能力。同时也会分享基于流式图计算在蚂蚁内部的实践和应用。

嘉宾

潘臻轩

蚂蚁集团蚂蚁集团资深技术专家

2023-08-18

14:30 - 15:00

联通基于Apache StreamPark的大规模实时计算生产实践

1.大数据实时计算平台支持基于事件的低延迟处理以及流批一体的数据处理，支撑了30+内部和外部组织的实时化业务和10000+的数据服务订阅，每天处理2.3万亿条数据、600TB+数据量，集群规模独享480+服务器，服务了十几条业务生产产品线 2.基于Apache StreamPark一站式的面向实时计算作业的管理平台，支撑了生产环境500+Flink ON YARN实时计算作业管理，通过可视化的简洁的操作流程完成了项目管理、作业管理、团队管理、权限管理、告警管理、日志管理、版本管理、集群管理、资源配置、Flink JAR、Flink SQL、监控大屏等管理功能，实现了实时作业全生命周期管理，帮助团队解决了作业运维泥沼、提升了管理效率、减低了故障率、提高了业务支撑质量，全面实现了实时计算的一体化、平台化的管理

嘉宾

穆纯进

联通数字科技有限公司联通数字科技有限公司大数据实时计算平台研发负责人

2023-08-18

15:00 - 15:30

FlinkSQL的字段血缘及数据权限解决方案

数据血缘和数据安全是搭建企业级数据仓库不可或缺的能力。近年来随着各行各业对大数据实时化的需求越来越强烈，以 Flink 为代表的实时数仓快速兴起，但由于发展时间相对较短，离线数仓领域基于 Apache Ranger 和 Apache Atlas 相对成熟的数据血缘和安全解决方案尚未支持 Flink SQL，且依赖 Ranger 和 Atlas 会导致系统部署和运维过重。因此，如何在对 Flink 和 Calcite 源码零侵入的前提下实现 FlinkSQL 的字段血缘及数据权限管理，就显得尤为重要。本次分享将详细介绍相关解决方案，帮助听众打造 Flink 实时数仓领域的 Atlas+Ranger。

嘉宾

白松

杭州数澜科技有限公司杭州数澜科技有限公司联合创始人，研发中心副总经理

2023-08-18

15:30 - 15:45

茶歇

2023-08-18

15:45 - 16:15

Streaming Apache Kudu within Apache Flink

So far CDC is not supported within Apache Kudu, so there is no way to read data from it in a streaming style like other CDC enabled data sources when integrating with Apache Flink. To overcome this, a Apache Flink source connector has been built to unlock the ability for Apache Kudu to stream the data in a continuous and incremental way. In this speech, we will discuss and share the detailed design and implementation for the solution.

嘉宾

Wei Chen

eBay Staff Software Engineer of eBay

2023-08-18

16:15 - 16:45

Shaping the Future: Unveiling High-Concurrency Streaming Analytics with Apache Druid

"Stream processing is rapidly evolving to meet the high-demand, real-time requirements of today's data-driven world. As organizations seek to leverage the real-time insights offered by streaming data, the need for robust, highly concurrent analytics platforms has never been greater. This presentation introduces Apache Druid, a modern, open-source data store designed for such real-time analytical workloads. Apache Druid's key strength lies in its ability to ingest massive quantities of event data and provide sub-second queries, making it a leading choice for high concurrency streaming analytics. Our exploration will cover the architecture, its underlying principles, tuning principals and the unique features that make it optimal for high concurrency use-cases. We'll dive into real-life applications, demonstrate how Druid addresses the challenge of immediate data visibility, and discuss its role in powering interactive, exploratory analytics on streaming data. Participants will gain an in-depth understanding of Apache Druid’s value in the rapidly evolving landscape of streaming analytics and will be equipped with the knowledge to harness its power in their own data-intensive environments. Join us as we delve into the future of real-time analytics, discovering how to 'Shaping the Future: Unveiling High-Concurrency Streaming Analytics with Apache Druid'.

嘉宾

Tijo Thomas

Imply Data inc Lead Solutions Architect

2023-08-18

13:30 - 17:15

消息系统

鸿运厅1

主持人

王殿进

2023-08-18

13:30 - 14:00

Apache Pulsar 3.0：首个LTS版本及其新特性

Apache Pulsar 社区最近推出了 Apache Pulsar 3.0，这是 Pulsar 的第一个 LTS 版本。在本次演讲中，我们将深入探讨Pulsar LTS 版本的重要性。我们还将介绍 Pulsar 3.0 中引入的主要特性，包括新的负载均衡器、大规模延迟消息的支持以及Direct IO 优化等。

嘉宾

Zike Yang

StreamNative Software Engineer of StreamNative

2023-08-18

14:00 - 14:45

Apache Pulsar 限流功能在移动云云原生场景下的应用实践

移动云为了实现 Pulsar 在云原生容器化场景下的多租户网络资源隔离，针对 Pulsar 的限流功能、ResourceGroup 以及负载均衡等特性做了大量优化。本次演讲将介绍 Pulsar 的限流实现原理，我们如何优化 ResourceGroup 来实现集群级别的限流，以及在集群级别限流场景下如何优化负载均衡策略。

嘉宾

王嘉凌

中国移动云能力中心中国移动云能力中心软件开发工程师

2023-08-18

14:45 - 15:30

华为终端云在容器场景中对Apache Pulsar的优化实践

Apache Pulsar是一款云原生消息队列，基于其存算分离架构，通常可以在业务低峰期缩容计算层来节省资源。我们在容器化场景下，针对Apache Pulsar做了大量优化。如：现在Pulsar负载均衡算法依赖于节点过去的负载数据，达到平衡的过程比较缓慢。当开启HPA，节点在负载均衡的过程中很可能又会触发扩容，而扩容又会引发新的负载均衡。我们要如何优化来让Pulsar更加云原生？

嘉宾

林琳

华为华为 SDE 专家

2023-08-18

15:30 - 15:45

茶歇

2023-08-18

15:45 - 16:30

Kafka without Zookeeper

Currently, Kafka relies on ZooKeeper to store its metadata, ex: brokers info, topics, partitions...etc. KRaft is a new generation of Kafka that runs without Zookeeper. This talk will include: 1. Why Kafka needs to develop the new KRaft feature. 2. The architectures of the old (with Zookeeper) Kafka and new (without Zookeeper) Kafka 3. Benefit of adopting KRaft 4. How it works internally. 5. The monitoring metrics 6. Tools to help troubleshoot issues in KRaft 7. A demo to show what we've achieved so far. 8. The roadmap for the Kafka community to move toward KRaft. After this talk, the audience can have better knowledge of what KRaft is, and how it works, and what's the difference with Zookeeper based Kafka, and most importantly, how to monitor it and troubleshoot it.

嘉宾

Luke Chen

RedHat Senior Software Engineer, RedHat

邓子明

字节跳动字节跳动数据开发

2023-08-18

16:30 - 17:15

Deep Dive the replication protocol in Kafka

Being a messaging system, the data durability is very important. The replication ensures automatic failover to other replicas when a server in the cluster fails so messages remain available in the presence of failures. In Apache Kafka, the replication protocol is not only used to achieve durability, but also to achieve high throughput. In this talk, we'll deep dive how the replication protocol works internally in Kafka. We'll also explain what's the pros and cons of this kind of design. Furthermore, we'll also introduce the other kind of replication protocol in Kafka, which is used for KRaft controllers (i.e. quorum based way). After this talk, the audience can rethink these replication protocols, and maybe some of the ideas can be brought into some other distributed system projects. Hope it will also help audience know more about Apache Kafka.

嘉宾

Luke Chen

RedHat Senior Software Engineer, RedHat

大会组委会欢迎致辞

陆首群和David多位大咖为大会致辞

The ASF: Past and Future

Craig Russell

Apache Software Foundation Apache 软件基金会董事

18年前，Craig Russell作为一个贡献者加入了Apache软件基金会，为开发DB JDO项目做出了贡献。2007年，他被选为成员。目前，他担任Apache软件基金会的助理秘书，并担任董事会成员。

Introduction to Apache Doris 2.0

马如悦

飞轮科技 Apache Doris 项目创始人 & 飞轮科技 CEO

前百度杰出架构师，先后担任过百度分布式计算团队、大数据工程团队和 AI 产品工程团队的技术负责人。 2013 年领导设计和开发了实时数仓 Doris 并在以后一直担任其总负责人， 2023 年起担任飞轮科技 CEO。

人间清醒：开源的最深层次动机

卫剑钒

N/A 《大教堂与集市》中文译者，国际信息系统安全认证专家（CISSP），中国金融学会金融科技专委会委员

卫剑钒，开源圣经《大教堂与集市》中文版译者，国际信息系统安全认证专家（CISSP），中国金融学会金融科技专委会委员，长期从事开源技术、网络安全、区块链技术、金融科技等领域研究和实践，著有《安全协议分析与设计》、《大教堂与集市》（译）、《Web3：互联网的新世界》、《区块链在中国》等书。

一路前行，阿里云大数据从拥抱开源走向引领开源

王峰

阿里巴巴阿里巴巴花名“莫问”，在阿里云任研究员职位，开源大数据平台负责人

自 2010 年开始从事开源大数据技术研发和管理工作，目前带领团队打造的开源大数据平台不仅服务阿里巴巴内部实时数据业务，同时也在阿里云上通过 E-MapReduce 和实时计算 Flink 版等产品为广大中小企业提供完美兼容开源生态体验的云原生大数据计算服务。

开源的未来：挑战与机遇

Craig Russell

Apache Software Foundation Apache 软件基金会董事

Rich Bowen

AWS Apache 软件基金会董事，AWS 开源战略师

Rich Bowen has been involved in open source since before we started calling it that. He's a member of the Apache Software Foundation, where he currently serves as a board member and VP Conferences. Rich is an Open Source Strategist at AWS.

Justin Mclean

Datastrato Apache 软件基金会董事

Justin Mclean is a highly experienced professional with over 30 years in web application development, education, and community work, and is an active contributor to open source software. Justin is a renowned speaker at conferences worldwide and currently serves as the Community Manager at Datastrato. He mentors projects in the Apache Software Foundation and holds positions as VP of the ASF Incubator, and is an ASF board member.

姜宁

大会议题评审组成员 | Apache 软件基金会董事

字节跳动开源办公室首席布道师，前华为开源管理中心技术专家，Apache 软件基金2022,2023 年度董事，Apache软件基金会孵化器导师，前红帽软件首席软件工程师，Apache 本地北京社群（ALC Beijing）发起人，有十余年企业级开源中间件开发经验，有丰富的Java 开发和使用经验。

大花

Answer Answer 社区经理

SegmentFault 思否旗下 Answer 社区经理，具备多年软件产品海外市场经验。

Apache ECharts 的图表服务端渲染方案

Ovilia

Apache ECharts Apache ECharts PMC Chair

Ovilia has been working on Apache ECharts project since 2016 and is now PMC Chair of the project. She is also a passionate advocate for data visualization. Through her work, Ovilia strives to democratize access to information, ensuring that it is readily available and easily understandable for people from all walks of life.

日志存储分析的数仓化

肖康

SelectDB SelectDB 技术副总裁

2009 年至 2012 年在百度从事 Hadoop 平台开发与建设。 2012 开始先后在 360、奇安信负责大数据平台，研发基于 MPP 架构的安全大数据引擎，构建全球最大的网络安全大数据。 2022 年联合创立 SelectDB，致力于研发新一代云原生实时数仓。

Apache Arrow DataFusion: 向量化查询引擎揭秘

刘昆

eBay eBay大数据工程师，Apache Arrow Committer & PMC Member

毕业于清华大学软件学院；目前就职于eBay大数据开发团队，大数据工程师； Apache Arrow PMC，Apache IoTDB PMC，主要从事数据库、存储引擎、查询引擎等领域的工作。

Apache Impala 4.2 & 4.3 版本新特性一览

黄权隆

Cloudera Impala PMC Member & Committer，Cloudera 研发工程师

Cloudera研发工程师，主要工作为Impala内核开发。在开源社区是Apache Impala PMC member & Committer，Apache ORC Committer

茶歇

Apache Doris 在衔远科技的应用实践

王永臣

北京衔远科技北京衔远科技大数据开发工程师

衔远科技数据团队的负责人，热衷于开源社区贡献，不断探索新技术领域，当前正深入研究Doris与AIGC的融合，欲将两者的优势充分发挥。

字节跳动大数据 SQL 权限精细化管理实践

朱江

字节跳动火山引擎 LAS 高级研发工程师

火山引擎LAS高级研发工程师

基于 Apache Calcite 的多引擎指标管理最佳实践

谢佳君

字节跳动火山引擎 LAS 高级研发工程师，Calcite Committer

字节跳动高级研发工程师，曾参与2022年Apache Asia Con的演讲。热爱开源，经常参与社区工作，现在是Apache Calcite active committer和Linkedin Coral Contributor。

What's new in the recent and upcoming HBase releases

张铎

神策数据神策数据首席架构师，Apache HBase PMC Chair

清华大学计算机科学与技术系本硕，长期从事开源软件的开发与维护。2015 至今历任ApacheHBase 项目的 Committer、PMC 成员、主席。2020 年成为 Apache 软件基金会的 Member。2018 年，在 Apache 软件基金会全球近 7000 名 Committer 中，贡献数量排名第三。曾任小米开源委员会主席，负责小米整体开源工作的规划与推进。目前在神策数据担任首席架构师。

Deep dive into resource manageability in ozone storage

Sumit Agrawal

Cloudera Senior Staff Engineer, Cloudera

Myself working in cloudera, contributing to Apache Ozone distributed storage and also a committer. I have 16 years experience in IT industry and worked over various domain including data storage, cloud application and middleware.

Spark SQL Shuffle Join Improvement at eBay

王玉明

eBay eBay 软件工程师，Apache Spark PMC

eBay SQL on Hadoop 团队软件开发工程师，Apache Spark PMC Member and Committer，2022 SIGMOD Systems Award 获得者。从 Spark 1.5.0 开始参与 Apache Spark 的开发，并成为最活跃的代码贡献者之一。专注于SQL查询性能优化。

字节跳动千亿文件 HDFS 集群实践

熊睦

字节跳动基础架构工程师

字节跳动大数据存储底座工程师，主要负责大数据存储 HDFS 元数据服务演进和上层计算生态支持。

茶歇

Apache Kyuubi & Celeborn(Incubating): 助力 Spark 拥抱云原生

潘成

网易，Apache Kyuubi & Celeborn 社区网易数帆软件工程师，Apache Kyuubi PMC，Apache Celeborn PPMC

网易数帆软件工程师，Apache Kyuubi PMC 成员，Apache Celeborn (Incubating) PPMC 成员。主要从事企业级离线计算引擎开发、Apache Kyuubi 开源社区建设等工作。

Resilient Data: Exploring Replication and Recovery in Apache Ozone

Sadanand Shenoy

Cloudera Software Engineer II , Cloudera

Sadanand Shenoy is a committer in the Apache Ozone project and has keen interest in distributed systems . Sadanand is currently working at Cloudera and has been actively contributing to the Apache Ozone project for the past 3 years . He has pursued B.E in Information Science and Engineering from MSRIT Bangalore.

Linkis 在理想汽车的应用实践

郗世豪

理想汽车理想汽车高级大数据工程师

理想汽车高级大数据工程师，主持开发 Linkis 1.3.2 版本，Linkis Committer，入职公司5年，现在在公司主要负责 Linkis 和 Spark 的二次开发，致力于在公司内部落地和推广 Linkis 平台，通过和 Spark 等底层引擎的结合，努力探索更加高效、灵活的数据处理方案，最终提升用户效率。

OpenDAL 的开发者体验分享

丁皓（Xuanwo）

Databend Databend 研发工程师

learn, work and think in an open-source way.

Apache Kvrocks 社区演进

王源

百度百度资深工程师

百度云数据库部资深工程师，百度云 Redis 和磁盘 KV 数据库 PegaDB 内核负责人 Apache Kvrocks PMC Member & Redis Group Member

社区和贡献者如何找到彼此？

庄表伟

开源社开源社理事

庄表伟，开源社理事、执行长，天工开物开源基金会副秘书长。 1997年毕业至今，始终战斗在编程的“第一线”，一直致力于推广并服务开源，热爱社区，热衷参与各种社区的交流活动。曾任盛大创新院高级研究员、印客网技术总监、华为开源管理中心开源专家。

自研分布式数据库的开源之路

段少婷

OceanBase OceanBase社区经理

OceanBase社区负责人，致力于前沿科技和技术的推广和布道，开源社区建设发展工作，先后就职于Sun、Adobe、百度、阿里云等世界500强公司负责大数据、AI、云计算、操作系统等开源技术社区的发展建设、人才培养和生态建设等工作。

茶歇

AI时代与智能组织：从Apache得到的启示

张雅琪（Alphatu）

北京奇点汇科技有限公司 ChaosAI 创始人

ChaosAI 创始人，北京奇点汇科技有限公司创始人，阿法兔研究笔记创始人

从0到“20000+”用户，Apache DolphinScheduler社区如何实现双向奔赴？

曾辉

白鲸开源白鲸开源高级社区经理

Apache DolphinScheduler Committer，白鲸开源高级社区经理，负责 Apache DolphinScheduler and SeaTunnel 社区的全球化运营工作，0-1搭建开源项目出海及落地策略，主导项目的生态建设，提升“开源项目”在全球的影响力及社区内部的建设，致力于传播开源文化。

Challenges and Solutions on building Realtime Data warehousing with Apache Flink , Apache Hive and Apache Iceberg

Yan Liu 刘岩

Cloudera Cloudera Solution Eng

Apache Hive and Apache Flink Contributor, Cloudera Solution Engineering. Over 10 Years of Practical Experience in Big Data and my current focus is real-time data warehouse using Apache Flink, Apache Hive, and Apache Iceberg.

基于 Flink 构建实时数据湖的实践

王正

字节跳动火山引擎云原生计算研发工程师

于 2021 年加入字节跳动，就职于基础架构开放平台团队，主要负责 Serverless Flink等方向研发。

OpenEuler and Bigtop with Ambari : Empower Data Lake in the real world

Yuqi Gu

Arm Staff Software Engineer, Arm

Yuqi Gu is currently Chair and PMC member of Apache Bigtop. He is also the committer and PMC member of Apache Ambari. He works for Arm and is mainly focusing on performance optimization on Arm64.

Apache Linkis 在湖仓一体架构下的数据处理实践

王华磊

中国邮政储蓄银行中国邮政储蓄银行副主任工程师

多年银行大数据领域数据架构经验，开源爱好者，Linkis社区贡献者。

茶歇

数据湖 Iceberg 在小米的实践与优化

肖杰宝

小米小米软件研发工程师

小米软件研发工程师，目前主要负责小米内部数据湖Iceberg内核及表优化服务的研发工作。

字节跳动基于 Parquet 格式的降本增效实践

徐庆

字节跳动火山引擎LAS高级研发工程师

字节跳动火山引擎LAS高级研发工程师。多年从事于Hive Metastore, SparkSQL, Hudi等大数据相关组件的研发工作。

王恩策

字节跳动火山引擎 LAS 高级研发工程师

火山引擎 LAS 高级研发工程师，负责字节跳动大数据分布式计算引擎的设计与研发，帮助公司在海量数据中挖掘出高价值信息

生成式AI的分布式缓存：优化云上的LLM数据管道

傅正佳

Alluxio Alluxio 开源布道师

傅正佳，Alluxio 开源布道师。本科毕业于上海交通大学电子系，随后取得香港中文大学信息工程博士学位，毕业后加入新加坡高级数字科学中心（美国伊利诺伊大学在新加坡的研究所）从事科研工作，在计算机网络和分布式系统领域相关的顶级国际会议发表多篇论文。加入Alluxio前曾在新加坡科技公司Bigo Technology担任机器学习研发总监。

胡梦宇

知乎知乎大数据基础架构开发工程师

胡梦宇，知乎大数据基础架构开发工程师，主要负责知乎内部大数据组件的二次开发与运维，目前主要工作内容集中在 HDFS，Alluxio，Flink。

字节跳动 Spark 支持万卡模型推理实践

刘畅

字节跳动字节跳动基础架构工程师

于 2020 年加入字节跳动，就职于基础架构批式计算团队，主要负责 Spark 云原生方向工作，Spark On Kubernetes 等方向研发。

张永强

字节跳动字节跳动机器学习系统工程师

于 2022 年加入字节跳动，就职于 AML 机器学习系统团队，参与构建大规模机器学习平台

为什么我们需要面向异构计算的编译器体系

王臣汉

OpenBayes贝式计算 OpenBayes 贝式计算创始人兼 CEO

曾就职于华特迪⼠尼互动媒体集团、AVOS Systems 等世界知名企业，中⽂语义理解基准 CLUE 基⾦会秘书⻓、天津⼤学⻉式计算联合研究中心副主任。

Flink ML 2.2.0 的新特性解析与应用

洪帆

阿里巴巴阿里巴巴算法专家

洪帆博士毕业于北京大学。毕业后加入阿里巴巴机器学习团队，主要从事 Flink 相关的机器学习开发与改进，是 Flink ML Contributor。此前还参与了前一代 Flink 机器学习库 Alink 的研发。

张智鹏

阿里云智能阿里云智能算法专家

阿里巴巴算法专家，Apache Flink committer 张智鹏博士毕业于北京大学，研究方向为分布式机器学习。毕业后加入阿里云机器学习团队PAI，主要从事Flink ML的设计，开发和改进。

茶歇

Bringing LLM to Everywhere via Machine Learning Compilation

Siyuan Feng

Shanghai Jiao Tong University Ph.D. Student, Shanghai Jiao Tong University

I'm a Ph.D. student in Zhiyuan Honors Program at Shanghai Jiao Tong University. Also, I'm a PMC member of Apache TVM, working closely with the community and developing new features, including TensorIR, Meta-Schedule, Auto-Tensorization, and Relax (next Relay). Recently, I am spending my time on MLC-LLM to deploy a large language model on every device.

字节跳动深度学习批流一体训练实践

毛洪玥

字节跳动字节跳动基础架构工程师

于2022年加入字节跳动，主要负责大规模云原生批流一体AI模型训练引擎，支撑了包括抖音视频推荐、头条推荐、穿山甲广告、千川图文广告等业务。

Unifying Real-time and Batch ML Inference using BentoML and Apache Spark

Bo Jiang

BentoML Product Engineer, BentoML

Product Engineer at BentoML, previously Product Engineer at Douban. Working on platforms industrializing AI Applications.

Boost ML networks on specific HW platform with Apache TVM on the example of Qualcomm Adreno™ GPU

Egor Churaev

Deelvin Solutions Sr. Software Engineer, Deelvin Solutions

Sr. Software Engineer with 10 years of experience, worked on several projects at Intel, in particular: OpenCL CPU compiler and Intel OpenVINO, PhD student at HSE University with topic related to emotion recognition with deep learning algorithms. Currently work on Apache TVM project. Commiter of Apache TVM.

Apache Flink 流批自适应 Shuffle

宋辛童

阿里云阿里云高级技术专家，阿里云 Flink Shuffle & SDK 团队负责人

Apache Flink PMC Member & Committer，阿里云高级技术专家，阿里云 Flink Shuffle & SDK 团队负责人。

谭玉新

阿里云阿里云高级开发工程师

就职于阿里云计算平台开源大数据部门，专注于 Apache Flink 开源项目。

基于Apache Calcite/Gremlin构建流式图处理系统

潘臻轩

蚂蚁集团蚂蚁集团资深技术专家

潘臻轩(泰初)，蚂蚁金服资深技术专家。2012年加入阿里集团数据平台，2016年加入蚂蚁集团数据技术部，经历了阿里和蚂蚁实时计算从0到1的演进，从17年底开始负责流式图系统和团队的构建，从0到1打造了蚂蚁的流式图系统。对实时计算和图计算以及上层的应用场景有深入的理解。

联通基于Apache StreamPark的大规模实时计算生产实践

穆纯进

联通数字科技有限公司联通数字科技有限公司大数据实时计算平台研发负责人

Apache StreamPark PMC、大数据实时计算平台研发负责人，负责万亿级Flink实时计算开发、运维以及平台建设

FlinkSQL的字段血缘及数据权限解决方案

白松

杭州数澜科技有限公司杭州数澜科技有限公司联合创始人，研发中心副总经理

数澜科技公司联合创始人、研发中心副总经理，拥有9年大数据平台研发经验，专注于大数据、实时计算、数据权限等领域的研究。负责公司核心产品数栖平台和数栖EMR的产品研发工作，目前数栖产品已成为国内外数百家公司建设数据中台的基础设施工具，例如中信集团、富士康、万科、宝马、浙江交投集团等。

茶歇

Streaming Apache Kudu within Apache Flink

Wei Chen

eBay Staff Software Engineer of eBay

Wei is focusing on empowering the eBay's Notification Platform by leveraging the big data and streaming processing technologies. He is also a tech blog writer and actively contributing in open source community. Wei received his bachelor and master degrees from Shanghai Jiao Tong University.

Shaping the Future: Unveiling High-Concurrency Streaming Analytics with Apache Druid

Tijo Thomas

Imply Data inc Lead Solutions Architect

TijoThomas +91 8971965432 tijothomas21@gmail.com Summary Lead with great passion for big data technology, having 18+ years of experience in the software industry ( engineering, professional service , product management). Helping customer in the field , negotiating with customer on the feature request and align them with the product roadmap Extensive experience across the stack in Managing, Architecting, Designing and Implementing Big data applications, frameworks and platforms. More than 4 year of experience as Solution Architect Experience in design and implementing a highly scalable SAAS platform for public Cloud. Hold two patents in the area of Big Data. Area of Expertise Expert level knowledge in Apache Big Data Platforms: Hive, Druid, NiFi , Kafka and Spark , Druid Internals. Expert level knowledge of Core Java and developing Java applications/platforms. Expert level knowledge in developing web applications using J2EE Intermediate level knowledge in developing microservice based applications Intermediate level knowledge of Scala functional programming. Intermediate level knowledge in applying Data mining and Machine learning algorithms. Skills Languages: Java/J2EE, Scala, Shell scripts. Apache Big Data technologies: Apache Druid , Hadoop (HDFS, Yarn, Hive), Spark (Spark SQL, Spark Core), Streaming (Spark Streaming, NiFi), Apache Atlas , Apache Ranger, Hbase, Phoenix, Oozie, Druid Methodologies: Agile, TDD , FDD , Extreme programming Java Technologies: Core Java Libraries, Java Web Services , Spring, JUnit, Ant, Maven Performance Engineering: JVM Garbage Collection Tuning, In-Memory, High Performance Data Structures and Algorithms, Parallel Processing, Multi-Threading, Distributed Systems, NIO. Work Experience Onboarding APAC Imply customers and their use case to Imply. Mentoring and Growing the team in the APAC region. Showcase new features to customers and onboard customers to the latest features. Conducting health checks and cluster reviews and providing recommendations. Handling escalated support cases . Developing automation tools related to various druid use cases. Review cluster utilization, Cluster sizing and analyze cluster metrics and provide improvement suggestions. Performance benchmarking & hardening. Technologies: Apache Druid, Pivot , Imply Manager , Imply Clarity Leading the architecture & design of Big Data implementation for one of the leading insurance companies in US/Canada. Architect for building onboarding tools for the source systems Understanding the application team requirement and conducting POC to onboard their application to Hadoop and related platforms covered under HDP distribution. Conducting POC, UAT with the platform team and official signoff from the App team System integration design and development. Cluster tuning for Hive, Spark workloads. Review cluster, analyze cluster metrics, and provide improvement suggestions. Technologies: Hive, Atlas , Ranger , Spark, Nifi, Hadoop(Yarn/HDFS), Kerberos, Airflow Leading the architecture & design of Huawei's Big Data Tools . Proposed an efficient way to test huawei hadoop releases , compare and contrast various performance and security metrics. Carried out Proof of Concept to demonstrate capability of Tools to Chief Technical officer for Big Data and Database. Ingest logs from Hadoop nodes and identify the characteristics of hadoop workload and optimization. Simulate the cluster behaviour as in customer clusters and provide provocative optimizations of hadoop cluster configurations to improve the workload efficiency. Used Dr.Elephant and smartsense to provide suggestions based on heuristics. Technologies: Spark, Hadoop , Elastic Search , Nifi , Logstash , Kibana ,BIRT. Leading the architecture & design of Huawei's Data pipeline . We have built the next-gen high performance Datapipeline engine using Apache Nifi that can integrate and orchestrate various bigdata component. - I proposed the idea of developing the next generation data pipeline for big data on Huawei Cloud. - Executed a Research Project to evaluate the technical feasibility of the proposal. - Designed an advanced on premises and distributed architecture to deploy in cloud - Designed and implemented key algorithms and data structures for achieving high performance data movement. - Travelled frequently to China R&D centers to propose and discuss key solutions. Technologies: Spark, Scala, Core Java, Maven, SBT. Leading a team of 10 members for Huawei’s Business Intelligence Platform in India. Technologies: Core Java, Swings, SWT, JavaScript, Dojo, JQuery, JSP, Servlets, Hibernate, Ant, Maven. - Lead the development of a large scale business intelligence platform with ETL, Analytics, Reporting and Dashboard Capabilities. - Implemented important business logic in key components. - Interacted with other Huawei product teams and collected their requirements and scenarios. - Designed new features and modules to handle customer requirements and scenarios. - Analyzed and finalized key technologies and open sources for key components. - Involved in-hands on coding of key modules. - Responsible for handling the deployment and support team in UAE, South Africa, Kenya, Canada, Isle of Man and Switzerland. - Lead the design and implementation of complete platform components using Java, Spring DM and OSGI with a sub-team size of 6 members. Designed, coded and implemented the key framework for the interest module. - Complete ownership of the design of all the sub modules. Assigned, reviewed and guided other members in design activities. - Also involved in-hands on coding, implementation and code review of the module. Technologies: Core Java, , Servlets, Spring DM, OSGI, iBatis, Maven. - Interacted with marketing and bidding teams to collect market requirements and design solutions for the same. - Worked on Access Control framework implementation for BT Transact Business Intelligence Report. - This involves the Framework API development for BI tools for Oracle Discoverer and Oracle 11i Application Server Technologies: Core Java, JSP. Joined as a fresher in the BT Transact and quickly took over E2E ownership of various modules. Re-designed the Test automation framework for order provisioning Gradually took ownership of the order provisioning engine module. Awards & Recognition Potential High Value Patent Award -2016 Unique Value Contribution Award - 2016 Patent Contribution Award – 2015 Open source Contribution Award -2015 Best Fighting Spirit Award - 2014 Quality Control Circle Contribution - 2013 Quality Control Circle Contribution - 2012 Quality Control Circle Contribution - 2011 Patents System and Method for Federated Access Control on Big Data – Filed in 2015 Method for distributing workload across cloud. - Filed in 2016 Education - B-Tech in Computer Science - 2001 – Kannur University, Kerala - Post Graduate Diploma in Information Technology -2006 – IIT Bombay

Apache Pulsar 3.0：首个LTS版本及其新特性

Zike Yang

StreamNative Software Engineer of StreamNative

Zike Yang is a Software Engineer at StreamNative, Apache Pulsar Committer, Apache StreamPipes Commiter & PMC Member. His current work primarily focuses on developing Pulsar multi-language clients and improving Pulsar core features.

Apache Pulsar 限流功能在移动云云原生场景下的应用实践

王嘉凌

中国移动云能力中心中国移动云能力中心软件开发工程师

中国移动云能力中心软件工程师，Apache Pulsar Contributor，负责移动云消息队列Pulsar和Kafka的研发

华为终端云在容器场景中对Apache Pulsar的优化实践

林琳

华为华为 SDE 专家

华为终端 SDE 专家，Apache Pulsar PMC 成员，拥有近10年中间件与基础架构设计经验，致力于打造稳定可靠的基础设施

茶歇

Kafka without Zookeeper

Luke Chen

RedHat Senior Software Engineer, RedHat

I'm a senior software engineer in RedHat working on products to run Apache Kafka on cloud. I'm also a committer and PMC member in Apache Kafka. I've been contributed in Apache Kafka for more than 3 years.

邓子明

字节跳动字节跳动数据开发

国防科技大学，多年数据开发经验，Apache Kafka Committer

Deep Dive the replication protocol in Kafka

Luke Chen

RedHat Senior Software Engineer, RedHat

The Apache Way: Building Community and Longevity

Justin Mclean

Datastrato Apache 软件基金会董事

开源贡献难吗？

李本超

字节跳动字节跳动 Flink SQL 技术负责人

Apache Calcite PMC Member Apache Flink Committer 毕业于北京大学，目前就职于字节跳动流式计算团队， Flink SQL 技术负责人。

社区的力量

陈阳

微软开源社理事长

开源社理事长，微软云计算与人工智能首席产品经理 GNOME基金会2010董事总监 GNOME.Asia社区创始人 Mozilla核心贡献者，GSoC导师，2016年任中国开源软件推进联盟副秘书长，开放原子开源基金会导师 2014年发起成立开源社，也是中国开源年度报告，COSCon的发起者。 16+ IT & OSS 行业经验，曾供职SUN，Oracle，微软跨国IT公司核心研发技术领域: 开源技术和社区治理，公有云，行业AI解决方案，知识图谱，智能对话，数据集成 2019年获得美国人工智能专利一项， O'REILLY 《Beautiful Testing》联合撰稿人

数字金融时代的云原生技术应用与创新

陈广胜

微众银行微众银行技术专家，Apache 软件基金会成员，Apache EventMesh PMC Chair，ALC Shenzhen 联合发起人

微众银行技术专家，Apache软件基金会成员，Apache EventMesh PMC Chair，ALC Shenzhen联合发起人

社群长青：开源社群如何可持续发展

tison

大会议题评审组成员 | 格睿科技（Greptime）开发者关系总监

Apache Member and Incubator Mentor

面向数字基础设施的通信新基座——移动云消息队列矩阵

胡宗棠

中国移动云能力中心中国移动云能力中心消息中间件领域技术专家，移动云消息队列团队负责人

中国移动云能力中心，消息中间件领域技术专家，移动云消息队列团队负责人。主讲人介绍：8年以上消息中间件内核开发和架构设计经历，从无到有参与移动云RocketMQ、MQTT、Kafka等多款主流消息中间件系列产品的内核架构和研发。作为技术嘉宾，曾多次参与Apache Conf Aisa2022/2023、Apache RocketMQ Summit/Meetup、云原生服务大会技术分享，开源项目实践经验丰富，担任Apache RocketMQ、SOFAJRaft、Nacos、openEuler message-middleware sig和openMessaging等开源社区的Maintainer/Committer。曾获2022年信通院《云原生技术标准专家》,多个开源社区的开源先锋等荣誉。

基于EventMesh构建超大规模云原生事件网格-EventGrid

薛炜明

微众银行微众银行中间件研发工程师

薛炜明，微众银行中间件平台开发工程师，Apache EventMesh项目PMC，专注于云原生、事件消息、微服务领域，热爱技术开源。曾多次参与相关大会分享，如Apache Con Asia 2021\2022、COSCon'22等。

王海军

华为云计算有限公司华为云中间件团队技术专家

目前就职于华为云计算有限公司，华为云中间件团队技术专家，事件网格（EventGrid）服务技术负责人

Apache Royale Externs

Alina Kazi

The Apache Software Foundation Apache Royale Committer, The Apache Software Foundation

Contact alinakazi@apache.org alinakazi1990@gmail.com +92-333-7005671 www.linkedin.com/in/alee na- kazi-47428b100 (LinkedIn) Team Member Apache Royale Organization https://royale.apache.org/team/ Top Skills Apache Royale Flex Java C++ C# ASP.NET .Net Core MS SQL and Oracle GitHub Source tree Team Foundation Jira Confluence GWT Apache Royale www.royale.apache.org Committer / Apache Team Member July 2020 – Present Globally Providing training and consultancy services on Apache Royale Migration Projects as well as a new development in Royale. Working on multiple Flex to Royale migration projects with my team. Being the Committer at Apache Software Foundation (Montreal, Canada) from April 2018 – Present. I contribute to releases of Apache Royale. You can see more about Apache Royale at https://royale.apache.org/team Also providing paid commercial Support: https://royale.apache.org/royale-commercial-support/ Achievements: 1. First Asian / Pakistani / Female Apache Royale SDK Committer 2. Presented at Apache Conference (APACHECON2020). Can be viewed at Apache official YouTube Channel: https://www.youtube.com/watch?v=O5WJ6nRPX40 Islamabad, Pakistan Feb,2023 - Present Designed and Developed and Physiotherapy Clinic Software System includes biometric attendance, salary process, patient session and management of all other clinic activities. Tools Visual Studio Code Moonshine Eclipse Flash Builder Visual Studio 2010,2012,2019 SQL Developer Fiddler Bit Bucket GitHub Source tree Team Foundation Jira Self Employed Freelance Apache Royale Developer March, 2021 – Present MX-Spark and Jewel UI/UX Developer in Apache Royale. Additional features added to organization’s existing application & migrated existing Apps. Using external JavaScript libraries. Tree, Mobiscroll, Calendar and other libraries under one umbrella Apache Royale Roles: Task document preparation, Project configuration in Visual Studio Code, Maven configuration, Ant configuration, Royale SDK compilation, SDK deployment/distribution, Environment setup. I have several clients Micronix, Veriskope and Altron and many other companies worldwide. www.codeoscopic.com Madrid, Spain Software Developer / Apache Royale Developer July 2020 – March,2021 Jewel UI/UX Developer in Apache Royale. Roles: Task document preparation, Project configuration in Visual Studio Code, Maven configuration, Ant configuration, Royale SDK compilation, SDK deployment/distribution, Environment setup and software development of application. www.cavalier-group.com Cavalier Group of Companies Backend Developer (Team Lead) April 2020 - June 2020 Islamabad, Pakistan Dot Net Development, API’s Development, and DB procedure call. SRS Document Preparation DBiz Solutions July- 2017 – March-2020 Islamabad, Pakistan • Senior Consultant • Assistant Manager JAVA Development in eclipse, development of complex processes (feasibility), documents, setups and reports. Lead the Porting Project of moving Apache Flex Application (having more than 1000 .mxml and .as files) to Apache Royale Using ANT Changes in Flex code for migrating it to Royale and Apache Royale SDK (creation of Emulation Components in SDK to compile and run migrated Application) SIDAT HYDER Morshed Associates Team Lead (JAVA & FLEX DEVELOPMENT) July 2016 - June 2017 Islamabad, Pakistan Senior Software Developer (JAVA & FLEX DEVELOPMENT) July 2015 - June 2016 Islamabad, Pakistan Flex and Java Developer Pakistan Air Force Air HQ, Islamabad Software Developer (Project Vision – JF-17 Thunder) July 2013 - July 2014 (1 year 1 month) E-9, Islamabad Developed a Platform independent, multi-threaded socket (TCP/UDP) Library using C/C++. Worked on Boost Library, Real-time Applications Testing Developed Real-time Applications Education Quaid-e-Azam University, Islamabad Master's degree, Information Technology · (2011 - 2013) Percentage: 79.6%

统一 AD、Linux 和 Apache Openmeetings 的基于角色的访问控制机制

刘文涛

阿帕奇软件基金会高级软件工程师

主讲人刘文涛先生在中国拥有超过10年的IT和教学经验。 8年前，当他担任梅溪湖城市可持续发展项目的助理总经理时，他创立了一家涉足云计算和移动应用开发领域的初创公司。他热衷于开源项目，例如 Apache Openmeetings、Apache Directory Server。他是微软初创公司资助者中心成员。平时，他在各个方面为开源社区做出了贡献，例如作为会议发言人分享他学习和采用开源技术的经验、翻译、口译等。

What's inside a Smartphone? Exploring the internals with Apache NuttX Real-Time Operating System

Lup Yuen Lee

Apache NuttX PMC IoT Techie and Educator, Apache NuttX PMC

Please see https://lupyuen.github.io/

茶歇

如何基于云原生技术帮助 Apache Kafka 实现弹性能力

韩旭

安托盟丘（AutoMQ）安托盟丘（AutoMQ）存储技术负责人

目前就职于安托盟丘（AutoMQ），主导 Serverless Kafka 的设计和开发。曾经是蚂蚁金服在线消息和离线数据传输链路负责人，在大规模消息集群的存储和流量高可用方面有丰富的经验。

Apache HugeGraph 图数据库在货拉拉的应用与调优

杨嘉奇

货拉拉货拉拉大数据工程师

杨嘉奇货拉拉/业务风控部/大数据专家/Apache Hugegraph Commiter。现主要负责图数据库相关工作，为部门提供高效，稳定，易用，准确的图数据服务。从事大数据相关工作8年。

Apache Dubbo 静态化 GraalVM Native Image 解决方案与实践

华钟明

杭州有赞科技有限公司杭州有赞科技有限公司中间件技术专家

中间件技术专家, Apache Dubbo PMC，见证了Dubbo从Apache孵化到毕业的过程。Dapr Member、Apache Tomcat/Maven Contributor，GSoC/OSPP/GLCC Mentor，热爱分享，热衷开源。

Apache Dubbo 云原生可观测性的探索与实践

宋小生

平安壹钱包平安壹钱包中间件资深工程师

Apache Dubbo Committer 负责中间件相关研发工作阿里云藏经阁《Apache Dubbo3 源码深入解读》作者

基于 Triple 协议实现Web、移动端、后端服务全面打通

陈有为

apache 陌陌研发工程师

Apache Dubbo PMC, 陌陌研发工程师

精进云原生 - Dubbo Kubernetes 最佳实践

江河清

阿里巴巴（中国）有限公司阿里云研发工程师

Apache Dubbo PMC，阿里云研发工程师，专注于服务框架，Apache Dubbo 核心维护团队成员。

茶歇

政采云基于dubbo的混合云跨网方案实践

王晓彬

政采云政采云资深开发工程师

政采云资深开发工程师，前网易高级开发，负责公司基础服务及创新技术研究。开源领域，分布式事务、分布式数据库、rpc、分布式运行时等社区均有贡献。

OpenSergo & Dubbo 微服务治理最佳实践

何家欢（屿山）

阿里云阿里云 MSE 研发工程师

Sentinel Maintainer, 阿里云 MSE 研发工程师，熟悉微服务治理与稳定性领域。

工商银行分布式建设及转型实践

丁兴中

中国工商银行软件研发中心云计算实验室中国工商银行软件研发中心云计算实验室分布式微服务框架架构师

工商银行分布式微服务框架架构师

Seata：微服务架构下的一站式分布式事务解决方案

季敏（清铭）

阿里云阿里云分布式事务产品负责人，Seata 开源项目创始人，微服务开源治理负责人

季敏(清铭)，阿里云分布式事务产品负责人，Seata 开源项目创始人，微服务开源治理负责人。有超过10年研发架构经验，推动了集团内中间件的大规模落地、微服务PaaS商业化和开源。目前主要关注云计算中间件领域，致力于打造下一代微服务架构。

数据安全：Apache Ozone 如何保证数据的存储和访问安全

陈怡

Cloudera Cloudera 首席存储工程师

陈怡，Apache Ozone 开源社区PMC 主席，长期专注于分布式存储领域。目前就职于Cloudera，担任首席存储工程师。曾就职于腾讯和Intel，担任大数据存储技术负责人。

字节跳动 MapReduce -> Spark 平滑迁移实践

魏中佳

字节跳动字节跳动基础架构工程师

2018 年加入字节跳动，现任字节跳动基础架构大数据开发工程师，专注大数据分布式计算领域，主要负责 Spark 内核开发、字节自研 Shuffle Service 开发。

Apache Kudu 在神策的应用和实践

汪细勖

神策网络科技（北京）有限公司神策网络科技（北京）有限公司分布式软件开发工程师

2017年毕业于北京航空航天大学，长年致力于互联网大数据的基础架构建设，主要从事分布式存储计算系统的开发及应用工作。热爱开源，积极参与开源社区的工作，先后参与Apache Doris, Apache Pegasus和Apache Kudu的开源项目，并且是Apache Doris committer。目前供职于神策网络科技有限公司基础研发部存储组。

小米 HDFS 数据治理实践与演进

王成伟

小米小米高级软件研发工程师

小米高级软件开发工程师，HDFS Contributor，多年的 HDFS 优化与维护经验。在小米主要负责 HDFS 相关的优化与维护工作。

茶歇

Apache Celeborn(Incubating): 让 Spark 和 Flink 更快更稳更弹性

周克勇

阿里云阿里云 EMR Spark引擎负责人

阿里云EMR Spark引擎负责人，Apache Celeborn (Incubating)的初始作者，在Remote Shuffle Service，向量化引擎，优化器等方面有一定经验。

基于Apache Linkis快速高效构建数据应用工具

王和平

微众银行微众银行高级工程师

Apache Linkis PMC 现在就职于微众银行，主要负责Linkis、Spark、Trino、DataSphereStudio等项目的开发和运营工作

How increasing partition size in Apache Cassandra can reduce disk usage by over 30%

John Del Castillo

NetApp Technology Evangelist, NetApp

I’m a software engineer with over 15 years of experience developing enterprise software solutions across a variety of languages and technologies. For 6 years I worked at Instaclustr as a Lead Engineer and for the last year I've taken the mantle of Technology Evangelist, specializing in open-source technology. In this role I explore the landscape of open-source technologies, explore new solutions, document interesting use cases and create written and video content to help educate and encourage people to use open source for their business.

Apache ShenYu社区建设的道与术

肖宇

京东科技京东科技架构师

ASF Memeber Apache ShenYu VP/ PMC Chair

开发者内容体验解密之旅，追风踏浪

Yu Liu

StreamNative Technical Writer, StreamNative

Yu Liu 是 Apache Pulsar PMC 成员和 Apache Trafodion committer。作为 StreamNative 的 Technical Writer，她一直致力于提升用户的内容体验，积极布道开源社区，参与开源和技术传播行业各类活动，曾在 ApacheCon Asia、COSCon、OpenInfra Days China、tcworld China、中国技术传播论坛等大会上发表关于信息架构、内容运营与策略、内容代码化等主题的演讲，并踊跃投身于 Our Code is Open 和 OSPP 等开源活动。

面对开源，你还只有用“爱”发电吗？

Rick

开源面对面开源面对面布道师

程序员，业余开源布道者，开源面对面主持人。

开源之夏——致力点亮新生代开发者的星星之火

李梦

中科院软件所中科院软件所开源之夏品牌负责人

李梦，开源之夏品牌负责人，一枚开源路上学习和进步者，负责开源之夏活动运营第四年。

茶歇

开源之夏最佳实践

王嘉树

北京邮电大学北京邮电大学学生

开源之夏2022结项学生，开源之夏首位校园大使，开源之夏北邮校园行组织者之一

Apache Doris 毕业一周年：全球最活跃开源数据库项目的共建之道

鲁志敬

Apache Doris 社区 Apache Doris 社区 Committer

鲁志敬，Apache Doris Committer，原百度高级产品经理，曾担任百度 Doris 团队产品和运营负责人，负责内核产品规划、项目商业化 toB 以及开源社区运营等工作。过去几年一直致力于推广 Apache Doris，社区首个非代码 Committer。

《开源 PolarDB-打造世界级云原生数据库开源社区》

周正中

阿里云计算有限公司阿里云数据库高级产品专家，阿里云PolarDB开源社区运营负责人

花名德歌，阿里云数据库高级产品专家，阿里云数据库PolarDB开源运营负责人。中国开源软件推进联盟PostgreSQL分会特聘资深领域专家；PG中国社区发起人及PostgreSQL象牙塔发起人之一；DBA社群联合发起人之一；获得10余项数据库相关专利。

云原生数据湖如何提速两倍以上

史少锋

Kyligence Inc 首席架构师，Kyligence Inc

史少锋，Kyligence 首席架构师；Apache Kylin committer & PMC。曾就职于 eBay，IBM，作为核心成员，参与 Apache Kylin 项目的研发与开源全过程。对大数据和云计算行业发展趋势、产业生态、技术产品化有着长期深入的研究和丰富的实战经验。

Apache Paimon 流式数据湖：CDC 入湖与流读

李劲松

阿里巴巴阿里巴巴高级技术专家，阿里云开源大数据表存储团队负责人

阿里云开源大数据表存储团队负责人，负责 Apache Paimon 的研发和产品，Founder of Apache Paimon，PMC member of Apache Flink，Committer of Apache Iceberg&Beam。先后从事分布式流计算、分布式批计算、湖存储，目前专注于流式湖仓一体的技术。

下一代超高性能大数据集成工具 - Apache SeaTunnel 在数据湖场景的应用

代立冬

白鲸开源白鲸开源联合创始人

白鲸开源联合创始人、 Apache SeaTunnel PMC & Apache DolphinScheduler PMC、Apache 孵化器导师

基于 Apache Iceberg, Apache Arrow 和 Apache Parquet 的创新 lakehouse 设计

吴刚

云器科技云器科技软件工程师

他目前是 Apache ORC 的 PMC，也是 Apache Arrow 和 Apache Parquet 的 committer。在此之前，他是阿里巴巴的高级技术专家，负责MaxCompute的存储系统，也曾在Uber负责Apache Spark平台。

付旭炜

云器科技云器科技软件工程师

云器科技软件工程师, 主要负责云器 lakehouse 存储系统的研发。

茶歇

工业互联网背景下的应用与时序数据管理解决方案

许浩

上海道客网络科技有限公司上海道客网络科技有限公司技术顾问

近6年云原生技术领域从业经验，对金融、制造等多个行业数字化转型有深刻理解，致力于通过云原生相关技术和理念，帮助各个产业加速数字化进程。

基于 IoTDB Pipe 玩转工业物联网端边云数据同步

苏宇荣

天谋科技天谋科技内核研发工程师

面向IoT的消息队列核心设计

潘东元

阿里云阿里云消息研发工程师

潘东元，阿里云消息研发工程师。本硕毕业于东南大学，毕业后加入阿里云消息团队，参与了阿里云消息队列RabbitMQ、MQTT、Kafka的研发和商业化工作。目前主要聚焦于MQTT物联网消息领域。

释放物联网潜力：EMQX 与 Apache IoTDB 的结合

周子博

杭州映云科技有限公司杭州映云科技有限公司 EMQ 社区布道师

EMQ 社区布道师，有着 7 年物联网解决方案研发设计经验，目前致力于在社区中分享传播 EMQ 在物联网领域的经验积累和最佳实践。

茶歇

智能厂务耗量分析

李博

中芯国际中芯国际助理专家

从事Web后端开发与大数据开发工作。

RocketMQ-MQTT 在小米 IoT 场景的落地与实践

房成进

小米小米消息中间件研发工程师

2020 年加入小米，消息中间件研发工程师，聚焦小米自研消息队列和 RocketMQ-MQTT 消息网关的架构设计与研发工作

阿里云基于 Flink CDC 的实时数据集成实践

阮航

阿里云阿里云高级研发工程师

阿里云高级研发工程师, Flink CDC Maintainer & Apache Flink Contributor

自如基于 Apache StreamPark 的大规模 On Kubernetes 实时计算生产实践深度解析

陈卓宇

自如自如大数据平台研发工程师

apache streampark ppmc

Flink K8S Operator AutoScaling

陈政羽

真有趣游戏 Apache Flink中文社区志愿者（大数据开发工程师）

Apache Flink/Streampark Contributor，Apache Flink 中文社区志愿者，长期从事游戏行业数据开发，目前在游戏公司负责Flink公有多云数据解决方案，数据同步以及大数据作业管道平台从0到1的设计、建设工作

RSQLDB 基于消息队列的流数据库

倪泽

阿里云阿里云消息团队研发计算专家

阿里云，倪泽：Apache RocketMQ Committer，RocketMQ Streams maintainer，RSQLDB maintainer，云原生-消息团队研发计算专家

茶歇

State of Scala API in Apache Flink

alexey

Ververica Solution Architect of Ververica

I am a Solution Architect working for last the last 6 years on data solutions and products. At Ververica I am focusing on supporting clients to solve their challenges in adopting data stream processing with Apache Flink. Among my previous project and companies I developed different systems such as Data Lakes, Data Integration and Data Virtualization Layers. I have also spent many years on developing data services for investment banks including currency trading software. In my spare time, I also contribute to various open-source projects or start my own for fun. My hobbies are astronomy, playing music and gym.

小米 Flink 实时计算平台的建设实践

陈子豪

小米小米软件研发工程师

小米软件研发工程师，主要负责小米实时计算平台及 Flink 框架内核开发。

Developing Fast Applications With Open Source Software - Without The Fury

Paul Brebner

NetApp Open Source Technology Evangelist, NetApp

Paul is the Open Source Technology Evangelist at Instaclustr (now part of Spot by NetApp). For the past six years, he has been learning new scalable Big Data technologies, building realistic demonstration applications, and blogging and talking about a growing list of open-source technologies including Apache Cassandra, Apache Spark, Apache Kafka, Apache ZooKeeper, Redis, OpenSearch, PostgreSQL, Cadence, and many more. Since learning to program on a VAX 11/780, Paul has extensive R&D, teaching, and consulting experience in distributed systems, technology innovation, software architecture and engineering, performance engineering, grid and cloud computing, and data analytics and machine learning. Paul has also worked at Waikato University (New Zealand), University of New South Wales (UNSW, Sydney), Commonwealth Scientific and Industrial Research Organisation (CSIRO, Australia), University College London (UCL, UK), National ICT Australia (NICTA), Australian National University (ANU), and several tech start-ups (including as a Founder/CTO). Paul has an MSc (1st Class Hons, Waikato) in Machine Learning and a BSc (Computer Science and Philosophy, Waikato).

基于 Kubernetes 部署 Apache JMeter 进行大规模负载测试

殷翀元

EMQ EMQ 项目经理

殷翀元，EMQ XMeter 项目经理，负责性能测试平台 XMeter 产品，以及开源项目 mqtt-jmeter 的维护。曾于 ApacheCon Asia 2022 做题为《Apache JMeter 在 IoT 测试中的应用》的演讲。 EMQ 是一家开源物联网数据基础设施软件供应商，交付全球领先的开源云原生 MQTT 消息服务器和流处理数据库。EMQ 发起与运营的开源项目包括：开源物联网消息中间件项目 EMQ X，云原生分布式流处理数据库 HStreamDB，轻量级物联网边缘分析与流式处理开源软件 eKuiper，边缘工业协议网关软件 Neuron，跨平台 MQTT 客户端工具 MQTT X，MQTT JMeter 插件 mqtt-jmeter，等。

Apache Arrow Optimization on Arm

Yibo Cai

Arm Principal Software Engineer, Arm

Yibo has been working in the IT industry for more than 20 years. He is principal software engineer from Arm, focusing on improving software ecosystem on Arm server. Including big data, storage, and database. Yibo has rich experience in software optimization, from libraries to complex workloads, from low level architecture specific optimization to high level algorithm improvement. Yibo is an active open source contributor. He is Apache Arrow PMC and SPDK maintainer for SPDK-CSI project.

Java profiling Internals and Its Applications

杨龙

alibaba Tool Development Engineer, Alibaba

Member of Alibaba Dragowell team. OpenJDK 8u JFR contributor. Member of Alibaba Cloud Continuous Performance Profiling Team, Java Fliht Recorder and async-profiler expert. He has many years of experience in the Java performance field. In the past year, participated in the development of Alibaba Cloud continuous profiling products.

茶歇

OPPO大数据Spark、Flink引擎诊断系统的设计与实践

庄泽波

OPPO OPPO 高级后端研发工程师

OPPO高级研发工程师，Compass开源项目Maintainer，专注云原生、可观测、大数据等领域。

优化 HDFS 性能:在 KUNPENG920 上实现硬件加速器和 ARM64 多核能力的均衡利用

Guodong Xu

Linaro Ltd. Senior Tech Lead, Linaro Ltd.

2012-2023 Linaro Ltd. - Arm SVE/SVE2 optimization, senior tech lead - Kernel development tech lead 2008-2011 Flextronics Ltd, Software development lead 2002-2007 Motorola Mobile, Linux kernel device driver development

AI大模型来袭：开源治理和开发者关系的挑战和机遇

林旅强 Richard Lin

零一万物开源负责人

林旅强 (Richard Lin)，开源社联合创始人，前华为云AI开发者生态总监、华为开源专家，信通院云计算开源产业联盟 (OSCAR) 的专家，对开发者关系及开源生态有深厚的理解和丰富的实践经验。其在行业贡献涵盖了多个领域，包括将中国第一本专注于开发者关系的书籍《开发者关系：方法与实践》引进并亲自翻译，并于国内外诸多知名开发者大会担任演讲嘉宾，如O’Reilly OSCON、LinuxCon、CSDN开源技术峰会、COSCon中国开源年会及开放原子全球开源峰会等。林旅强在过去的15年里，在中国大陆及港台地区深度参与并推动了企业及社区在开发者关系及开源生态的发展，擅长运用以技术生态为基础的商业策略和企业战略，精通开发者生态运营、技术品牌影响力及开源治理。他曾从零到一搭建并完成开发者生态架构系统的落地部门运作，引领开发者依托平台生态进行应用构建。目前，他担任独立顾问及企业内训讲师，引导和帮助企业和开发者在开源生态中找到自身的发展路径。

Talking with management about open source

Rich Bowen

AWS Apache 软件基金会董事，AWS 开源战略师

借助开源生态的力量，探索自我成长与职业发展的价值

潘娟 Trista

SphereEx SphereEx 联合创始人 & CTO

SphereEx 联合创始人 & CTO，Apache Member & Incubator Mentor，Apache ShardingSphere PMC，AWS Data Hero，中国木兰开源社区导师，腾讯云 TVP。  曾负责京东数科数据库智能平台的设计与研发，现专注于分布式数据库 & 中间件生态及开源领域。被评为《2020 中国开源先锋人物》，2021 OSCAR 尖峰开源人物。CSDN 2021 年度 IT 领军人物，2022 年在 ICDE 发表论文 “Apache ShardingSphere：A Holistic and Pluggable Platform for Data Sharding”

闪电演讲

Navigating the ASF Incubator Process

Justin Mclean

Datastrato Apache 软件基金会董事

Apache HugeGraph在图数据库开源领域的思考与实践

金子威

百度百度资深研发工程师

Apache HugeGraph PPMC, 开源社区当前发版负责人, 专注于分布式存储(文件系统/KV数据库) & 图存储/图计算方向, 主导社区性能优化与技术演进, 热爱开源 & 基础架构相关技术, 欢迎大家多多交流 (GitHub: imbajin)

StreamPark ─ 从个人项目到 Apache 孵化器

王华杰

ASF Apache StreamPark 创始人

Apache StreamPark 创始人, 开源组织 Datavane 创始人，SelectDB 资深架构师

Apache OpenDAL(Incubating)：笨办法践行 Apache 之道

尚卓燃（PsiACE）

Databend Labs Databend 研发工程师实习生

Apache OpenDAL(Incubating) PPMC Member

茶歇

Tomcat 的技术内幕和在喜马拉雅的实践

彭荣新

喜马拉雅网络科技有限公司喜马拉雅资深架构师

任职于喜马拉雅, 资深架构师, 10年以上的工作经验，将近8年的基础架构经验，一直从事中间件相关的架构设计和研发工作，2017年加入喜马拉雅，一直在聚集在稳定性保障的领域，主导设计和研发了API网关，负责同城多活项目落地，数据访问层中间件等稳定性保障的相关配套工具的设计和落地，在中间件落地有丰富的经验，爱好学习，乐于分享和交流。

GraalVM 静态编译在 Web 容器应用中的使用实践

饶子昊

Alibaba cloud 阿里云智能研发工程师

阿里巴巴云计算, 阿里云智能研发工程师，Spring Cloud Alibaba 社区负责人，目前就职于阿里云原生应用平台团队，参与了MSE、EDAS、CSB、ARMS等多个云产品的开发，长期专注于微服务、分布式系统、云原生可观测性相关技术。

如何参与 Tomcat 社区

李晗

NetEase Youdao Information Technology (Beijing) Co., Ltd. 网易有道高级工程师

网易有道信息技术(北京)有限公司高级工程师，Apache Tomcat 提交者和 PMC Member

使用APM工具链快速定位Tomcat问题最佳实践

陈承

阿里云阿里云高级开发工程师

专注可观测领域多年。是阿里巴巴内部大规模分布式链路追踪系统EagleEye以及国内领先APM产品ARMS(阿里云应用实时监控服务)的核心开发人员，具有丰富的APM产品开发、使用经验以及问题排查经验。

茶歇

Dubbo-Go-Pixiu的前世今生

吕梦超

上海联蔚数字科技集团股份有限公司上海联蔚数字科技集团股份有限公司高级 JAVA 工程师

Apache Dubbo Committer，Java工程师，现居南京，毕业于洛阳理工学院。现致力于在Dubbo Go Pixiu社区构建Dubbo生态中通过Http直接调用dubbo集群以及dubbo mesh控制面的能力。

Securing Tomcat

Dennis Jacob

Payments Technology Organization Senior Consultant at a Payments Technology Organization

I am currently working as a Senior Consultant in Middleware Engineering with a leading payment technology organization, with overall 19 years of IT experience. Apart from Middleware Engineering, my interests include application security and cloud native technologies. I am passionate about experimenting on the latest advancements in technology, and speaking to communities and local groups.

Secure by default web applications with Apache Sling

Robert Munteanu

Adobe Senior Computer Scientist of Adobe

Working as a Senior Computer Scientist in the AEM Core Cloud Foundation team at Adobe, Robert Munteanu is a software developer with a passion for open source. He is a member of the Apache Software Foundation and frequent contributor to many open source projects, notably Apache Sling and Apache Jackrabbit. Robert is a frequent conference speaker, having spoken at Devoxx, Belgium ApacheCon and EclipseCon, amongst others.

新一代数据集成平台Apache SeaTunnel

高俊

白鲸开源白鲸开源架构师

10年大数据方向研发经验，开源爱好者 Apache DolphinScheduler PMC Apache SeaTunnel PMC

DataOps 在联通数科的实施构建数据研发运营一体化能力

王兴杰

联通数字科技有限公司联通数字科技有限公司数据平台架构师

主要负责数据中台工具的集成，推动dataops的实施与应用。目前是Apache dolphinscheduler的committer

如何使用 Apache SeaTunnel 快速接入新数据源

田超

Apache SeaTunnel PMC Apache SeaTunnel PMC 大数据开发工程师

Apache SeaTunnel PMC，开源爱好者，目前专注于数据集成领域

Apache DolphinScheduler与云对象存储的对接与整合

程鑫

阿里云阿里云研发工程师

任职于阿里云EMR数据开发团队，基础平台研发工程师，硕士毕业于清华大学，Apache DolphinScheduler Committer。

茶歇

Apache DolphinScheduler 指标体系分享

高楚枫

阿里云阿里云 EMR 数据开发团队工程师

Apache DolphinScheduler Contributor，阿里云EMR数据开发团队工程师

数据集成技术在小米的应用与实践

郑钧元

小米小米高级软件研发工程师

参与小米数据平台建设，有多年大数据开发经验，致力于分布式计算及数据集成技术领域建设。

Apache Impala 在神策数据仓库中的最新实践

张倩琼

神策网络科技(北京)有限公司大数据架构师

我是一位专注于分布式计算和存储系统研发的大数据架构师。我对分布式计算、数据存储和数据处理领域有深入的理解，并能够运用最佳实践来构建高效的大数据解决方案。在Hadoop、Impala、Flink、Kudu等Apache系统的优化方面，我积累了丰富的经验。我曾在腾讯基础架构部、百度大数据部工作，目前就职于Sensorsdata的基础研发部。

Federated Cross-platform SQL with Apache Wayang

Kaustubh Beedkar

Indian Institute of Technology Delhi Assistant Professor, Indian Institute of Technology Delhi

--Experience-- [April 2023 -- present] Assistant Professor, Indian Institute of Technology, Delhi [May 2023 -- present] Committer and PPMC Apache Wayang, The Apache Software Foundation [2022 -- present] Co-Founder, Databloom AI [June 2021 -- March 2023] Junior Fellow, The Berlin Institute for the Foundations of Learning and Data (BIFOLD) [June 2017 -- March 2023] Senior Researcher, Technical University of Berlin, Germany [Oct 2014 -- Dec 2016] Researcher, University of Mannheim, Germany [April 2012 -- Sept 2014] Researcher, Max-Planck-Institute for Informatics, Germany [Jul 2011 -- April 2012] Visiting Scholar, Max-Planck-Institute for Informatics, Germany --Education-- [2017] Ph.D. in Computer Science, University of Mannheim [2008] MS in Computer Science, Georgia Institute of Technology, USA [2007] B.Tech. in Information Technology, Amrita University, India

Apache Druid 开源十年后的 26.x 大版本

金嘉怡

Shopee Shopee 技术专家，Apache Druid Committer & PMC Member

Shopee 大数据技术专家，Apache 顶级项目 Druid 的 Committer 和 PMC，阿里云开源组织 Member，ApacheCon 技术峰会的讲师，极客时间的签约讲师，Imply 高级技术顾问，《宇宙湾》博客的博主，《Java Coding Problems》一书的译者（预计今年上市）

Kylin 5：现代化的大数据分析平台

俞霄翔

Kyligence Inc Kylin Committer & PMC , Kyligence 大数据研发工程师

Kylin Committer & PMC , Kyligence 大数据研发工程师。

茶歇

小米基于Apache Doris的OLAP实践

魏祚

小米小米数据库内核研发工程师，Apache Doris Committer & PMC Member

小米数据库内核研发工程师、Apache Doris PMC，在小米集团负责OLAP系统的研发、优化和运维工作。

中国移动基于 Apache HAWQ 的云原生分析型数据库

王小玉

中移动信息技术公司中移动信息技术公司数据库内核架构师，分析型数据库产品负责人

王小玉，中移动信息技术公司数据库内核架构师，分析型数据库产品负责人。负责公司分析型数据库产品的研发和应用。有十数年的数据库内核研发经验，主要研发领域包括查询优化器、高性能计算、SQL on Hadoop等。

SkyWalking的Golang自动探针实践

刘晗

Tetrate Engineer of Tetrate

Apache SkyWalking 项目的代码提交者和 PMC 成员。开源爱好者。目前就职于Tetrate，专注于做可观测性开发。

Resilient and secure applications with Apache APISIX and KEDA

Jintao Zhang

API7.ai Cloud Native expert, API7.ai

Apache APISIX PMC, Kubernetes Ingress-NGINX maintainer, Microsoft MVP.

BanyanDB:一个高扩展性的分布式追踪数据库

高洪涛

Tetrate Tetrate 创始工程师

美国servicemesh服务商tetrate创始工程师。原华为软件开发云技术专家，对云原生产品有丰富的设计，研发与实施经验。对分布式数据库，容器调度，微服务，ServicMesh等技术有深入的了解。目前为Apache ShardingSphere和Apache SkyWalking核心贡献者，参与该开源项目在软件开发云的商业化进程。前当当网系统架构师，开源达人，曾参与Elastic-Job等知名开源项目。对开源项目的管理，推广和社区运营有丰富的经验。积极参与技术分享，曾在多个技术大会中做过分享，包括DTCC，ArchSummit, Top100，Oracle嘉年华等。在多个媒体发表过文章，如InfoQ，OSChina等

点击流数据分析的云原生解决方案和实践

郑予彬

AWS 亚马逊云科技资深开发者布道师

郑予彬，软件工程硕士，20年ICT行业和数字化转型实践积累。现任亚马逊云科技资深开发者布道师，专注于AWS云原生、云安全技术领域。18年的架构师经验，专注为金融、教育、制造以及世界500强企业客户提供数据中心建设，软件定义数据中心等解决方案的咨询及技术落地。

刘勇

亚马逊云科技亚马逊云科技解决方案架构师

刘勇，亚马逊云科技解决方案架构师，毕业于北京邮电大学，计算机科学与技术专业硕士，15年IT从业经验，擅长数据密集型现代应用的架构设计与开发，对大数据，机器学习等技术有深入研究和实践经验。

字节跳动云原生 YARN 实践

邵凯阳

字节跳动火山引擎云原生计算研发工程师

字节跳动基础架构工程师，在字节跳动基础架构负责离线调度相关工作，具有多年工程架构经验。

ShardingSphere on Cloud：A developer's perspective

苗立尧

SphereEx Cloud Engineer, SphereEx

曾就职于日本 Netstars、蚂蚁金服和易宝支付。现任 SphereEx 云技术负责人，专注于为 ShardingSphere 构建云上解决方案。

如何构建大规模 API 中心

程小兰

深圳支流科技有限公司深圳支流科技有限公司后端开发工程师

程小兰，女，硕士，毕业于杭州电子科技大学计算机学院；服务端开发工程师，曾就职于快手科技，负责电商相关to B业务开发，现就职于深圳支流科技，负责网关平台开发；生活高度自律，工作认真负责。

Apache APISIX 助力企业 API 全生命周期管理

Yuansheng Wang

API7.ai(支流科技) API7.ai(支流科技) CTO

API7.ai co-founder & CTO, Apache APISIX PMC Member

Apache ShenYu 微服务网关百亿流量的实践落地

高向阳

北京转转精神科技有限公司北京转转精神科技有限公司资深研发工程师&Apache ShenYu Contributor

前闪送架构师，8年以上从业经验，中间件团队负责人，包括Dubbo服务治理，分布式消息平台，微服务网关平台，稳定性治理等。热爱架构，热爱分享，多次在行业大会进行主题分享。

An Introduction to the Kubernetes Gateway API with Apache APISIX

Navendu Pottekkat

Apache APISIX Apache APISIX Committer

Navendu Pottekkat is a maintainer of Apache APISIX and related open source projects. He helps new contributors to open source by mentoring through the Google Summer of Code and Linux Foundation Mentorship Program. Navendu writes and talks about the cloud native ecosystem and his experience in contributing to, building, scaling, and maintaining open source projects.

茶歇

事件驱动引擎RocketMQ EventBridge的设计与实现

陈永明

阿里云阿里云研发工程师

阿里云研发工程师，致力于消息中间件如RocketMQ、EventBridge的探索与开发，主持和维护了阿里云EventBridge大量生态集成相关工作。

RocketMQ 百万队列能力支持 -- RocksDB kv 存储

赵福建

阿里巴巴阿里巴巴高级开发工程师

2020年6月毕业于东南大学 2020年7月入职阿里巴巴

RocketMQ在小红书的特色实践

李亚斌

小红书小红书在线消息队列负责人

小红书在线消息队列负责人

茶歇

小米 RocketMQ 的降本增效和容灾实践

邓志文

小米小米软件研发工程师

Apache RocketMQ Committer，小米研发工程师，主要负责消息队列相关工作。

云原生消息流系统 Apache RocketMQ 在腾讯云的大规模生产实践

李伟

腾讯科技（成都）有限公司腾讯科技（成都）有限公司资深后端开发工程师

社区里大家叫小伟(tiger)，Apache RocketMQ北京社区联合发起人,RocketMQ社区Python项目负责人，RocketMQ项目Commiter，RocketMQ Exporter项目主要代码贡献者，Apache Doris Contributor，著有《RocketMQ分布式消息中间件：核心原理与最佳实践》。目前就职腾讯云消息队列团队，主要负责云上消息队列相关工作。分享一句话：技术无穷尽，真诚以待人，虚心以待学。

移动云MQTT-RocketMQ消息队列的海量数据流转实践