Tailored for online stores to drive new business growth
Personalized search with a concierge-like experience
Precise customer acquisition with reduced costs
Transforming content production from zero to one
Self-developed high-performance LLM inference framework, deeply optimized for the DeepSeek series of large language models. It integrates key techniques including prefill/decode (PD) disaggregation, EPLB (Expert Parallelism Load Balancer), DeepEP (efficient expert-parallel communication), and DeepGEMM (fine-grained FP8 GEMM). In multi-GPU, multi-node deployments it delivers over a 50% increase in inference throughput while significantly reducing ITL (inter-token latency), more than halving overall latency. This provides solid support for deploying large models in real business scenarios.
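As a point of reference for the metrics cited above, the sketch below shows one common way to measure time to first token, inter-token latency (ITL), and token throughput against an OpenAI-compatible streaming endpoint. The endpoint URL and model name are illustrative assumptions, not part of the product description.

```python
# Minimal sketch: measuring TTFT, inter-token latency (ITL), and throughput
# against an OpenAI-compatible streaming endpoint (URL and model name are
# placeholders, not ByteArk's actual deployment).
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def measure(prompt: str, model: str = "deepseek-r1"):
    timestamps = []
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Record arrival time of every non-empty token delta.
        if chunk.choices and chunk.choices[0].delta.content:
            timestamps.append(time.perf_counter())

    ttft = timestamps[0] - start                      # time to first token
    itls = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg_itl = sum(itls) / len(itls) if itls else 0.0  # mean inter-token latency
    throughput = len(timestamps) / (timestamps[-1] - start)
    return ttft, avg_itl, throughput

if __name__ == "__main__":
    ttft, itl, tps = measure("Explain prefill/decode disaggregation in one paragraph.")
    print(f"TTFT={ttft*1000:.1f} ms  ITL={itl*1000:.1f} ms  throughput={tps:.1f} tok/s")
```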
ByteArk is a technology company focused on AI infrastructure and application solutions, headquartered in Hangzhou Future Sci-Tech City. The company specializes in LLM inference framework optimization, industry-grade AI application solutions, and high-performance GPU computing services, and is committed to building an integrated AI capability platform that spans underlying compute and business applications.
Built around an engineering culture, the team is 70% technical experts from renowned global universities and Fortune 500 tech companies, combining strong R&D capabilities with an international outlook. ByteArk provides highly reliable and scalable AI computing resources to clients worldwide. Recognized as a National High-Tech Enterprise and a Zhejiang Province Specialized SME, ByteArk holds more than one hundred patents and software copyrights and is accelerating the build-out of a global AI infrastructure service network.
Create tenfold value, take modest returns, give back to society
"Entrepreneurship is like sailing - having a destination while discovering islands along the way." - CEO David. Successful serial entrepreneur and ByteArk founder. Began entrepreneurial journey in 2018 by establishing ByteArk. Previously served as IT engineer in semiconductor industry, specializing in smartphone projects.
An outstanding post-80s entrepreneur who has founded multiple successful businesses with products sold worldwide, averaging more than US$20M in annual revenue. Entered the blockchain field in 2014 as an early evangelist and participant, focusing on trading and capital operations, with extensive experience in crypto assets. Invested in ByteArk in 2018, managing over US$40M in operating cash flow and more than US$100M in crypto assets.
Focused on the efficiency and execution path of the inference execution stage itself, including prefill/decode decoupling, cache scheduling, and sampling optimization (a toy prefill/decode sketch follows the role details below).
Responsibilities:
1. Own system-level optimization of the LLM inference system's execution path, resource scheduling, and communication modules;
2. Design and implement a scheduling and execution architecture for large-scale multi-GPU deployment to raise system throughput;
3. Optimize communication links and data transfer to reduce cross-node latency and bandwidth bottlenecks;
4. Drive efficient use of mixed-precision strategies (e.g., FP16, BF16, INT8) within the inference framework;
5. Support and advance deep system-level performance evolution of open-source or in-house inference frameworks (e.g., vLLM, SGLang).
Requirements:
1. Bachelor's degree or above in computer science, artificial intelligence, software engineering, or a related field;
2. Familiarity with mainstream inference frameworks; optimization experience with vLLM, SGLang, TensorRT-LLM, or similar is a plus;
3. Familiarity with communication optimization, hands-on experience with NCCL, NVSHMEM, RDMA, or similar libraries, and an understanding of how to reduce communication overhead;
4. Understanding of resource management mechanisms, including task scheduling, concurrency control, NUMA architecture, and CPU/GPU affinity tuning;
5. Ability to analyze system-level performance bottlenecks, lead cross-module diagnosis and resolution of complex performance issues, and close the loop on end-to-end performance optimization.
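The prefill/decode decoupling mentioned in this role refers to splitting inference into a prompt-processing pass that builds the KV cache and a token-by-token generation loop that reuses it. A minimal, framework-agnostic sketch using Hugging Face Transformers (the model name is a placeholder chosen only for illustration, not the framework this role works on):

```python
# Toy sketch of the prefill/decode split: one forward pass over the prompt
# builds the KV cache (prefill), then tokens are generated one at a time
# reusing that cache (decode). Model name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

prompt = tok("ByteArk builds inference infrastructure", return_tensors="pt")

with torch.no_grad():
    # Prefill: process the whole prompt once and keep the KV cache.
    out = model(**prompt, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

    generated = [next_id]
    for _ in range(16):
        # Decode: each step feeds only the newest token plus the cached KV.
        out = model(input_ids=next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```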
Focused on the underlying infrastructure and system architecture of the inference framework itself, such as resource allocation, cross-node communication, GPU orchestration, and mixed-precision computation.
Responsibilities:
1. Own system-level optimization of the LLM inference system's execution path, resource scheduling, and communication modules;
2. Design and implement a scheduling and execution architecture for large-scale multi-GPU deployment to raise system throughput;
3. Optimize communication links and data transfer to reduce cross-node latency and bandwidth bottlenecks;
4. Drive efficient use of mixed-precision strategies (e.g., FP16, BF16, INT8) within the inference framework (see the sketch after this listing);
5. Support and advance deep system-level performance evolution of open-source or in-house inference frameworks (e.g., vLLM, SGLang).
Requirements:
1. Bachelor's degree or above in computer science, artificial intelligence, software engineering, or a related field;
2. Familiarity with mainstream inference frameworks; optimization experience with vLLM, SGLang, TensorRT-LLM, or similar is a plus;
3. Familiarity with communication optimization, hands-on experience with NCCL, NVSHMEM, RDMA, or similar libraries, and an understanding of how to reduce communication overhead;
4. Understanding of resource management mechanisms, including task scheduling, concurrency control, NUMA architecture, and CPU/GPU affinity tuning;
5. Ability to analyze system-level performance bottlenecks, lead cross-module diagnosis and resolution of complex performance issues, and close the loop on end-to-end performance optimization.
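The mixed-precision policies listed in item 4 can be illustrated with a minimal sketch that simply loads a model in BF16 for inference. The model name is a placeholder and this is not ByteArk's production configuration; it only shows the general pattern of selecting a lower-precision dtype at load time.

```python
# Minimal sketch of applying a mixed-precision policy (BF16 here) at inference
# time; model name is a placeholder, not a production configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
# Load weights in bfloat16 to cut memory and bandwidth pressure.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

inputs = tok("Mixed precision reduces memory and bandwidth pressure", return_tensors="pt")
with torch.inference_mode():
    ids = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(ids[0], skip_special_tokens=True))
```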