


How to break the bottleneck of Decoder performance? Nvidia experts reveal the secrets

Since "Attention Is All You Need" was published in 2017, the Transformer has become one of the most popular deep learning architectures in NLP. In inference deployment, however, its computational performance often falls short of the low-latency, high-throughput requirements of online services.

In version 1.0 of its open-source FasterTransformer library, Nvidia optimized and accelerated the Transformer Encoder used in BERT, reducing the latency of Transformer-based encoding.

With the Encoder performance problem addressed, Nvidia turned its attention to the equally important problem of Transformer Decoder inference.

As a result, Nvidia has released FasterTransformer 2.0, which provides a Transformer layer highly optimized for the Decoder. It also provides an optimized translation process for users who need to significantly reduce latency in translation scenarios.
