Velox Type System

QUERYEXECUTION · SIMD · VELOX · VECTORIZED · DATA · PAPER · 2022-06-07

What Is the Velox Type System

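The Velox type system supports a set of composable, SQL-compatible types: scalar types (BOOLEAN, BIGINT, etc.) and complex types (ARRAY, MAP, etc.). The developer documentation also lists how Velox scalar types correspond to C++ types, for example BOOLEAN -> bool and BIGINT -> int64_t.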
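To make "composable" concrete, here is a minimal sketch built on a hypothetical MiniType tree (MiniKind, MiniType, scalar, arrayOf and mapOf are illustrative names, not Velox APIs). A complex type simply holds pointers to its child types, so types can nest to any depth.

```cpp
// Hypothetical sketch of composable types; MiniKind/MiniType are not Velox APIs.
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class MiniKind { BOOLEAN, BIGINT, VARCHAR, ARRAY, MAP };

struct MiniType {
  MiniKind kind;
  // Child types: ARRAY has one (element), MAP has two (key, value).
  std::vector<std::shared_ptr<const MiniType>> children;

  std::string toString() const {
    switch (kind) {
      case MiniKind::BOOLEAN:
        return "BOOLEAN";
      case MiniKind::BIGINT:
        return "BIGINT";
      case MiniKind::VARCHAR:
        return "VARCHAR";
      case MiniKind::ARRAY:
        return "ARRAY<" + children[0]->toString() + ">";
      case MiniKind::MAP:
        return "MAP<" + children[0]->toString() + "," + children[1]->toString() + ">";
    }
    return "INVALID";
  }
};

using MiniTypePtr = std::shared_ptr<const MiniType>;

MiniTypePtr scalar(MiniKind kind) {
  return std::make_shared<MiniType>(MiniType{kind, {}});
}

MiniTypePtr arrayOf(MiniTypePtr element) {
  return std::make_shared<MiniType>(MiniType{MiniKind::ARRAY, {std::move(element)}});
}

MiniTypePtr mapOf(MiniTypePtr key, MiniTypePtr value) {
  return std::make_shared<MiniType>(
      MiniType{MiniKind::MAP, {std::move(key), std::move(value)}});
}
```

With this sketch, arrayOf(scalar(MiniKind::BIGINT))->toString() yields ARRAY<BIGINT>. Velox's own classes are considerably richer; this only illustrates the nesting idea.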

Velox Type Classes

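First, the key data structures. Velox classifies types with TypeKind. ScalarType is a class template that represents all of the scalar types, whereas complex types have to be defined one class at a time: ARRAY is ArrayType and MAP is MapType.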

enum class TypeKind : int8_t {
  BOOLEAN = 0,
  ...
  BIGINT = 4,
  ...
  ARRAY = 30,
  MAP = 31,
  ...
  OPAQUE = 35,
  INVALID = 36
};

template <TypeKind KIND>
class TypeBase : public Type {
 public:
  using NativeType = TypeTraits<KIND>;
  // Each override simply forwards to the compile-time traits of KIND.
  bool isPrimitiveType() const override { return TypeTraits<KIND>::isPrimitiveType; }
  bool isFixedWidth() const override { return TypeTraits<KIND>::isFixedWidth; }
  const char* kindName() const override { return TypeTraits<KIND>::name; }
};

// Scalar types share one class template; each complex type gets its own class.
template <TypeKind KIND>
class ScalarType : public TypeBase<KIND> { /* ... */ };
class ArrayType : public TypeBase<TypeKind::ARRAY> { /* ... */ };
class MapType : public TypeBase<TypeKind::MAP> { /* ... */ };
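The point of TypeBase is that a single template supplies every virtual override by reading the traits, so callers can work through a plain base pointer. Below is a minimal, self-contained re-creation of that pattern. MiniKind, MiniTraits, MiniType, MiniTypeBase, MiniScalarType and MiniArrayType are hypothetical stand-ins, and only three kinds are modeled.

```cpp
// Hypothetical re-creation of the TypeBase pattern; the Mini* names are not Velox APIs.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>

enum class MiniKind : std::int8_t { BOOLEAN = 0, BIGINT = 4, ARRAY = 30 };

// Compile-time facts, one specialization per kind.
template <MiniKind K>
struct MiniTraits {};

template <>
struct MiniTraits<MiniKind::BOOLEAN> {
  static constexpr bool isPrimitiveType = true;
  static constexpr bool isFixedWidth = true;
  static constexpr const char* name = "BOOLEAN";
};

template <>
struct MiniTraits<MiniKind::BIGINT> {
  static constexpr bool isPrimitiveType = true;
  static constexpr bool isFixedWidth = true;
  static constexpr const char* name = "BIGINT";
};

template <>
struct MiniTraits<MiniKind::ARRAY> {
  static constexpr bool isPrimitiveType = false;
  static constexpr bool isFixedWidth = false;
  static constexpr const char* name = "ARRAY";
};

// Runtime interface that callers use through a base pointer or reference.
class MiniType {
 public:
  virtual ~MiniType() = default;
  virtual bool isPrimitiveType() const = 0;
  virtual bool isFixedWidth() const = 0;
  virtual const char* kindName() const = 0;
};

// One template implements every override by forwarding to MiniTraits<K>.
template <MiniKind K>
class MiniTypeBase : public MiniType {
 public:
  bool isPrimitiveType() const override { return MiniTraits<K>::isPrimitiveType; }
  bool isFixedWidth() const override { return MiniTraits<K>::isFixedWidth; }
  const char* kindName() const override { return MiniTraits<K>::name; }
};

template <MiniKind K>
class MiniScalarType : public MiniTypeBase<K> {};

class MiniArrayType : public MiniTypeBase<MiniKind::ARRAY> {};
```

Adding a new kind in this sketch only requires a new MiniTraits specialization; none of the virtual methods has to be rewritten.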

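As the code above shows, ScalarType and ArrayType are both subclasses of TypeBase. TypeBase exposes the key information about a type: its native type, whether it is primitive, whether it is fixed-width, its name, and so on. All of this is "extracted" through template specializations of TypeTraits. As the code below shows, each type's information lives in its own specialization: the native type of BIGINT is int64_t, and its ImplType, i.e. its Velox type, is ScalarType<TypeKind::BIGINT>. Note that the native type of ARRAY is void.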

template <TypeKind KIND>
struct TypeTraits {};

template <>
struct TypeTraits<TypeKind::BIGINT> {
  using ImplType = ScalarType<TypeKind::BIGINT>;
  using NativeType = int64_t;
  using DeepCopiedType = NativeType;
  static constexpr uint32_t minSubTypes = 0;
  static constexpr uint32_t maxSubTypes = 0;
  static constexpr TypeKind typeKind = TypeKind::BIGINT;
  static constexpr bool isPrimitiveType = true;
  static constexpr bool isFixedWidth = true;
  static constexpr const char* name = "BIGINT";
};

template <>
struct TypeTraits<TypeKind::ARRAY> {
  using ImplType = ArrayType;
  using NativeType = void;
  using DeepCopiedType = void;
  static constexpr uint32_t minSubTypes = 1;
  static constexpr uint32_t maxSubTypes = 1;
  static constexpr TypeKind typeKind = TypeKind::ARRAY;
  static constexpr bool isPrimitiveType = false;
  static constexpr bool isFixedWidth = false;
  static constexpr const char* name = "ARRAY";
};
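Generic code can also consume such traits purely at compile time. The following sketch assumes hypothetical MiniKind/MiniTraits types shaped like the TypeTraits specializations above (only NativeType and isFixedWidth are modeled) and a hypothetical helper fixedValueBytes. It uses if constexpr (C++17) so that sizeof(void) is never instantiated for ARRAY.

```cpp
// Hypothetical sketch of compile-time trait extraction; Mini* names are not Velox APIs.
// Requires C++17 (if constexpr, std::is_same_v).
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum class MiniKind : std::int8_t { BOOLEAN = 0, BIGINT = 4, ARRAY = 30 };

template <MiniKind K>
struct MiniTraits {};  // primary template is empty, as with TypeTraits

template <>
struct MiniTraits<MiniKind::BOOLEAN> {
  using NativeType = bool;
  static constexpr bool isFixedWidth = true;
};

template <>
struct MiniTraits<MiniKind::BIGINT> {
  using NativeType = std::int64_t;
  static constexpr bool isFixedWidth = true;
};

template <>
struct MiniTraits<MiniKind::ARRAY> {
  using NativeType = void;
  static constexpr bool isFixedWidth = false;
};

// Bytes per value for fixed-width kinds, 0 for variable-width kinds.
// The discarded if-constexpr branch is not instantiated, so sizeof(void) never appears.
template <MiniKind K>
constexpr std::size_t fixedValueBytes() {
  if constexpr (MiniTraits<K>::isFixedWidth) {
    return sizeof(typename MiniTraits<K>::NativeType);
  } else {
    return 0;
  }
}

static_assert(std::is_same_v<MiniTraits<MiniKind::BIGINT>::NativeType, std::int64_t>);
static_assert(fixedValueBytes<MiniKind::BIGINT>() == sizeof(std::int64_t));
static_assert(fixedValueBytes<MiniKind::ARRAY>() == 0);
```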

How to Create a Velox Type

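Let's start with an example: TypeFactory creates a BIGINT Velox type and an ARRAY Velox type, and the Velox type name of each is printed. A BIGINT instance (like any scalar type instance) is created by extracting the concrete ScalarType template type through TypeTraits (its ImplType) and calling its create method. The ARRAY instance is created by the create method of the TypeFactory specialization for ARRAY.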

auto bigIntVT = TypeFactory<TypeKind::BIGINT>::create();
std::cout << "bigIntVT velox type name is " << bigIntVT->toString() << "\n";
auto arrayVT = TypeFactory<TypeKind::ARRAY>::create(bigIntVT);
std::cout << "arrayVT velox type name is " << arrayVT->toString() << "\n";
// Output:
// bigIntVT velox type name is BIGINT
// arrayVT velox type name is ARRAY<BIGINT>

template <TypeKind KIND>
struct TypeFactory {
  static std::shared_ptr<const typename TypeTraits<KIND>::ImplType> create() {
    return TypeTraits<KIND>::ImplType::create();
  }
};

template <TypeKind KIND>
const std::shared_ptr<const ScalarType<KIND>> ScalarType<KIND>::create() {
  static const auto instance = std::make_shared<const ScalarType<KIND>>();
  return instance;
}

template <>
struct TypeFactory<TypeKind::ARRAY> {
  static std::shared_ptr<const ArrayType> create(
      std::shared_ptr<const Type> elementType) {
    return std::make_shared<ArrayType>(std::move(elementType));
  }
};
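The two creation paths behave differently. The function-local static in ScalarType<KIND>::create means every call for a given scalar kind returns the same shared instance (C++11 guarantees that this initialization is thread-safe). The ARRAY factory, by contrast, allocates a new object per call, because each array type depends on its element type. Here is a small sketch of that contrast, with hypothetical MiniType, MiniBigint, MiniArray and makeArray standing in for the Velox classes:

```cpp
// Hypothetical sketch contrasting shared scalar instances with per-call ARRAY objects.
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct MiniType {
  virtual ~MiniType() = default;
  virtual std::string toString() const = 0;
};

struct MiniBigint : MiniType {
  std::string toString() const override { return "BIGINT"; }

  // Like ScalarType<KIND>::create: a function-local static, initialized once
  // (thread-safely since C++11) and shared by every caller.
  static std::shared_ptr<const MiniBigint> create() {
    static const std::shared_ptr<const MiniBigint> instance =
        std::make_shared<MiniBigint>();
    return instance;
  }
};

struct MiniArray : MiniType {
  explicit MiniArray(std::shared_ptr<const MiniType> element)
      : element_(std::move(element)) {}
  std::string toString() const override {
    return "ARRAY<" + element_->toString() + ">";
  }
  std::shared_ptr<const MiniType> element_;
};

// Like TypeFactory<TypeKind::ARRAY>::create: a new object on every call.
std::shared_ptr<const MiniArray> makeArray(std::shared_ptr<const MiniType> element) {
  return std::make_shared<MiniArray>(std::move(element));
}
```

Two calls to makeArray(MiniBigint::create()) produce distinct ARRAY objects that share the same BIGINT element instance.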

How to Infer and Create a Velox Type from a C++ Type

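Sometimes we need to infer a Velox type from a C++ type and create it. For example, in the SimpleVector constructor below, CppToType<T>::create() must infer the Velox type from the C++ type T and create an instance of it.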


// Constructs SimpleVector inferring the type from T.
  SimpleVector(
      velox::memory::MemoryPool* pool,
      std::shared_ptr<const Type> type,
      ...
      std::optional<ByteCount> storageByteCount = std::nullopt)
      : SimpleVector(
            pool,
            CppToType<T>::create(),
            ...
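The SimpleVector excerpt relies on a delegating constructor: the convenience overload computes the runtime type from T and forwards it to the general overload. A minimal sketch of that shape follows, assuming hypothetical MiniType, MiniCppToType and MiniVector types (the real SimpleVector takes many more parameters, which are elided above).

```cpp
// Hypothetical sketch of "infer the runtime type from T"; Mini* names are not Velox APIs.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

struct MiniType {
  explicit MiniType(std::string n) : name(std::move(n)) {}
  std::string name;
};
using MiniTypePtr = std::shared_ptr<const MiniType>;

// CppToType-like trait: maps a C++ type to a runtime type object.
template <typename T>
struct MiniCppToType;  // left undefined, so an unsupported T fails to compile

template <>
struct MiniCppToType<std::int64_t> {
  static MiniTypePtr create() { return std::make_shared<MiniType>("BIGINT"); }
};

template <>
struct MiniCppToType<bool> {
  static MiniTypePtr create() { return std::make_shared<MiniType>("BOOLEAN"); }
};

template <typename T>
class MiniVector {
 public:
  // General constructor: the caller supplies the runtime type explicitly.
  MiniVector(MiniTypePtr type, std::size_t size)
      : type_(std::move(type)), size_(size) {}

  // Convenience constructor: infer the runtime type from T, then delegate.
  explicit MiniVector(std::size_t size)
      : MiniVector(MiniCppToType<T>::create(), size) {}

  const MiniType& type() const { return *type_; }
  std::size_t size() const { return size_; }

 private:
  MiniTypePtr type_;
  std::size_t size_;
};
```

MiniVector<std::int64_t> v(3) ends up with a BIGINT type object without the caller ever naming it.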

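CppToType uses template specialization (here on the C++ type int64_t) to select the template argument of its base class CppToTypeBase, i.e. the corresponding Velox TypeKind. CppToTypeBase derives from TypeTraits<KIND>. Its create method calls the create method of the TypeFactory specialization introduced earlier (here for TypeKind::BIGINT), which creates the Velox type instance. CppToType<T>::isPrimitiveType and CppToType<T>::typeKind work the same way, since they are inherited from TypeTraits<KIND>. For Array<ELEMENT>, the specialization derives from TypeTraits<TypeKind::ARRAY> directly, and its create method first infers the element type through CppToType<ELEMENT>::create() and then wraps it with ARRAY().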

template <typename T>
struct CppToType {};

template <TypeKind KIND>
struct CppToTypeBase : public TypeTraits<KIND> {
  static auto create() {
    return TypeFactory<KIND>::create();
  }
};

template <>
struct CppToType<int64_t> : public CppToTypeBase<TypeKind::BIGINT> {};

template <typename ELEMENT>
struct CppToType<Array<ELEMENT>> : public TypeTraits<TypeKind::ARRAY> {
  static auto create() {
    return ARRAY(CppToType<ELEMENT>::create());
  }
};

std::shared_ptr<const ArrayType> ARRAY(
    std::shared_ptr<const Type> elementType) {
  return std::make_shared<const ArrayType>(std::move(elementType));
}
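Because the Array<ELEMENT> specialization calls CppToType<ELEMENT>::create() recursively, nested C++ marker types map to nested Velox types. Here is a self-contained sketch of that recursion. It uses hypothetical MiniCppToType, MiniArrayOf, MiniBigint and MiniArray in place of the Velox types, and, unlike ScalarType::create, it allocates a fresh scalar object on every call for brevity.

```cpp
// Hypothetical sketch of recursive C++ -> runtime type inference; not Velox APIs.
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

struct MiniType {
  virtual ~MiniType() = default;
  virtual std::string toString() const = 0;
};
using MiniTypePtr = std::shared_ptr<const MiniType>;

struct MiniBigint : MiniType {
  std::string toString() const override { return "BIGINT"; }
};

struct MiniArray : MiniType {
  explicit MiniArray(MiniTypePtr e) : element(std::move(e)) {}
  std::string toString() const override { return "ARRAY<" + element->toString() + ">"; }
  MiniTypePtr element;
};

// Compile-time marker type, in the spirit of Array<ELEMENT>.
template <typename ELEMENT>
struct MiniArrayOf {};

template <typename T>
struct MiniCppToType;  // undefined for unsupported C++ types

template <>
struct MiniCppToType<std::int64_t> {
  static MiniTypePtr create() { return std::make_shared<MiniBigint>(); }
};

// Partial specialization: build the element type first, then wrap it.
template <typename ELEMENT>
struct MiniCppToType<MiniArrayOf<ELEMENT>> {
  static MiniTypePtr create() {
    return std::make_shared<MiniArray>(MiniCppToType<ELEMENT>::create());
  }
};
```

Here MiniCppToType<MiniArrayOf<MiniArrayOf<std::int64_t>>>::create()->toString() returns ARRAY<ARRAY<BIGINT>>.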

References

Velox Type System