Velox Type System

What Is the Velox Type System

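The Velox type system supports a set of composable, SQL-compatible types: scalar types (BOOLEAN, BIGINT, etc.) and complex types (ARRAY, MAP, etc.). The developer documentation also lists how each Velox scalar type corresponds to a C++ type, for example BOOLEAN -> bool and BIGINT -> int64_t.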

Velox Type Classes

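Let's start with a few key data structures. Velox uses TypeKind to classify types. ScalarType is a class template that represents the different scalar types, whereas complex types have to be defined one by one: the ARRAY type is ArrayType and the MAP type is MapType.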

enum class TypeKind : int8_t {
  BOOLEAN = 0,
  ...
  BIGINT = 4,
  ...
  ARRAY = 30,
  MAP = 31,
  ...
  OPAQUE = 35,
  INVALID = 36
};

// Class heads only; the bodies are omitted in this excerpt.
template <TypeKind KIND>
class ScalarType : public TypeBase<KIND>
class ArrayType : public TypeBase<TypeKind::ARRAY>
class MapType : public TypeBase<TypeKind::MAP>

// TypeBase answers the virtual queries of Type from TypeTraits<KIND>.
template <TypeKind KIND>
class TypeBase : public Type {
 public:
  using NativeType = TypeTraits<KIND>;
  bool isPrimitiveType() const override { return TypeTraits<KIND>::isPrimitiveType; }
  bool isFixedWidth() const override { return TypeTraits<KIND>::isFixedWidth; }
  const char* kindName() const override { return TypeTraits<KIND>::name; }
};

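As the code above shows, ScalarType and ArrayType are both subclasses of TypeBase. TypeBase exposes key information about a type: its native type, whether it is a primitive type, whether it is fixed width, its name, and so on. All of this is extracted through template specializations of TypeTraits. As the code below shows, each type kind has its own specialization: for BIGINT the native type is int64_t, and ImplType, i.e. its Velox type, is ScalarType<TypeKind::BIGINT>. Note that the native type of ARRAY is void.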

template <TypeKind KIND>
struct TypeTraits {};

template <>
struct TypeTraits<TypeKind::BIGINT> {
  using ImplType = ScalarType<TypeKind::BIGINT>;
  using NativeType = int64_t;
  using DeepCopiedType = NativeType;
  static constexpr uint32_t minSubTypes = 0;
  static constexpr uint32_t maxSubTypes = 0;
  static constexpr TypeKind typeKind = TypeKind::BIGINT;
  static constexpr bool isPrimitiveType = true;
  static constexpr bool isFixedWidth = true;
  static constexpr const char* name = "BIGINT";
};

template <>
struct TypeTraits<TypeKind::ARRAY> {
  using ImplType = ArrayType;
  using NativeType = void;
  using DeepCopiedType = void;
  static constexpr uint32_t minSubTypes = 1;
  static constexpr uint32_t maxSubTypes = 1;
  static constexpr TypeKind typeKind = TypeKind::ARRAY;
  static constexpr bool isPrimitiveType = false;
  static constexpr bool isFixedWidth = false;
  static constexpr const char* name = "ARRAY";
};
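
To make the traits idea concrete, here is a minimal standalone sketch. It is not Velox's code: the two-value Kind enum and the names Traits, Type, TypeBase, describe and demo inside the sketch namespace are all assumptions made for illustration. It only shows how a base class template can answer virtual queries from the compile-time traits of its kind.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical, reduced set of kinds; not Velox's full TypeKind.
enum class Kind : std::int8_t { BIGINT, ARRAY };

// The primary template is empty; each kind supplies a specialization.
template <Kind K>
struct Traits {};

template <>
struct Traits<Kind::BIGINT> {
  using NativeType = std::int64_t;
  static constexpr bool isPrimitiveType = true;
  static constexpr bool isFixedWidth = true;
  static constexpr const char* name = "BIGINT";
};

template <>
struct Traits<Kind::ARRAY> {
  using NativeType = void; // no single C++ value type stands for an array
  static constexpr bool isPrimitiveType = false;
  static constexpr bool isFixedWidth = false;
  static constexpr const char* name = "ARRAY";
};

// Runtime interface that callers see.
class Type {
 public:
  virtual ~Type() = default;
  virtual bool isPrimitiveType() const = 0;
  virtual bool isFixedWidth() const = 0;
  virtual const char* kindName() const = 0;
};

// Forwards every virtual query to the compile-time traits of K.
template <Kind K>
class TypeBase : public Type {
 public:
  bool isPrimitiveType() const override { return Traits<K>::isPrimitiveType; }
  bool isFixedWidth() const override { return Traits<K>::isFixedWidth; }
  const char* kindName() const override { return Traits<K>::name; }
};

// Compile-time view of the same information.
static_assert(
    std::is_same<Traits<Kind::BIGINT>::NativeType, std::int64_t>::value,
    "BIGINT maps to int64_t");
static_assert(
    std::is_void<Traits<Kind::ARRAY>::NativeType>::value,
    "ARRAY has no native value type");

// Runtime view: only the abstract Type interface is needed here.
inline std::string describe(const Type& type) {
  return std::string(type.kindName()) +
      (type.isPrimitiveType() ? " (primitive)" : " (complex)");
}

// Usage example with the expected results written as assertions.
inline void demo() {
  TypeBase<Kind::BIGINT> bigint;
  TypeBase<Kind::ARRAY> array;
  assert(describe(bigint) == "BIGINT (primitive)");
  assert(describe(array) == "ARRAY (complex)");
  assert(bigint.isFixedWidth());
  assert(!array.isFixedWidth());
}

} // namespace sketch
```

Calling sketch::demo() from a main function runs the checks. The point of the design is that the per-kind facts live in one place, the traits specialization, and are reachable both at compile time and, through TypeBase, at run time.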

How to Create a Velox Type

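Let's look at an example first. It uses TypeFactory to create a BIGINT and an ARRAY Velox type and then prints their type names. The BIGINT instance (like every scalar type) is created by using TypeTraits to extract the concrete ScalarType instantiation and then calling its create method. The ARRAY instance is created by the create method of the TypeFactory specialization for ARRAY.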

// Usage
auto bigIntVT = TypeFactory<TypeKind::BIGINT>::create();
std::cout << "bigIntVT velox type name is " << bigIntVT->toString() << "\n";
auto arrayVT = TypeFactory<TypeKind::ARRAY>::create(bigIntVT);
std::cout << "arrayVT velox type name is " << arrayVT->toString() << "\n";
// Output:
// bigIntVT velox type name is BIGINT
// arrayVT velox type name is ARRAY<BIGINT>

// Generic factory: delegates to the ImplType selected by TypeTraits.
template <TypeKind KIND>
struct TypeFactory {
  static std::shared_ptr<const typename TypeTraits<KIND>::ImplType> create() {
    return TypeTraits<KIND>::ImplType::create();
  }
};

// Every call returns the same static instance for a given scalar kind.
template <TypeKind KIND>
const std::shared_ptr<const ScalarType<KIND>> ScalarType<KIND>::create() {
  static const auto instance = std::make_shared<const ScalarType<KIND>>();
  return instance;
}

// ARRAY specialization: takes the element type and builds a new ArrayType.
template <>
struct TypeFactory<TypeKind::ARRAY> {
  static std::shared_ptr<const ArrayType> create(
      std::shared_ptr<const Type> elementType) {
    return std::make_shared<ArrayType>(std::move(elementType));
  }
};
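
The two creation paths behave differently, and a small standalone sketch may help to see it. The code below is a simplified model, not Velox's implementation: the Kind enum, kindName helper, TypePtr alias, demo function and the sketch namespace are assumptions, and the TypeTraits indirection is skipped so that the generic factory calls ScalarType<K>::create() directly. It shows that a scalar kind shares one static instance, while each ARRAY creation allocates a new object that wraps its element type.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical, reduced set of kinds.
enum class Kind { BIGINT, ARRAY };

inline const char* kindName(Kind kind) {
  switch (kind) {
    case Kind::BIGINT:
      return "BIGINT";
    case Kind::ARRAY:
      return "ARRAY";
  }
  return "INVALID";
}

class Type {
 public:
  virtual ~Type() = default;
  virtual std::string toString() const = 0;
};
using TypePtr = std::shared_ptr<const Type>;

// Scalar types take no parameters, so one shared instance per kind suffices.
template <Kind K>
class ScalarType : public Type {
 public:
  static std::shared_ptr<const ScalarType<K>> create() {
    static const std::shared_ptr<const ScalarType<K>> instance =
        std::make_shared<ScalarType<K>>();
    return instance;
  }
  std::string toString() const override { return kindName(K); }
};

// An array type is parameterized by its element type.
class ArrayType : public Type {
 public:
  explicit ArrayType(TypePtr elementType)
      : elementType_(std::move(elementType)) {}
  std::string toString() const override {
    return "ARRAY<" + elementType_->toString() + ">";
  }

 private:
  TypePtr elementType_;
};

// Generic factory for scalar kinds.
template <Kind K>
struct TypeFactory {
  static std::shared_ptr<const ScalarType<K>> create() {
    return ScalarType<K>::create();
  }
};

// Specialization for ARRAY: needs the element type as an argument.
template <>
struct TypeFactory<Kind::ARRAY> {
  static std::shared_ptr<const ArrayType> create(TypePtr elementType) {
    return std::make_shared<ArrayType>(std::move(elementType));
  }
};

// Usage example with the expected results written as assertions.
inline void demo() {
  auto first = TypeFactory<Kind::BIGINT>::create();
  auto second = TypeFactory<Kind::BIGINT>::create();
  assert(first == second); // same shared scalar instance

  auto array1 = TypeFactory<Kind::ARRAY>::create(first);
  auto array2 = TypeFactory<Kind::ARRAY>::create(first);
  assert(array1 != array2); // two distinct ArrayType objects
  assert(array1->toString() == "ARRAY<BIGINT>");

  // Types compose: an array type can be the element of another array type.
  auto nested = TypeFactory<Kind::ARRAY>::create(array1);
  assert(nested->toString() == "ARRAY<ARRAY<BIGINT>>");
}

} // namespace sketch
```

Calling sketch::demo() from a main function runs the checks. Because two ARRAY instances with the same element type are different objects in this sketch, comparing such types by pointer would not work here; a structural comparison would be needed instead.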

How to Deduce and Create a Velox Type from a C++ Type

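Sometimes we need to deduce and create a Velox type from a C++ type. For example, in the SimpleVector constructor below, CppToType<T>::create() has to deduce the Velox type from the C++ type T and create an instance of it.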

// Constructs SimpleVector inferring the type from T.
SimpleVector(
    velox::memory::MemoryPool* pool,
    std::shared_ptr<const Type> type,
    ...
    std::optional<ByteCount> storageByteCount = std::nullopt)
    : SimpleVector(
          pool,
          CppToType<T>::create(),
          ...

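Through template specialization (here for the C++ type int64_t), CppToType selects the template argument of its base class CppToTypeBase, which is the Velox type kind. CppToTypeBase is a subclass of TypeTraits; its create method calls the create method of the TypeFactory mentioned earlier for that kind (here TypeKind::BIGINT) to create the Velox type instance. CppToType<T>::isPrimitiveType and CppToType<T>::typeKind are resolved the same way, through the inherited TypeTraits specialization.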

// Primary template: empty, so only mapped C++ types are usable.
template <typename T>
struct CppToType {};

// Inherits the traits of KIND and adds create() on top of TypeFactory.
template <TypeKind KIND>
struct CppToTypeBase : public TypeTraits<KIND> {
  static auto create() {
    return TypeFactory<KIND>::create();
  }
};

template <>
struct CppToType<int64_t> : public CppToTypeBase<TypeKind::BIGINT> {};

// Arrays recurse on ELEMENT to build the element type first.
template <typename ELEMENT>
struct CppToType<Array<ELEMENT>> : public TypeTraits<TypeKind::ARRAY> {
  static auto create() {
    return ARRAY(CppToType<ELEMENT>::create());
  }
};

// Helper used above to wrap an element type in an ArrayType.
std::shared_ptr<const ArrayType> ARRAY(
    std::shared_ptr<const Type> elementType) {
  return std::make_shared<const ArrayType>(std::move(elementType));
}
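
The mapping from a C++ type to a runtime type object can be tried out in isolation. The following is a minimal standalone sketch under stated assumptions, not Velox's code: the Kind enum, the aggregate Type struct, the empty Array tag, the TypedColumn class and the demo function in the sketch namespace are all hypothetical. It mirrors the pattern above: full specializations for scalar C++ types, a partial specialization that recurses for arrays, and a constructor that infers its type from T.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

namespace sketch {

// Hypothetical, reduced set of kinds.
enum class Kind { BOOLEAN, BIGINT, ARRAY };

// Minimal runtime type: a kind plus an element type for arrays.
struct Type {
  Kind kind;
  std::shared_ptr<const Type> element; // set only when kind == ARRAY

  std::string toString() const {
    switch (kind) {
      case Kind::BOOLEAN:
        return "BOOLEAN";
      case Kind::BIGINT:
        return "BIGINT";
      case Kind::ARRAY:
        return "ARRAY<" + element->toString() + ">";
    }
    return "INVALID";
  }
};
using TypePtr = std::shared_ptr<const Type>;

// Tag type standing in for "array of ELEMENT"; it carries no data.
template <typename ELEMENT>
struct Array {};

// Primary template is empty, so an unmapped C++ type fails to compile
// as soon as create() or typeKind is requested.
template <typename T>
struct CppToType {};

// Shared base for scalar mappings: records the kind and builds the type.
template <Kind K>
struct CppToTypeBase {
  static constexpr Kind typeKind = K;
  static TypePtr create() {
    return std::make_shared<Type>(Type{K, nullptr});
  }
};

template <>
struct CppToType<bool> : CppToTypeBase<Kind::BOOLEAN> {};

template <>
struct CppToType<std::int64_t> : CppToTypeBase<Kind::BIGINT> {};

// Partial specialization: build the element type first, then wrap it.
template <typename ELEMENT>
struct CppToType<Array<ELEMENT>> {
  static constexpr Kind typeKind = Kind::ARRAY;
  static TypePtr create() {
    return std::make_shared<Type>(
        Type{Kind::ARRAY, CppToType<ELEMENT>::create()});
  }
};

// Hypothetical container whose constructor infers its type from T.
template <typename T>
class TypedColumn {
 public:
  TypedColumn() : type_(CppToType<T>::create()) {}
  const TypePtr& type() const { return type_; }

 private:
  TypePtr type_;
};

// Usage example with the expected results written as assertions.
inline void demo() {
  static_assert(
      CppToType<std::int64_t>::typeKind == Kind::BIGINT,
      "int64_t maps to BIGINT");
  assert(CppToType<bool>::create()->toString() == "BOOLEAN");
  assert(CppToType<Array<std::int64_t>>::create()->toString() == "ARRAY<BIGINT>");

  TypedColumn<Array<Array<bool>>> column;
  assert(column.type()->toString() == "ARRAY<ARRAY<BOOLEAN>>");
}

} // namespace sketch
```

Calling sketch::demo() from a main function runs the checks. Leaving the primary template empty is what turns an unsupported element type into a compile error instead of a runtime failure.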

References