|
does not return values is called
- Fix check for function return value to handle the case when a created object is returned without being assigned to a local variable
closes #1687
|
|
- replaced all String path representations with org.apache.hadoop.fs.Path
- added PathSerDe.Se JSON serializer
- refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality
closes #1657
|
|
1. Updated protobuf to version 3.6.1
2. Added protobuf to the root pom dependency management
3. Added classes BoundedByteString and LiteralByteString for compatibility with HBase
4. Added ProtobufPatcher to provide compatibility with MapR-DB and HBase
closes #1639
|
|
parquet reader is used
closes #1655
|
|
Add support for avg row-width and major type statistics.
Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance.
Update/fix rowcount, selectivity and ndv computations to improve plan costing.
Add options for configuring collection/usage of statistics.
Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs).
Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries.
Add support for CPU sampling and nested scalar columns.
Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures.
Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch to latest master, fixed a few issues and added a few tests.
FUNCS: Statistics functions as UDFs:
Separate
Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.
* custom versions of "count" that always return BigInt
* HyperLogLog based NDV that returns BigInt that works only on VarChars
* HyperLogLog with binary output that only works on VarChars
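The NDV functions above rely on HyperLogLog cardinality estimation. As a rough illustration, here is a minimal, hypothetical HLL sketch (not Drill's actual UDF code; the precision parameter and FNV-style hash with a mixing finalizer are assumptions):

```java
// Minimal HyperLogLog sketch for NDV estimation (illustrative only).
public class HllSketch {
  private final int p;            // precision: 2^p registers
  private final int[] registers;

  public HllSketch(int p) {
    this.p = p;
    this.registers = new int[1 << p];
  }

  // FNV-1a 64-bit hash with a murmur-style finalizer for better bit mixing
  private static long hash(String s) {
    long h = 0xcbf29ce484222325L;
    for (int i = 0; i < s.length(); i++) {
      h ^= s.charAt(i);
      h *= 0x100000001b3L;
    }
    h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
    h ^= h >>> 33;
    return h;
  }

  public void offer(String value) {
    long h = hash(value);
    int idx = (int) (h >>> (64 - p));   // top p bits pick a register
    long rest = h << p;                 // remaining bits
    int rho = rest == 0 ? 64 - p + 1 : Long.numberOfLeadingZeros(rest) + 1;
    if (rho > registers[idx]) {
      registers[idx] = rho;
    }
  }

  public long cardinality() {
    int m = registers.length;
    double sum = 0;
    int zeros = 0;
    for (int r : registers) {
      sum += 1.0 / (1L << r);
      if (r == 0) zeros++;
    }
    double alpha = 0.7213 / (1 + 1.079 / m);
    double estimate = alpha * m * m / sum;
    if (estimate <= 2.5 * m && zeros > 0) {   // small-range (linear counting) correction
      estimate = m * Math.log((double) m / zeros);
    }
    return Math.round(estimate);
  }
}
```

A Drill UDF along these lines would merge per-fragment register arrays in StatisticsMerge and emit the final estimate as a nullable BigInt.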
OPS: Updated protobufs for new ops
OPS: Implemented StatisticsMerge
OPS: Implemented StatisticsUnpivot
ANALYZE: AnalyzeTable functionality
* JavaCC syntax more-or-less copied from LucidDB.
* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel
ANALYZE: Add getMetadataTable() to AbstractSchema
USAGE: Change field access in QueryWrapper
USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel
* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor
* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.
USAGE: Attach DrillStatsTable to DrillTable.
* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table
* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.
** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.
** Query is set up to extract only the most recent statistics results for each column.
closes #729
|
|
close apache/drill#1629
|
|
1. HiveTestBase data initialization moved to static block
to be initialized once for all derivatives.
2. Extracted Hive driver and storage plugin management from HiveTestDataGenerator
to HiveTestFixture class. This increased the cohesion of the generator and
loosened the coupling between Hive test configuration and data generation
tasks.
3. Replaced usage of Guava ImmutableLists in TestBaseViewSupport
helper methods with standard JDK collections.
closes #1613
|
|
binary table
1. Added persistence of MAP key and value types in Drill views (affects the .view.drill file) to avoid cast problems in the future.
2. Preserved backward compatibility of older view files by treating untyped maps as ANY.
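The compatibility rule can be sketched as follows (hypothetical helper and type names; the real change lives in the view persistence classes):

```java
// Sketch of the backward-compatibility rule: a persisted MAP field from an
// older .view.drill file carries no key/value types, so it is read back as ANY.
public class ViewFieldTypeSketch {
  public static String resolveMapType(String keyType, String valueType) {
    if (keyType == null || valueType == null) {
      return "ANY";                                  // untyped map from an older view file
    }
    return "MAP<" + keyType + ", " + valueType + ">"; // typed map from a new view file
  }
}
```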
closes #1602
|
|
plugin when native reader is enabled
closes #1610
|
|
1. Added DrillHiveViewTable which allows construction of DrillViewTable based
on Hive metadata
2. Added initialization of DrillHiveViewTable in HiveSchemaFactory
3. Extracted conversion of Hive data types from DrillHiveTable
to HiveToRelDataTypeConverter
4. Removed throwing of UnsupportedOperationException from HiveStoragePlugin
5. Added TestHiveViewsSupport and authorization tests
6. Added closeSilently() method to AutoCloseables
closes #1559
|
|
closes #1575
|
|
closes #1586
|
|
|
|
|
|
DrillFilterRel
- Fix workspace case insensitivity for JDBC storage plugin
|
|
1. Added enableStringsSignedMinMax parquet format plugin config and store.parquet.reader.strings_signed_min_max session option to control reading binary statistics for files generated by Parquet library versions prior to 1.10.0.
2. Added ParquetReaderConfig to store configuration needed during reading parquet statistics or files.
3. Provided mechanism to enable varchar / decimal filter push down.
4. Added VersionUtil to compare Drill versions in string representation.
5. Added appropriate unit tests.
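A version comparison along the lines of VersionUtil might look like this (a sketch, assuming simple dotted numeric versions; the real class may handle more formats):

```java
// Sketch: compare dotted numeric versions such as "1.10.0" vs "1.9.0".
// Missing trailing components count as 0, so "1.10" equals "1.10.0".
public class VersionUtilSketch {
  public static int compare(String v1, String v2) {
    String[] a = v1.split("\\.");
    String[] b = v2.split("\\.");
    int n = Math.max(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = i < a.length ? Integer.parseInt(a[i]) : 0;
      int y = i < b.length ? Integer.parseInt(b[i]) : 0;
      if (x != y) {
        return Integer.compare(x, y);   // numeric, not lexicographic: 10 > 9
      }
    }
    return 0;
  }
}
```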
closes #1537
|
|
closes #1527
|
|
Secondary Indexes
1. Index Planning Rules and Plan generators
- DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.
- DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.
- Plan Generators: Covering, Non-Covering and Intersect physical plan generators.
- Support planning with functional indexes such as CAST functions.
- Enhance PlannerSettings with several configuration options for indexes.
2. Index Selection and Statistics
- An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.
- Costing of index intersection for comparison with single-index plans.
3. Planning and execution operators
- Support RangePartitioning physical operator during query planning and execution.
- Support RowKeyJoin physical operator during query planning and execution.
- HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.
- Enhance Materializer to keep track of subscan association with a particular rowkey join.
4. Index Planning utilities
- Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.
- Utility class to analyze filter condition and an input collation to determine output collation.
- Helper classes to maintain index contexts for logical and physical planning phase.
- IndexPlanUtils utility class for various helper methods.
5. Miscellaneous
- Separate physical rel for DirectScan.
- Modify LimitExchangeTranspose rule to handle SingleMergeExchange.
- MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema
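The cost-based selection described in item 2 can be illustrated with a toy model (hypothetical names and cost constants; the real IndexSelector also weighs collation properties and intersection plans):

```java
import java.util.List;

// Toy cost model: pick the candidate index with the lowest estimated cost,
// where a non-covering index pays an extra per-row lookup (rowkey join)
// back to the primary table.
public class IndexSelectorSketch {
  public static class Candidate {
    public final String name;
    public final boolean covering;
    public final double selectivity;
    public final double totalRows;

    public Candidate(String name, boolean covering, double selectivity, double totalRows) {
      this.name = name;
      this.covering = covering;
      this.selectivity = selectivity;
      this.totalRows = totalRows;
    }

    double cost() {
      double rows = selectivity * totalRows;
      return covering ? rows : rows * 3.0;  // assumed rowkey-join penalty factor
    }
  }

  public static Candidate choose(List<Candidate> candidates) {
    Candidate best = null;
    for (Candidate c : candidates) {
      if (best == null || c.cost() < best.cost()) {
        best = c;
      }
    }
    return best;
  }
}
```

With such a model a more selective non-covering index can still lose to a covering one once the join-back penalty is counted.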
Co-authored-by: Aman Sinha <asinha@maprtech.com>
Co-authored-by: chunhui-shi <cshi@maprtech.com>
Co-authored-by: Gautam Parai <gparai@maprtech.com>
Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>
Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java
exec/java-exec/src/main/resources/drill-module.conf
logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java
Resolve merge conflicts and compilation issues.
|
|
RootExec root wasn't initialized
closes #1506
|
|
close apache/drill#1307
|
|
execution
closes #1455
|
|
|
|
1. StoragePluginsRegistryImpl was updated:
a. for backward compatibility, at init, to convert all existing storage plugin names to lower case; in case of duplicates, to log a warning and skip the duplicate.
b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.
c. to load system storage plugins dynamically by @SystemStorage annotation.
2. StoragePlugins class was updated to store storage plugin configs by name in a case-insensitive map.
3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that they are matched case-insensitively (all schemas are stored in Drill in lower case).
4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.
5. All plugin schema factories now extend AbstractSchemaFactory to ensure that a given schema name is converted to lower case.
6. New method areTableNamesCaseInsensitive was added to AbstractSchema to indicate if schema table names are case insensitive. By default, false. The schema implementation is responsible for case-insensitive table name search if it supports one. Currently, information_schema, sys and hive do so.
7. System storage plugins (information_schema, sys) were refactored to ensure their schema and table names are case insensitive; also the annotation @SystemPlugin and an additional constructor were added to allow dynamically loading system plugins into the storage plugin registry during the init phase.
8. MetadataProvider was updated to convert all schema filter conditions into lower case to ensure the schema would be matched case-insensitively.
9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / table names (depending on whether the schema supports case-insensitive table names) would be found case-insensitively.
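The case-insensitive store wrapper from item 1b can be sketched as follows (a simplified, hypothetical stand-in for CaseInsensitivePersistentStore):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: wrap a plain map and lower-case every key on insert, lookup and delete,
// so plugin names like "DFS" and "dfs" resolve to the same entry.
public class CaseInsensitiveStoreSketch<V> {
  private final Map<String, V> delegate = new HashMap<>();

  private static String key(String name) {
    return name.toLowerCase();
  }

  public void put(String name, V config) { delegate.put(key(name), config); }
  public V get(String name)              { return delegate.get(key(name)); }
  public boolean contains(String name)   { return delegate.containsKey(key(name)); }
  public V remove(String name)           { return delegate.remove(key(name)); }
}
```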
closes #1439
|
|
closes #1415
|
|
|
|
|
|
|
|
properties at session level
closes #1365
|
|
1. Check for the presence of size in bytes in stats before fetching input splits and use it if present.
2. Add a trace log suggesting use of the ANALYZE command before running queries if statistics are unavailable and Drill had to fetch all input splits.
3. Minor refactoring / cleanup in HiveMetadataProvider class.
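The stats-first lookup from item 1 can be sketched as follows (the parameter names totalSize / rawDataSize follow Hive's table statistics convention; the supplier stands in for the expensive input-split listing):

```java
import java.util.Map;
import java.util.function.LongSupplier;

// Sketch: prefer table-level size statistics over an expensive input-split scan.
public class HiveStatsSketch {
  public static long dataSize(Map<String, String> tableParams, LongSupplier splitScan) {
    String totalSize = tableParams.get("totalSize");
    if (totalSize != null && Long.parseLong(totalSize) > 0) {
      return Long.parseLong(totalSize);     // stats present: skip fetching splits
    }
    String rawSize = tableParams.get("rawDataSize");
    if (rawSize != null && Long.parseLong(rawSize) > 0) {
      return Long.parseLong(rawSize);
    }
    // Fall back to the expensive path (listing all input splits); this is where
    // a trace log would suggest running ANALYZE so later queries can use stats.
    return splitScan.getAsLong();
  }
}
```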
closes #1357
|
|
- Storage Plugins Handler service is used at the Drill start-up stage and updates storage plugin configs from
storage-plugins-override.conf file. If plugins configs are present in the persistence store - they are updated,
otherwise bootstrap plugins are updated and the result configs are loaded to persistence store. If the enabled
status is absent in the storage-plugins-override.conf file, the last plugin config enabled status persists.
- 'drill.exec.storage.action_on_plugins_override_file' Boot option is added. This is the action to be
performed on the storage-plugins-override.conf file after successfully updating storage plugin configs.
Possible values are: "none" (default), "rename" and "remove".
- The "NULL" issue with updating Hive plugin config by REST is solved. But clients are still being instantiated for disabled
plugins - DRILL-6412.
- "org.honton.chas.hocon:jackson-dataformat-hocon" library is added for properly deserializing the HOCON conf file
- additional refactoring: "com.typesafe:config" and "org.apache.commons:commons-lang3" are placed into DependencyManagement
block with proper versions; correct properties for metrics in "drill-override-example.conf" are specified
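The override semantics can be sketched as a map merge (simplified; the real code operates on deserialized HOCON plugin configs):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the override rule: values from storage-plugins-override.conf win for
// every key they define; since the merge starts from the stored config, a missing
// "enabled" flag in the override keeps the last persisted enabled status.
public class PluginOverrideSketch {
  public static Map<String, Object> merge(Map<String, Object> stored,
                                          Map<String, Object> override) {
    Map<String, Object> result = new HashMap<>(stored);
    result.putAll(override);
    return result;
  }
}
```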
closes #1345
|
|
closes #1314
|
|
closes #1259
|
|
closes #1256
|
|
- Removed usages of System.out and System.err from the test and replaced with loggers
closes #1284
|
|
AbstractStoragePlugin
closes #1282
|
|
closes #1274
|
|
closes #1267
|
|
Timestamp types. (#3)
close apache/drill#1247
* DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types.
Conflicts:
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java
Fix merge conflicts and check style.
|
|
|
|
Add ExprVisitors for VARDECIMAL
Modify writers/readers to support VARDECIMAL
- Added usage of VarDecimal for parquet, hive, maprdb, jdbc;
- Added options to store decimals as int32 and int64 or fixed_len_byte_array or binary;
Add UDFs for VARDECIMAL data type
- modify type inference rules
- remove UDFs for obsolete DECIMAL types
Enable DECIMAL data type by default
Add unit tests for DECIMAL data type
Fix mapping for NLJ when a literal with a non-primitive type is used in join conditions
Refresh protobuf C++ source files
Changes in C++ files
Add support for decimal logical type in Avro.
Add support for date, time and timestamp logical types.
Update Avro version to 1.8.2.
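The choice among the decimal storage options above roughly follows Parquet's sizing rule for the DECIMAL logical type (a sketch; in Drill the actual choice is driven by session options rather than precision alone):

```java
// Sketch of Parquet's decimal storage rule: small precisions fit fixed-width ints,
// larger ones need a fixed-length byte array wide enough to hold 10^precision - 1
// as a signed two's-complement integer.
public class DecimalTypeSketch {
  public static String physicalType(int precision) {
    if (precision <= 9) {
      return "INT32";
    }
    if (precision <= 18) {
      return "INT64";
    }
    // minimum byte count n such that a signed 8n-bit integer holds `precision` digits
    int bytes = (int) Math.ceil((precision * Math.log(10) / Math.log(2) + 1) / 8);
    return "FIXED_LEN_BYTE_ARRAY(" + bytes + ")";
  }
}
```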
|
|
- Replacing com.codahale.metrics with the latest io.dropwizard.metrics Metrics for Drill
- com.yammer.metrics is removed, since it isn't used directly by Drill
closes #1189
|
|
pruning
closes #1216
|
|
to Drill optimizations (filter / limit push down, count to direct scan)
1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.
2. Rules that worked previously only with ParquetGroupScan, now can be applied for any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.
3. Hive populates partition values based on information returned from the Hive metastore, while Drill populates partition values based on the path difference between the selection root and the actual file path.
Previously, ColumnExplorer populated partition values based only on the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables,
the `populateImplicitColumns` method logic was changed to populate partition columns based only on the given partition values.
4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.
5. Metadata class was moved to a separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.
6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.
7. Reduced excessive logging when parquet file metadata is read
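The path-difference approach from item 3 can be sketched as follows (a hypothetical helper; the real logic lives in ColumnExplorer and ParquetPartitionDescriptor):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: derive partition values from the directory segments between the
// selection root and the file, the way Drill infers them (as opposed to
// Hive's metastore-driven partition values).
public class PartitionValuesSketch {
  public static List<String> partitionValues(String selectionRoot, String filePath) {
    Path rel = Paths.get(selectionRoot).relativize(Paths.get(filePath));
    List<String> values = new ArrayList<>();
    for (int i = 0; i < rel.getNameCount() - 1; i++) {  // last segment is the file name
      values.add(rel.getName(i).toString());
    }
    return values;
  }
}
```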
closes #1214
|
|
closes #1207
|
|
closes #1158
|
|
|
|
closes #1146
|
|
closes #1140
|
|
2.1.2-mapr-1710 versions respectively
* Improvements to allow reading Hive bucketed transactional ORC tables;
* Updating hive properties for tests and resolving dependencies and API conflicts:
- Fix for "hive.metastore.schema.verification", MetaException(message: Version information
not found in metastore) https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
METASTORE_SCHEMA_VERIFICATION="false" property is added
- Added the METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional
tables are necessary in the Hive metastore
- Disabling Calcite CBO (Hive's CalcitePlanner) for tests, because it conflicts
with Drill's Calcite version in Drill unit tests. HIVE_CBO_ENABLED="false" property
- jackson and parquet libraries are relocated in hive-exec-shade module
- org.apache.parquet:parquet-column Drill version is added to "hive-exec" to
allow using Parquet empty group on MessageType level (PARQUET-278)
- Removing the commons-codec exclusion from hive core. This dependency is
necessary for hive-exec and hive-metastore.
- Setting Hive internal properties for transactional scan:
HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION,
IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES
- "io.dropwizard.metrics:metrics-core" with the latest version 4.0.2 is added to the dependencyManagement block in the Drill root POM
- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM
- Hive Calcite libraries are excluded (Calcite CBO was disabled)
- "jackson-core" dependency is added to DependencyManagement block in Drill root POM file
- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included
- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".
close apache/drill#1111
|
|
closes #1122
|