path: root/contrib/storage-hive
Age Commit message Author
2019-03-14DRILL-2326: Fix scalar replacement for the case when static method which ↵Volodymyr Vysotskyi
does not return values is called - Fix check for return function value to handle the case when created object is returned without assigning it to the local variable closes #1687
2019-03-05DRILL-5603: Replace String file paths to Hadoop PathVitalii Diravka
- replaced all String path representations with org.apache.hadoop.fs.Path - added PathSerDe.Se JSON serializer - refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality closes #1657
2019-03-04DRILL-6642: Update protocol-buffers versionAnton Gozhiy
1. Updated protobuf to version 3.6.1 2. Added protobuf to the root pom dependency management 3. Added classes BoundedByteString and LiteralByteString for compatibility with HBase 4. Added ProtobufPatcher to provide compatibility with MapR-DB and HBase closes #1639
2019-03-01DRILL-6927: Avoid double conversion from impala timestamp when hive native ↵Volodymyr Vysotskyi
parquet reader is used closes #1655
2019-02-28DRILL-1328: Support table statistics - Part 2Gautam Parai
Add support for avg row-width and major type statistics. Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance. Update/fix rowcount, selectivity and ndv computations to improve plan costing. Add options for configuring collection/usage of statistics. Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs). Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries. Add support for CPU sampling and nested scalar columns. Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures.
Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch to latest master, fixed a few issues and added a few tests.
FUNCS: Statistics functions as UDFs: Separate Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.
* custom versions of "count" that always return BigInt
* HyperLogLog based NDV that returns BigInt that works only on VarChars
* HyperLogLog with binary output that only works on VarChars
OPS: Updated protobufs for new ops
OPS: Implemented StatisticsMerge
OPS: Implemented StatisticsUnpivot
ANALYZE: AnalyzeTable functionality
* JavaCC syntax more-or-less copied from LucidDB.
* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel
ANALYZE: Add getMetadataTable() to AbstractSchema
USAGE: Change field access in QueryWrapper
USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel
* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor
* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.
USAGE: Attach DrillStatsTable to DrillTable.
* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table
* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.
** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.
** Query is set up to extract only the most recent statistics results for each column.
closes #729
2019-02-01DRILL-7019: Add check for redundant importsVolodymyr Vysotskyi
close apache/drill#1629
2019-01-25DRILL-6977: Improve Hive tests configurationIgor Guzenko
1. HiveTestBase data initialization moved to static block to be initialized once for all derivatives. 2. Extracted Hive driver and storage plugin management from HiveTestDataGenerator to HiveTestFixture class. This increased cohesion of generator and added loose coupling between hive test configuration and data generation tasks. 3. Replaced usage of Guava ImmutableLists with TestBaseViewSupport helper methods by using standard JDK collections. closes #1613
2019-01-18DRILL-6944: UnsupportedOperationException thrown for view over MapR-DB ↵Igor Guzenko
binary table 1. Added persistence of MAP key and value types in Drill views (affects .view.drill file) for avoiding cast problems in future. 2. Preserved backward compatibility of older view files by treating untyped maps as ANY. closes #1602
2019-01-18DRILL-6969: Fix inconsistency of reading MaprDB JSON tables using hive ↵Volodymyr Vysotskyi
plugin when native reader is enabled closes #1610
2019-01-03DRILL-540: Allow querying hive views in DrillIgor Guzenko
1. Added DrillHiveViewTable which allows construction of DrillViewTable based on Hive metadata 2. Added initialization of DrillHiveViewTable in HiveSchemaFactory 3. Extracted conversion of Hive data types from DrillHiveTable to HiveToRelDataTypeConverter 4. Removed throwing of UnsupportedOperationException from HiveStoragePlugin 5. Added TestHiveViewsSupport and authorization tests 6. Added closeSilently() method to AutoCloseables closes #1559
2019-01-03DRILL-6907: Fix hive-exec-shaded classes recognition in IntelliJ IDEAVolodymyr Vysotskyi
closes #1575
2019-01-03DRILL-6929: Exclude maprfs jar for default profileVolodymyr Vysotskyi
closes #1586
2018-12-24[maven-release-plugin] prepare for next development iterationVitalii Diravka
2018-12-24[maven-release-plugin] prepare release drill-1.15.0Vitalii Diravka
2018-11-26DRILL-6850: Force setting DRILL_LOGICAL Convention for DrillRelFactories and ↵Volodymyr Vysotskyi
DrillFilterRel - Fix workspace case insensitivity for JDBC storage plugin
2018-11-15DRILL-6744: Support varchar and decimal push downArina Ielchiieva
1. Added enableStringsSignedMinMax parquet format plugin config and store.parquet.reader.strings_signed_min_max session option to control reading binary statistics for files generated by versions of Parquet prior to 1.10.0. 2. Added ParquetReaderConfig to store configuration needed during reading parquet statistics or files. 3. Provided a mechanism to enable varchar / decimal filter push down. 4. Added VersionUtil to compare Drill versions in string representation. 5. Added appropriate unit tests. closes #1537
2018-11-09DRILL-4456: Add Hive translate UDFVolodymyr Vysotskyi
closes #1527
2018-10-25DRILL-6381: (Part 3) Planner and Execution implementation to support ↵rebase
Secondary Indexes
1. Index Planning Rules and Plan generators
    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.
    - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.
    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.
    - Support planning with functional indexes such as CAST functions.
    - Enhance PlannerSettings with several configuration options for indexes.
2. Index Selection and Statistics
    - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.
    - Costing of index intersection for comparison with single-index plans.
3. Planning and execution operators
    - Support RangePartitioning physical operator during query planning and execution.
    - Support RowKeyJoin physical operator during query planning and execution.
    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.
    - Enhance Materializer to keep track of subscan association with a particular rowkey join.
4. Index Planning utilities
    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.
    - Utility class to analyze filter condition and an input collation to determine output collation.
    - Helper classes to maintain index contexts for logical and physical planning phase.
    - IndexPlanUtils utility class for various helper methods.
5. Miscellaneous
    - Separate physical rel for DirectScan.
    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.
- MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema Co-authored-by: Aman Sinha <asinha@maprtech.com> Co-authored-by: chunhui-shi <cshi@maprtech.com> Co-authored-by: Gautam Parai <gparai@maprtech.com> Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com> Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com> Conflicts: exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java 
exec/java-exec/src/main/resources/drill-module.conf logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java Resolve merge conflicts and compilation issues.
2018-10-19DRILL-6793: FragmentExecutor cannot send its final state for the case when ↵Bohdan Kazydub
RootExec root wasn't initialized closes #1506
2018-10-14DRILL-6473: Update MapR HiveBohdan Kazydub
close apache/drill#1307
2018-10-01DRILL-6724: Dump operator context to logs when error occurs during query ↵Bohdan Kazydub
execution closes #1455
2018-08-28DRILL-6422: Replace guava imports with shaded onesVolodymyr Vysotskyi
2018-08-27DRILL-6492: Ensure schema / workspace case insensitivity in DrillArina Ielchiieva
1. StoragePluginsRegistryImpl was updated:
a. for backward compatibility, at init it converts all existing storage plugin names to lower case; in case of duplicates, it logs a warning and skips the duplicate.
b. to wrap the persistent plugins registry into a case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.
c. to load system storage plugins dynamically by @SystemStorage annotation.
2. StoragePlugins class was updated to store storage plugin configs by name in a case insensitive map.
3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that they are matched case insensitively (all schemas are stored in Drill in lower case).
4. FileSystemConfig was updated to store workspaces by name in a case insensitive hash map.
5. All plugin schema factories now extend AbstractSchemaFactory to ensure that the given schema name is converted to lower case.
6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate if schema table names are case insensitive. By default, false. The schema implementation is responsible for case insensitive table name search in case it supports one. Currently, information_schema, sys and hive do so.
7. System storage plugins (information_schema, sys) were refactored to ensure their schema and table names are case insensitive; the annotation @SystemPlugin and an additional constructor were also added to allow dynamically loading system plugins in the storage plugin registry during the init phase.
8. MetadataProvider was updated to convert all schema filter conditions into lower case to ensure the schema would be matched case insensitively.
9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / table names (this depends on whether the schema supports case insensitive table names) would be found case insensitively.
closes #1439
2018-08-10DRILL-6656: Disallow extra semicolons and multiple statements on the same line.Timothy Farkas
closes #1415
2018-07-25[maven-release-plugin] prepare for next development iterationBen-Zvi
2018-07-25[maven-release-plugin] prepare release drill-1.14.0Ben-Zvi
2018-07-19DRILL-6614: Allow usage of MapRDBFormatPlugin for HiveStoragePluginVitalii Diravka
2018-07-09DRILL-6575: Add store.hive.conf.properties option to allow set Hive ↵Arina Ielchiieva
properties at session level closes #1365
2018-07-04DRILL-6557: Use size in bytes during Hive statistics calculation if presentArina Ielchiieva
1. Check size in bytes presence in stats before fetching input splits and use it if present. 2. Add log trace suggesting to use ANALYZE command before running queries if statistics are unavailable and Drill had to fetch all input splits. 3. Minor refactoring / cleanup in HiveMetadataProvider class. closes #1357
2018-07-03DRILL-6494: Drill Plugins HandlerVitalii Diravka
- Storage Plugins Handler service is used at the Drill start-up stage and it updates storage plugins configs from storage-plugins-override.conf file. If plugins configs are present in the persistence store, they are updated; otherwise bootstrap plugins are updated and the resulting configs are loaded to the persistence store. If the enabled status is absent in the storage-plugins-override.conf file, the last plugin config enabled status persists.
- 'drill.exec.storage.action_on_plugins_override_file' Boot option is added. This is the action which should be performed on the storage-plugins-override.conf file after successfully updating storage plugins configs. Possible values are: "none" (default), "rename" and "remove".
- The "NULL" issue with updating Hive plugin config by REST is solved. But clients are still being instantiated for disabled plugins - DRILL-6412.
- "org.honton.chas.hocon:jackson-dataformat-hocon" library is added for properly deserializing the HOCON conf file
- additional refactoring: "com.typesafe:config" and "org.apache.commons:commons-lang3" are placed into DependencyManagement block with proper versions; correct properties for metrics in "drill-override-example.conf" are specified
closes #1345
2018-06-22DRILL-6454: Native MapR DB plugin support for Hive MapR-DB json tableVitalii Diravka
closes #1314
2018-06-13DRILL-6353: Upgrade Parquet MR dependenciesVlad Rozov
closes #1259
2018-06-07DRILL-6375 : Support for ANY_VALUE aggregate functionGautam Parai
closes #1256
2018-06-06DRILL-6438: Remove excess logging from the tests.Timothy Farkas
- Removed usages of System.out and System.err from the test and replaced with loggers closes #1284
2018-05-23DRILL-6436: Storage Plugin to have name and context moved to ↵chunhui-shi
AbstractStoragePlugin closes #1282
2018-05-18DRILL-6424: Updating FasterXML Jackson librariesVitalii Diravka
closes #1274
2018-05-18DRILL-6421: Refactor DecimalUtility and CoreDecimalUtility classesVolodymyr Vysotskyi
closes #1267
2018-05-11DRILL-6242 Use java.time.Local{Date|Time|DateTime} for Drill Date, Time, ↵jiang-wu
Timestamp types. (#3) close apache/drill#1247 * DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types. Conflicts: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java Fix merge conflicts and check style.
2018-05-10DRILL-6386: Remove unused imports and star imports.Drill Dev
2018-05-04DRILL-6094: Decimal data type enhancementsVolodymyr Vysotskyi
Add ExprVisitors for VARDECIMAL Modify writers/readers to support VARDECIMAL - Added usage of VarDecimal for parquet, hive, maprdb, jdbc; - Added options to store decimals as int32 and int64 or fixed_len_byte_array or binary; Add UDFs for VARDECIMAL data type - modify type inference rules - remove UDFs for obsolete DECIMAL types Enable DECIMAL data type by default Add unit tests for DECIMAL data type Fix mapping for NLJ when literal with non-primitive type is used in join conditions Refresh protobuf C++ source files Changes in C++ files Add support for decimal logical type in Avro. Add support for date, time and timestamp logical types. Update Avro version to 1.8.2.
2018-04-29DRILL-6282: Update Drill's Metrics dependenciesVitalii Diravka
- Replacing com.codahale.metrics with the latest io.dropwizard.metrics Metrics for Drill - com.yammer.metrics is removed, since it isn't used directly by Drill closes #1189
2018-04-29DRILL-6173: Support transitive closure during filter push down and partition ↵Vitalii Diravka
pruning closes #1216
2018-04-27DRILL-6331: Revisit Hive Drill native parquet implementation to be exposed ↵Arina Ielchiieva
to Drill optimizations (filter / limit push down, count to direct scan)
1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.
2. Rules that worked previously only with ParquetGroupScan can now be applied to any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.
3. Hive populates partition values based on information returned from the Hive metastore, whereas Drill populates them based on the path difference between the selection root and the actual file path. Previously, ColumnExplorer populated partition values using the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables, the `populateImplicitColumns` method logic was changed to populate partition columns based only on the given partition values.
4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.
5. Metadata class was moved to separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.
6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.
7. Reduced excessive logging when parquet files metadata is read
closes #1214
2018-04-17DRILL-6320: Fixed license headers.Drill Dev
closes #1207
2018-03-20DRILL-6145: Implement Hive MapR-DB JSON handlerVitalii Diravka
closes #1158
2018-03-14Update version to 1.14.0-SNAPSHOTParth Chandra
2018-03-03DRILL-6204: Pass tables columns without partition columns to empty Hive readerArina Ielchiieva
closes #1146
2018-03-03DRILL-6195: Querying Hive non-partitioned transactional tables via DrillVitalii Diravka
closes #1140
2018-02-23DRILL-5978: Updating of Apache and MapR Hive libraries to 2.3.2 and ↵Vitalii Diravka
2.1.2-mapr-1710 versions respectively
* Improvements to allow reading Hive bucketed transactional ORC tables;
* Updating hive properties for tests and resolving dependencies and API conflicts:
- Fix for "hive.metastore.schema.verification", MetaException(message: Version information not found in metastore) https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool METASTORE_SCHEMA_VERIFICATION="false" property is added
- Added METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional tables are necessary in Hive metastore
- Disabling Calcite CBO (Hive's CalcitePlanner) for tests, because it conflicts with Drill's Calcite version for Drill unit tests. HIVE_CBO_ENABLED="false" property
- jackson and parquet libraries are relocated in hive-exec-shade module
- org.apache.parquet:parquet-column Drill version is added to "hive-exec" to allow using Parquet empty group on MessageType level (PARQUET-278)
- Removing commons-codec exclusion from hive core. This dependency is necessary for hive-exec and hive-metastore.
- Setting Hive internal properties for transactional scan: HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION, IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES
- "io.dropwizard.metrics:metrics-core" with the latest 4.0.2 version is added to dependencyManagement block in Drill root POM
- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM
- Hive Calcite libraries are excluded (Calcite CBO was disabled)
- "jackson-core" dependency is added to DependencyManagement block in Drill root POM file
- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included
- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".
close apache/drill#1111
2018-02-19DRILL-6164: Heap memory leak during parquet scan and OOMVlad Rozov
closes #1122