bigdata/drill.git - Apache Drill CI loop

Age	Commit message (Collapse)	Author
2019-03-08	DRILL-7072: Query with semi join fails for JDBC storage plugin	Volodymyr Vysotskyi
	closes #1674
2019-02-28	DRILL-1328: Support table statistics - Part 2	Gautam Parai
	Add support for avg row-width and major type statistics. Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance. Update/fix rowcount, selectivity and ndv computations to improve plan costing. Add options for configuring collection/usage of statistics. Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs). Fix several stats/costing related issues identified while running TPC-H nad TPC-DS queries. Add support for CPU sampling and nested scalar columns. Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures. Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch to latest master, fixed few issues and added few tests. FUNCS: Statistics functions as UDFs: Separate Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A. * custom versions of "count" that always return BigInt * HyperLogLog based NDV that returns BigInt that works only on VarChars * HyperLogLog with binary output that only works on VarChars OPS: Updated protobufs for new ops OPS: Implemented StatisticsMerge OPS: Implemented StatisticsUnpivot ANALYZE: AnalyzeTable functionality * JavaCC syntax more-or-less copied from LucidDB. * (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel ANALYZE: Add getMetadataTable() to AbstractSchema USAGE: Change field access in QueryWrapper USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel * since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor * This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans. USAGE: Attach DrillStatsTable to DrillTable. * DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table * In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used. Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated. Query is set up to extract only the most recent statistics results for each column. closes #729
2019-02-22	DRILL-6734: JDBC storage plugin returns null for fields without aliases	Volodymyr Vysotskyi
	closes #1642 - Add output column names to JdbcRecordReader and use them for storing the results since column names in result set may differ when aliases aren't specified
2019-01-10	DRILL-6955: storage-jdbc tests improvements	Volodymyr Vysotskyi
	- Remove plugins usage for instantiating test databases and tables - Replace derby with h2 database closes #1603
2018-12-20	DRILL-6915: Disable generation of test tables with case-sensitive names for ↵	Volodymyr Vysotskyi
	non-Linux systems closes #1580
2018-11-26	DRILL-6850: Allow configuring table names case sensitivity for JDBC storage ↵	Volodymyr Vysotskyi
	plugin closes #1542
2018-11-26	DRILL-6850: Force setting DRILL_LOGICAL Convention for DrillRelFactories and ↵	Volodymyr Vysotskyi
	DrillFilterRel - Fix workspace case insensitivity for JDBC storage plugin
2018-11-26	DRILL-6850: JDBC integration tests failures	Vitalii Diravka
	- Fix RDBMS integration tests (expected decimal output and testCrossSourceMultiFragmentJoin) - Update libraries versions - Resolve NPE for empty result
2018-10-14	DRILL-6777: Setup CircleCI configs for Drill	Vitalii Diravka
	- adding .circleci/config.yml to the project to launch CircleCI - custom memory parameters - usage of CircleCI machine - excluding "SlowTest" and "UnlikelyTest" groups - update maven version - adding libaio.so library to solve MySQL integration tests - update com.jcabi:jcabi-mysql-maven-plugin library version - TODO descriptions for the future enhancements of CircleCI build for Drill close apache/drill#1493
2018-10-01	DRILL-6724: Dump operator context to logs when error occurs during query ↵	Bohdan Kazydub
	execution closes #1455
2018-08-28	DRILL-6422: Replace guava imports with shaded ones	Volodymyr Vysotskyi

2018-08-28	DRILL-6422: Update guava to 23.0 and shade it	Volodymyr Vysotskyi
	- Fix compilation errors for new version of Guava. - Remove usage of deprecated API - Shade guava and add dependencies to the shaded version - Ban unshaded package - Introduce drill-shaded module and move guava-shaded under it - Add methods to convert shaded guava lists to the unshaded ones - Add instruction for publishing artifacts to the Apache repository
2018-08-27	DRILL-6647: Update Calcite version to 1.17.0	Volodymyr Vysotskyi
	closes #1425
2018-08-10	DRILL-6656: Disallow extra semicolons and multiple statements on the same line.	Timothy Farkas
	closes #1415
2018-07-31	DRILL-6617: Changing name of implicit RowId column from implicitColumn to ↵	Hanumath Rao Maduri
	implicitRIDColumn. closes #1401
2018-07-31	DRILL-6617: Planner Side changed to propagate $drill_implicit_field$ ↵	HanumathRao
	information.
2018-07-03	DRILL-6494: Drill Plugins Handler	Vitalii Diravka
	- Storage Plugins Handler service is used op the Drill start-up stage and it updates storage plugins configs from storage-plugins-override.conf file. If plugins configs are present in the persistence store - they are updated, otherwise bootstrap plugins are updated and the result configs are loaded to persistence store. If the enabled status is absent in the storage-plugins-override.conf file, the last plugin config enabled status persists. - 'drill.exec.storage.action_on_plugins_override_file' Boot option is added. This is the action, which should be performed on the storage-plugins-override.conf file after successful updating storage plugins configs. Possible values are: "none" (default), "rename" and "remove". - The "NULL" issue with updating Hive plugin config by REST is solved. But clients are still being instantiated for disabled plugins - DRILL-6412. - "org.honton.chas.hocon:jackson-dataformat-hocon" library is added for the proper deserializing HOCON conf file - additional refactoring: "com.typesafe:config" and "org.apache.commons:commons-lang3" are placed into DependencyManagement block with proper versions; correct properties for metrics in "drill-override-example.conf" are specified closes #1345
2018-06-15	DRILL-6455: Add missing JDBC Scan Operator for profiles	Kunal Khatua
	The operator is missing in the profile protobuf. This commit introduces that. 1. Added protobuf files (incl generated C++ and Java) 2. Updated JdbcSubScan's getOperatorType API closes #1297
2018-06-01	DRILL-6450: Visualized plans for profiles querying JDBC sources is broken	Kunal Khatua
	When viewing a profile for a query against a JDBC source, the visualized plan is not rendered. This is because the generated SQL pushed down to the JDBC source has a line break injected just before the FROM clause. The workaround is to strip away any injected newlines ('\\n') at least for the SQL defined in the text plan, so that the backend Javascript can render it correctly. In addition, any single line comments are also removed, but any block comments (i.e. /* .. */ ) are retained as they might carry hints. This closes #1295
2018-05-23	DRILL-6436: Storage Plugin to have name and context moved to ↵	chunhui-shi
	AbstractStoragePlugin closes #1282
2018-05-11	DRILL-6242 Use java.time.Local{Date\|Time\|DateTime} for Drill Date, Time, ↵	jiang-wu
	Timestamp types. (#3) close apache/drill#1247 * DRILL-6242 - Use java.time.Local{Date\|Time\|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types. Conflicts: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java Fix merge conflicts and check style.
2018-05-10	DRILL-6386: Remove unused imports and star imports.	Drill Dev

2018-05-04	DRILL-6094: Decimal data type enhancements	Volodymyr Vysotskyi
	Add ExprVisitors for VARDECIMAL Modify writers/readers to support VARDECIMAL - Added usage of VarDecimal for parquet, hive, maprdb, jdbc; - Added options to store decimals as int32 and int64 or fixed_len_byte_array or binary; Add UDFs for VARDECIMAL data type - modify type inference rules - remove UDFs for obsolete DECIMAL types Enable DECIMAL data type by default Add unit tests for DECIMAL data type Fix mapping for NLJ when literal with non-primitive type is used in join conditions Refresh protobuf C++ source files Changes in C++ files Add support for decimal logical type in Avro. Add support for date, time and timestamp logical types. Update Avro version to 1.8.2.
2018-04-17	DRILL-6320: Fixed license headers.	Drill Dev
	closes #1207
2018-04-13	DRILL-6294: Changes to support Calcite 1.16.0 , and remove deprecated API usage	Volodymyr Vysotskyi
	closes #1198
2018-03-04	DRILL-6189: Security: passwords logging and file permisions	Vladimir Tkach
	1. Overrided serialization methods for instances with passwords 2. Changed file permissions for configuration files closes #1139
2018-02-16	DRILL-6130: Fix NPE during physical plan submission for various storage plugins	Arina Ielchiieva
	1. Fixed ser / de issues for Hive, Kafka, Hbase plugins. 2. Added physical plan submission unit test for all storage plugins in contrib module. 3. Refactoring. closes #1108
2018-01-26	DRILL-5730: Mock testing improvements and interface improvements	Timothy Farkas
	closes #1045
2018-01-16	DRILL-3993: Changes after code review.	Volodymyr Vysotskyi

2018-01-16	DRILL-3993: Move Drill-specific commits 'CALCITE-628' and 'Drill-specific ↵	Volodymyr Vysotskyi
	change: Add back AbstractConverter in RelSet.java' from Calcite into DRILL
2018-01-16	DRILL-3993: Use custom RelBuilder implementation in rules	Volodymyr Vysotskyi
	After the changes, made in CALCITE-1056 if the filter has a predicate that is always false, RelBuilder.filter() method returns values rel node instead of filter rel node. In order to preserve column types, DrillRelBuilder.empty() method, which is returned by filter method was overridden, and now it returns filter with a false predicate. (advice to override this method was in its javadoc) The goal of all other changes in this commit is to use our custom RelBuilder for all rules that are used in Drill.
2018-01-16	DRILL-3993: Changes to support Calcite 1.13	Roman Kulyk
	- fixed all compiling errors (main changes were: Maven changes, chenges RelNode -> RelRoot, implementing some new methods from updated interfaces, chenges some literals, logger changes); - fixed unexpected column errors, validation errors and assertion errors after Calcite update; - fixed describe table/schema statement according to updated logic; - added fixes with time-intervals; - changed precision of BINARY to 65536 (was 1048576) according to updated logic (Calcite overrides bigger precision to own maxPrecision); - ignored some incorrect tests with DRILL-3244; - changed "Table not found" message to "Object not found within" according to new Calcite changes.
2017-10-04	DRILL-5752 this change includes:	Timothy Farkas
	1. Increased test parallelism and fixed associated bugs 2. Added test categories and categorized tests appropriately - Don't exclude anything by default - Increase test timeout - Fixed flakey test closes #940
2017-09-14	DRILL-5761: Disable Lilith ClassicMultiplexSocketAppender by default.	Volodymyr Vysotskyi
	Unify logback files.
2017-09-05	DRILL-5546: Handle schema change exception failure caused by empty input or ↵	Jinfeng Ni
	empty batch. 1. Modify ScanBatch's logic when it iterates list of RecordReader. 1) Skip RecordReader if it returns 0 row && present same schema. A new schema (by calling Mutator.isNewSchema() ) means either a new top level field is added, or a field in a nested field is added, or an existing field type is changed. 2) Implicit columns are presumed to have constant schema, and are added to outgoing container before any regular column is added in. 3) ScanBatch will return NONE directly (called as "fast NONE"), if all its RecordReaders haver empty input and thus are skipped, in stead of returing OK_NEW_SCHEMA first. 2. Modify IteratorValidatorBatchIterator to allow 1) fast NONE ( before seeing a OK_NEW_SCHEMA) 2) batch with empty list of columns. 2. Modify JsonRecordReader when it get 0 row. Do not insert a nullable-int column for 0 row input. Together with ScanBatch, Drill will skip empty json files. 3. Modify binary operators such as join, union to handle fast none for either one side or both sides. Abstract the logic in AbstractBinaryRecordBatch, except for MergeJoin as its implementation is quite different from others. 4. Fix and refactor union all operator. 1) Correct union operator hanndling 0 input rows. Previously, it will ignore inputs with 0 row and put nullable-int into output schema, which causes various of schema change issue in down-stream operator. The new behavior is to take schema with 0 into account in determining the output schema, in the same way with > 0 input rows. By doing that, we ensure Union operator will not behave like a schema-lossy operator. 2) Add a UnionInputIterator to simplify the logic to iterate the left/right inputs, removing significant chunk of duplicate codes in previous implementation. The new union all operator reduces the code size into half, comparing the old one. 5. Introduce UntypedNullVector to handle convertFromJson() function, when the input batch contains 0 row. Problem: The function convertFromJSon() is different from other regular functions in that it only knows the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have a way to know the output type, and previously will assume Map type. That works under the assumption other operators like Union would ignore batch with 0 row, which is no longer the case in the current implementation. Solution: Use MinorType.NULL at the output type for convertFromJSON() when input contains 0 row. The new UntypedNullVector is used to represent a column with MinorType.NULL. 6. HBaseGroupScan convert star column into list of row_key and column family. HBaseRecordReader should reject column star since it expectes star has been converted somewhere else. In HBase a column family always has map type, and a non-rowkey column always has nullable varbinary type, this ensures that HBaseRecordReader across different HBase regions will have the same top level schema, even if the region is empty or prune all the rows due to filter pushdown optimization. In other words, we will not see different top level schema from different HBaseRecordReader for the same table. However, such change will not be able to handle hard schema change : c1 exists in cf1 in one region, but not in another region. Further work is required to handle hard schema change. 7. Modify scan cost estimation when the query involves * column. This is to remove the planning randomness since previously two different operators could have same cost. 8. Add a new flag 'outputProj' to Project operator, to indicate if Project is for the query's final output. Such Project is added by TopProjectVisitor, to handle fast NONE when all the inputs to the query are empty and are skipped. 1) column star is replaced with empty list 2) regular column reference is replaced with nullable-int column 3) An expression will go through ExpressionTreeMaterializer, and use the type of materialized expression as the output type 4) Return an OK_NEW_SCHEMA with the schema using the above logic, then return a NONE to down-stream operator. 9. Add unit test to test operators handling empty input. 10. Add unit test to test query when inputs are all empty. DRILL-5546: Revise code based on review comments. Handle implicit column in scan batch. Change interface in ScanBatch's constructor. 1) Ensure either the implicit column list is empty, or all the reader has the same set of implicit columns. 2) We could skip the implicit columns when check if there is a schema change coming from record reader. 3) ScanBatch accept a list in stead of iterator, since we may need go through the implicit column list multiple times, and verify the size of two lists are same. ScanBatch code review comments. Add more unit tests. Share code path in ProjectBatch to handle normal setupNewSchema() and handleNullInput(). - Move SimpleRecordBatch out of TopNBatch to make it sharable across different places. - Add Unit test verify schema for star column query against multilevel tables. Unit test framework change - Fix memory leak in unit test framework. - Allow SchemaTestBuilder to pass in BatchSchema. close #906
2017-03-29	DRILL-4678: Tune metadata by generating a dispatcher at runtime	Serhii-Harnyk
	main code changes are in Calcite library. update drill's calcite version to 1.4.0-drill-r20. close #793
2016-01-20	DRILL-4277: Fix for JdbcPrel serialization issue.	Jacques Nadeau
	This closes #326.
2015-12-14	DRILL-4198: Enhance StoragePlugin interface to expose logical space rules ↵	vkorukanti
	for planning purpose Also move Hive partition pruning rules to logical storage plugin rulesets. this closes #300
2015-12-02	DRILL-4124: Make all uses of AutoCloseables use addSuppressed exceptions to ↵	Julien Le Dem
	avoid noise in logs This closes #281
2015-11-13	DRILL-3791: MySQL tests for JDBC plugin	aleph-zero
	This commit adds integration tests for the JDBC plugin with MySQL. It also refactors the existing Derby tests to have the same general pattern as the MySQL tests: data is defined in an external .sql file and maven is used to start/stop external resources for testing. Add tests for ENUM and YEAR types. Tests for the CLOB type with Derby. This closes #251
2015-11-04	DRILL-4031: Log warning and ignore columns returned from JDBC source that ↵	Jacques Nadeau
	are unsupported. This closes #240
2015-11-01	DRILL-3956: Add support for MySQL text type	Jacques Nadeau

2015-11-01	DRILL-3992: Add/fix support for JDBC schemas (tested against oracle and derby)	Jacques Nadeau
	This closes #225
2015-10-26	DRILL-3742: Classpath scanning and build improvement	Julien Le Dem
	Makes the classpath scanning a build time class discovery Makes the fmpp generation incremental Removes some slowness in DrillBit closing Reduces the build time by 30% This closes #148
2015-10-07	DRILL-3791: Fix bugs in JDBC storage plugin	aleph-zero
	Fixes issues with bit, date, time and timestamp types in MySQL.
2015-09-17	DRILL-1942-readers:	Chris Westin
	- add extends AutoCloseable to RecordReader, and rename cleanup() to close(). - fix many warnings - formatting fixes DRILL-1942-readers: - renamed cleanup() to close in the new JdbcRecordReader Close apache/drill#154
2015-09-13	DRILL-3180: JDBC Storage Plugin updates.	Jacques Nadeau
	- Move to leverage Calcite's JDBC adapter capabilities for pushdowns, schema, etc. - Add test cases using Derby