--map-column-hive is getting ignored in Sqoop 1.4.6

sqoop import -D mapred.job.queue.name=root.ficc.qar -D mapreduce.map.memory.mb=3072 -D mapreduce.map.java.opts=-Xmx3072m --connect "jdbc:oracle:thin:@servername.corp.company.com:40000/DB_ServiceName" --table EMPLOYEE --username USERNAME --password-file /user/id/pwd --split-by JOINDATE -m 1 --hive-import --hive-database hiveDB --hive-table EMPLOYEE --map-column-hive ID=INT64,NAME=VARCHAR,DEPARTMENT=VARCHAR,JOINDATE=TIMESTAMP --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --as-parquetfile

Hive> describe Employee;

Name        Type
id          string
name        string
department  string
joindate    bigint
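Since the mappings are ignored and every column lands as string/bigint, one way to get usable types back is a view over the imported table. This is only a minimal sketch, assuming joindate holds epoch milliseconds (consistent with the /1000 in the Tableau calculation below); the view name employee_typed is made up:

-- cast the string columns back and turn the epoch-millis joindate into a timestamp;
-- from_unixtime() expects seconds, hence the divide by 1000
CREATE VIEW hiveDB.employee_typed AS
SELECT CAST(id AS INT) AS id,
       name,
       department,
       CAST(from_unixtime(CAST(joindate / 1000 AS BIGINT)) AS TIMESTAMP) AS joindate
FROM hiveDB.employee;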

Tableau: BIGINT to DATE Conversion

Used the calculation below to convert BIGINT to Date for an Impala table.
This Impala table was originally loaded from an Oracle table by Sqoop, and Sqoop converts Oracle date fields to BIGINT fields (epoch milliseconds).

Note: Parquet doesn't support a DATE type, but it does support TIMESTAMP. Sqoop has a bug, so it cannot write TIMESTAMP directly; there is a two-step workaround for this, which is how I ended up using Spark.

dateadd('second', INT(([Start Date]/1000)+(8*60*60)-43200), #1970-01-01#)

Source: https://community.tableau.com/thread/163721

Sqoop Error Message 2

ERROR:
ERROR org.apache.sqoop.tool.ImportTool - Import failed: Missing Hive MetaStore connection URI

Environment: Oozie on CDH 5.14.2, Sqoop version 1.4.6

Cause: A Sqoop import in Parquet format throws this error in Sqoop version 1.4.6-CDH5.14.2.

Solution: Use another delimited format, or just leave the format at its default.
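As an illustration of that workaround, here is the same import left in the default delimited text format (connection details copied from the first example above, with the Parquet and compression flags dropped):

# same import, default text format instead of --as-parquetfile
sqoop import \
  --connect "jdbc:oracle:thin:@servername.corp.company.com:40000/DB_ServiceName" \
  --table EMPLOYEE \
  --username USERNAME \
  --password-file /user/id/pwd \
  --split-by JOINDATE -m 1 \
  --hive-import \
  --hive-database hiveDB \
  --hive-table EMPLOYEE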

Sqoop Job Syntax with Incremental Load

sqoop job \
  --create incremental_import_job \
  -- import \
  --connect jdbc:mysql://localhost/company \
  --username root \
  --password password \
  --table newemployee \
  --target-dir /data/newemployee \
  --incremental append \
  --check-column id \
  --last-value 7 \
  -m 1
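Once the job is created, it can be listed, inspected, and run with the same sqoop job tool:

# list saved jobs, show the saved definition, and run the incremental import
sqoop job --list
sqoop job --show incremental_import_job
sqoop job --exec incremental_import_job

Each --exec run records the new --last-value in the saved job, so the next run only appends rows with a higher id.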

Book ref – Hadoop: Data Processing and Modelling

This book looks good for learning more about data ingestion.

Sqoop Error Message 1

ERROR:
ERROR org.apache.sqoop.tool.ImportTool - Import failed: There is no column found in the target table table_name. Please ensure that your table name is correct.

Cause: Sqoop is case-sensitive with Oracle table names.

Solution: Use TABLE_NAME (upper case) instead of table_name.
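For example, reusing the Oracle connection from the first import (other arguments abbreviated), the upper-case name is the one that matches what Oracle keeps in its catalog for unquoted table names:

# fails with the error above: lower-case name does not match the Oracle table
sqoop import --connect "jdbc:oracle:thin:@servername.corp.company.com:40000/DB_ServiceName" --table employee ...

# works: Oracle stores unquoted table names in upper case
sqoop import --connect "jdbc:oracle:thin:@servername.corp.company.com:40000/DB_ServiceName" --table EMPLOYEE ...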

Note: Sqoop version 1.4.6