Data Wrangling Engineer

Customer Success | San Francisco, CA | Full time

Job Description

About Trifacta:
Trifacta, the pioneer in data transformation, significantly enhances the value of an enterprise’s Big Data by enabling users to easily transform raw, complex data into clean and structured inputs for analysis. Leveraging decades of innovative work in human-computer interaction, scalable data management, and machine learning, Trifacta’s unique Predictive Interaction technology creates a bidirectional partnership between user and machine, with each component learning from the other and becoming smarter through use. Trifacta is backed by venture capital firms Accel, Greylock and Ignition Partners and is headquartered in San Francisco. Its founders and technical advisors include global leaders in data science, interaction design, and big data.
Who you are:
You create excitement and articulate a path to success for customers by delivering projects, training, and best practices. You are skilled with analytics tools such as Excel, R, Python, and data science workbenches, and you know the technology platforms in Big Data, IoT, analytics, and data quality (DQ). You also understand the data and application integration markets, current market trends, and the industry players.
You are a team player and consultant willing to work in cross-functional organizational structures, with excellent verbal and written communication and strong presentation skills. You have demonstrated the ability to thrive in a fast-growing, dynamic company and a proven ability to drive continuous value from your company's product(s).

Skills and Qualifications
Bachelor's degree in a quantitative field with 5+ years of experience in professional services, pre-sales, technical architecture, or delivery management, or a Master's degree with 3+ years of experience.

Required Skills
  • Strong knowledge of languages and frameworks such as Python, R, MATLAB, Spark, or SAS; R/Spark on Hadoop preferred.
  • Strong background in applying statistical machine learning techniques to predictive modeling, and experience with machine learning libraries (via R, H2O, Python, Spark, etc.).
  • Fluency in big data platforms including Hadoop, MapReduce, Hive, Spark.
  • A strong understanding of data profiling and data cleansing techniques.
  • Natural curiosity and a strong passion for empirical research and problem-solving.
  • Experience managing data pipelines: collecting, processing, and analyzing large volumes of data.
  • Interest in data science (machine learning/AI, TensorFlow, cluster analysis, decision trees, ensemble methods, etc.).
  • Experience dealing with data at scale, processing and transforming hundreds of millions of data points per day.
  • Hands-on experience using SQL, ETL and relational databases.
  • Knowledge of dashboard and reporting software (Tableau, Power BI, etc.).
  • Good consulting and training skills.
  • Strong written and verbal communication skills; comfortable communicating with senior levels of both business and technology leadership.


Preferred Skills
  • Familiarity with scheduling and orchestration tools (e.g., Airflow, NiFi, StreamSets).
  • Understanding of and experience with version control software (e.g., GitHub).
  • Familiarity with cloud-based HaaS/PaaS solutions such as AWS EMR and Microsoft Azure.
  • Proficiency in consuming REST-based APIs (with JSON payloads) is a plus.

Responsibilities:

Use Case Development:
  • Perform data mining, profiling, mapping, and quality assessments across multiple sources on customer projects.
  • Take a consultative approach with customers on the requirements, design, and implementation of data use cases.
  • Prepare data using Trifacta flows to develop analytics capabilities (e.g., models and processes) for customer use cases.
  • Prepare data using Trifacta flows for ML-based predictive and prescriptive modeling.
  • Develop data preparation and wrangling workflows to efficiently address customer use cases.
  • Leverage big data to discover patterns and solve strategic & tactical business problems using massive structured and unstructured data sets across multiple environments.
  • Design new or integrate existing tool(s) to automatically ingest, sort, tag, and organize various data sources and types according to a schema, methodology, or ontology.
  • Develop new or refine existing databases to ingest, sort, tag, and organize various data sources and types (structured and unstructured) according to an ontology, enabling wider integration and analysis.
  • Perform exploratory data analysis across multiple data sources and projects.
  • Develop clear and compelling visualizations and dashboards that help business teams make data-informed decisions.
  • Report findings by creating useful and appropriate data outputs and visualizations tailored for the intended audiences.

Training:
  • Differentiate our products in the marketplace through best-in-class training offerings.
  • Create and deliver onsite and remote workshops to help customers identify or build use cases in their organization around our products.
  • Deliver training courses to individual customers or groups of customers and partners.
  • Staff the virtual offerings and group sessions scheduled on the website.
  • Learn & stay current on data science developments, news, opportunities, and challenges.

Community contribution:
  • Help develop documentation and best practices.
  • Create referenceable go-lives and reusable assets for the community.
  • Generate innovative ideas, establish new research directions, shape and execute the information strategy in support of technical projects and new product developments.

Collaboration:
  • Maintain a cross-functional relationship with our CSM, Sales, Product, Marketing and Technical Support departments to continuously improve the product implementation and enhance customer success.