PASS Summit 2010 Day 0 – PreCon – ETL with SSIS
PASS has officially started. For the second year in a row, I am attending Brian Knight’s (blog | twitter) Pre Conference. And as usual, Brian Knight – this time with Patrick LeBlanc (blog | twitter) – have delivered. Hats off to them, for making the pre-con very very informative, yet very entertaining.
Here are some of my notables from the session:
- Data flow most in memory, typically not CPU bound. Make sure you have lots a RAM
- Sort and aggregate are fully blocking asynchronous transforms. Union All is partially blocking.
- Don’t like waiting for BIDS to load up SSIS (or visual studio for that matter?) – use -NOSPLASH switch
- When you get problems migrating SSIS from dev to prod, ie if you are getting login errors, check package protection level. It might be tied to profile. Windows auth usually not a problem, but SQL auth usually fails. Check ProtectionLevel of the package – use EncryptSensitiveWithPassword or EncryptAllWithPassword
- No, you cannot put password in config file.
- Conditional Transform – “one package that does them all”. That is, same package for your “everyday” tasks, and “weekend” tasks, and “monthend” tasks etc
- Rule of thumb from Brian Knight – don’t choose “select table or view” in lookup transform. SSIS will cache the whole thing. Selectively choose which columns you want.
- Control flow orchestrates everything that happens in a package.
- Always be careful with variable scope. Always go back to control flow before creating it if you want package level scope. @simon_sabin and @bo_altes also recommended BIDSHelper.
- Data Profiling task – good for analyzing new projects
- Interesting use of fuzzy lookups/techniques – scrubbing tweets about your company. How much “buzz” is your company creating?
- Fuzzy techniques – use to dedupe, catch misspelt names. One cust saved 800k.
- Fuzzy techniques cons – slow, CPU intensive, heavy use of tempdb. Make sure tempdb is tuned.
- Gotcha on term extraction – must be Unicode
- Change data capture (CDC) is built on top of replication. If u enable CDC, and if you already have replication, u may experience latency
- Compression good for fact tables because queried the most, has most repeating data. Stands to gain most.
- Once you pick a language in the SSIS script editor, you can’t change it anymore. If you really want to, you will have to drop the script task and recreate.
- Set RetainSameConnection property = true
- SCD Wizard pro – very short design time. Con – can have long run times.
- Another con for SCD wizard. If you need to change it will lose your customizations. Tip: disconnect the precedence line before updating
- CHECKSUM vs HASHBYTES – CHECKSUM pseudo unique. HASHBYTES really unique
- FastParse hidden property for textfiles – great option if you trust your data. @SQLRunner Josef Richberg warns of painful gotchas.
- Can change connection settings default packet size from 4k to 32k. Be careful, make sure you watch your network saturation; also watch for jumbo packets. Your network may not allow bigger packet sizes. Talk to your network admin.
- SortKeyPosition property – 1 for ASC, -1 for DESC
- Destination tuning: check out WSRM windows system resource manager.. This allows to create soft throttle.
- When to log – onerror, onwarning, onpreexecute, onpostvalidate, onpipelinesrowsent
- Every component has an advanced editor
- Tuning : use parallel design. Spawn simultaneous DTEXEC processes. Be wary though of point of diniminishing returns.
- Is it a memory problem? Check perfmon counters BuffersInUse and BufferSpoolerd
- Clustering in SSIS is not technically supported
- To create additional folders when deploying – SQLServer InstallPath100DTSBINNMsDtsSrvr.ini.xml — this is where you can add new folders
- Regarding 64 bit issue: Jet Drivers must be 32 bit; set Execution option Run64bitRunTime mode = true if you have Excel, Access, DLL files
- PragmaticWorks SSIS Expression CheatSheet
- Free CheckSum Transform
- BI WhitePaper Links (via @SQLChicken) http://www.sqlcrunch.com/DataWarehousingBI/tabid/105/Default.aspx
- Kimball SCD Component on CodePlex
- We Loaded 1TB in 30 Minutes with SSIS, and So Can You
- Fast Track reference White Paper
- OH: “I love hearing my design decisions made over the past yr vindicated by people smarter than me. Wish my boss was here”
- @BrianKnight “I’m like a trace flag encyclopedia”
- @BrianKnight “Great point. It depends. What kind of consultant would I be if I didn’t say ‘it depends’?”
- OH: “Msdtssrvr.ini.xml – worth the price of admission”
- @PatrickDBA on SSIS Expression Language: “t-sql and C# had a baby, threw it up the attic, and just came out and barely speaks English”
- OH: “Have oracle source, doing fast load but still slow? Solution? Use oracle driver from Rracle. Better solution: Convert to SQL Server”
- @PatrickDBA – “Control flow is like me, a big deal”
- @BrianKnight – “Kinda looks like a star” @PatrickDBA – “Like me!”