Hello everyone! I'm going to be talking about seeding in Rails. It's something we all probably know about, but most of the projects we work on probably have an empty seeds file.
I'm going to throw some code at you during this talk! I'll drop links afterwards, but you can also find them here. This talk is also up on GitHub & written in Markdown, so if you want to take it & change it, you can :D
This is what I'm going to cover. The idea is that by the end, you'll want to take any project, new or existing, and add a few seeds. Plus I have some cool tricks for getting a lot of value from your seed data.
A seed could be a user, tax rates, or chunks of fixed data. The idea is that in development or a preview environment, we have a good representation of production so we can build a better product. In production, we use them to make sure the data we expect is there (e.g. tax rates) and that it's the same across all environments.
Rails has some built-in commands to make this easier for us (like `bin/rails db:seed`), and there's also the awesome `bin/setup` file. Recently I've been getting into the mindset that every time developers change branches, they should feel confident running `bin/setup` to rebuild a fresh environment. I think that's something awesome to aim for.
They also live here:
The default file sucks so hard. I hate it. I don't think you'd ever want your seeds to look like this at all.
If I'm lucky, they look like this, where there's actually something a bit more useful going on.
Maybe you'll also get this, or just a blank file. You might also get something a bit suspect. This is taken from a real project I worked on. I wish I had written that comment, but I didn't. In the production DB, that value is different now. They were made years ago and were too much of a hassle to keep up to date.
First story: This was a few years ago now, but a developer was sent a copy of the production database to use locally. Then someone broke into their office one weekend (smashed the door in) and stole their laptop. They had to tell the customer about it (code and DB lost), and it was very uncomfortable. Now that we all work from home, this one is becoming more important to think about. Aside: I've heard that at Facebook, developers' dev environments are cloud machines they SSH into. I'm super excited about GitHub Codespaces for this: instead of the codebase and DB living on local machines, they live on a cloud machine which is destroyed at the end of each session.
Second story:
- The preview environment didn't have basic auth on it, and user passwords were the same as in production.
- The preview environments ran the same cron jobs as production.
- The dataset was an anonymised snapshot of production, often a few months behind (it took a while to run and was very manual).
- The preview environment used Mailtrap to stop emails going out. But one day our client wanted to see the emails in their own inbox, so some emails weren't anonymised in the snapshot.

Then one day the client said, "I want an exact copy of production in staging." There were lots of whoopsies there, but with a better process for the preview environments' sample data, they would have been avoidable.
You always kind of know a project is going to be rubbish when you need to open the console to create a user to log in with. It's really nice to go through a local application with a developer and show them lots of pages with real-feeling content on them. It's even better if there are hundreds of records, so N+1 queries become obvious. This is something I'm quite passionate about: if we want people to really enjoy working with Rails, we need to care about this kind of low-hanging fruit.
Before we continue! Let's check the vibes of the room! Throw up some emojis!
Who has picked up a project where the seeds file was just blank? I feel it's pretty common.
I used to work in agencies quite a lot & it was pretty common to just take a copy of the production DB.
I don't think I've ever worked on a project where the seeds were amazing. I'm going to show you a very cool trick I've started using, which I think is the way to go.
So now that we know the pain we're trying to avoid, how do we avoid it?
These are the ones you'd find in your `db/seeds.rb` file. The file can get very big pretty fast. I've seen it broken up into smaller files before. Sometimes it has if statements to handle different environments and whatnot.
We all know Faker? I find fuzz testing in development is a nice way to find weird edge cases. If I don't use Faker, I normally end up rolling my face over the keyboard. It makes me think about how the HTML components I'm designing cope with unexpected content.
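One way to split up a big `db/seeds.rb` without scattering `if` statements everywhere is to organise the files by environment. This is a minimal sketch, assuming a hypothetical `db/seeds/common/` plus `db/seeds/<environment>/` layout (not a Rails convention); `seed_files_for` and `load_seeds` are illustrative helper names.

```ruby
# Hypothetical layout (an assumption, not a Rails convention):
#   <seeds_dir>/common/*.rb        -> loaded in every environment
#   <seeds_dir>/<environment>/*.rb -> loaded only in that environment
# Files are sorted so prefixes like 01_plans.rb control the load order.
def seed_files_for(seeds_dir, environment)
  ["common", environment.to_s].uniq.flat_map do |group|
    Dir.glob(File.join(seeds_dir, group, "*.rb")).sort
  end
end

# Runs each seed file with Kernel#load, common files first.
def load_seeds(seeds_dir, environment)
  seed_files_for(seeds_dir, environment).each { |file| load file }
end
```

From `db/seeds.rb` you might call `load_seeds(Rails.root.join("db", "seeds").to_s, Rails.env)`. Each environment then only reads the files it needs, and the numbered prefixes make dependencies between files explicit.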
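If you want to see how components cope with awkward data before reaching for Faker, a hand-rolled list of edge cases does a similar job. This is a minimal, dependency-free sketch; `EDGE_CASE_STRINGS`, `awkward_text`, and `keyboard_mash` are hypothetical names, not part of Faker's API.

```ruby
# Awkward values that tend to break HTML components (hypothetical list).
EDGE_CASE_STRINGS = [
  "",                              # empty value
  " ",                             # whitespace only
  "A" * 255,                       # very long word with no break points
  "Zoë O'Connor-Smythe",           # accents, apostrophe, hyphen
  "<script>alert('xss')</script>", # markup that must be escaped
  "名前 テスト",                    # non-Latin script
  "😀" * 10                        # multi-byte emoji
].freeze

PRINTABLE_ASCII = (32..126).map(&:chr).freeze

# Picks one awkward value; pass a seeded Random for repeatable data.
def awkward_text(random: Random.new)
  EDGE_CASE_STRINGS.sample(random: random)
end

# The "face on the keyboard" approach: random printable ASCII characters.
def keyboard_mash(length, random: Random.new)
  Array.new(length) { PRINTABLE_ASCII.sample(random: random) }.join
end
```

In a seed you might write something like `User.create!(name: awkward_text)` (assuming a `User` model with a `name` column). Passing a seeded `Random` makes the weird data repeatable, which helps when you're chasing a layout bug.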
I like this, especially in development environments
You can also take it to the next level and use the data from your tests to mess with your app. When I was experimenting with this, I found this command.
So I wondered what would happen if I just used these instead! I generally have pretty fleshed-out factories, which are good representations of my expected data.
I'm really liking this approach! thoughtbot doesn't encourage this, but I've been using it a little and I really like it. The happy side effects I'm noticing are:
- It's easier to write tests, as I'm looking at what the tests will see.
- I'm more incentivised to keep my factories working.

I think if your app is simple and you're starting fresh, this could be a valid approach.
But what if you're picking up a project where there's nothing? Or where the database is intense? Evil Martians have a pretty cool library for taking snapshots of user data and its relationships. So you could potentially automate making a subset of data available for local development and review apps. You could also run it in a way that logs who is requesting what data, whether via version control or something else. I recently started using this on a project with an 8GB database that was a real mess. We ended up using it to create a dump of our DB, which took about 20 minutes to run, and then we used that dump on our local machines.
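The general idea of a partial snapshot can be sketched without any library: keep a small set of users plus only the rows that reference them, then scrub personal data so nothing real leaks into preview environments. This is a generic, hypothetical illustration using plain hashes, not the Evil Martians library's API; `Snapshot`, `subset`, and `anonymise` are made-up names.

```ruby
# Hypothetical container for the rows we keep.
Snapshot = Struct.new(:users, :orders, keyword_init: true)

# Keep the first `limit` users plus only the orders that belong to them,
# so every user_id in the subset still points at a row that exists.
def subset(users:, orders:, limit:)
  kept_users = users.first(limit)
  kept_ids = kept_users.map { |user| user[:id] }
  kept_orders = orders.select { |order| kept_ids.include?(order[:user_id]) }
  Snapshot.new(users: kept_users, orders: kept_orders)
end

# Replace personal data with values derived only from the id, so no real
# name or email (including the client's own) survives in the snapshot.
def anonymise(snapshot)
  users = snapshot.users.map do |user|
    user.merge(name: "User #{user[:id]}", email: "user-#{user[:id]}@example.com")
  end
  Snapshot.new(users: users, orders: snapshot.orders)
end
```

Using `example.com`, a domain reserved for documentation, means even an email that slips past Mailtrap can't land in a real inbox.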
Sometimes I find data that's the same in all environments but still lives in the database. Let's say you have a User with a Plan relationship, and the plans are the same in every environment.
So what if we pulled all the data from that table and put it into a Struct, along with some helper methods?
Then we rewrote our model to look like this. We avoid touching the database for this information, it's now the same across all environments, and all the data changes are in version control.
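Here is a minimal sketch of the Plan idea in plain Ruby, assuming plans have an `id`, `name`, and `price_cents` (hypothetical attributes and values). `User` is a plain class so the example runs without Rails; in an app it would be an `ApplicationRecord` model with a `plan_id` column, and the `plan` method would stand in for `belongs_to :plan`.

```ruby
# Fixed plan data kept in code instead of a `plans` table.
Plan = Struct.new(:id, :name, :price_cents, keyword_init: true) do
  def free?
    price_cents.zero?
  end
end

class Plan
  # Hypothetical plans; changes to these now go through version control.
  ALL = [
    new(id: 1, name: "Free", price_cents: 0),
    new(id: 2, name: "Pro", price_cents: 1_000),
    new(id: 3, name: "Team", price_cents: 5_000)
  ].freeze

  def self.all
    ALL
  end

  # Raises on an unknown id, similar to ActiveRecord's find.
  def self.find(id)
    ALL.find { |plan| plan.id == id } || raise(KeyError, "no Plan with id=#{id}")
  end

  # Handy for select boxes in forms: [["Free", 1], ...]
  def self.options_for_select
    ALL.map { |plan| [plan.name, plan.id] }
  end
end

# Stand-in for an ActiveRecord model with a plan_id column.
class User
  attr_accessor :plan_id

  def initialize(plan_id:)
    @plan_id = plan_id
  end

  # Replaces `belongs_to :plan`: an in-memory lookup, no query.
  def plan
    Plan.find(plan_id)
  end

  def plan=(plan)
    self.plan_id = plan.id
  end
end
```

Because `Plan.find` raises on an unknown id, a stale `plan_id` fails loudly instead of silently returning `nil`.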
DO IT!! I started by just adding a test file where I'd call `Rails.application.load_seed` with the various ENV settings that change what the seeds do, and then check that the expected records show up in my database. It doesn't have to be anything fancy, but it will pick up on any whoopsies. I seriously love this test; it catches some really random mistakes sometimes.
Plus, as I use FactoryBot for my seeds right now, I often call its linter (`FactoryBot.lint`) as part of the test suite. It's really cool! I get more value out of factories I'm going to write anyway.
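A seed test can be as small as: load the seeds under each ENV combination, check the expected rows exist, and load them again to confirm nothing gets duplicated. The sketch below is a runnable stand-in, assuming a hypothetical in-memory `FakeDB` and a hypothetical `load_seed(db, env)`; in a Rails test you would set `ENV` values, call `Rails.application.load_seed`, and count records on your real models instead.

```ruby
# Hypothetical in-memory stand-in for the database.
class FakeDB
  def initialize
    @tables = Hash.new { |hash, table| hash[table] = [] }
  end

  # Idempotent insert, in the spirit of find_or_create_by!.
  def find_or_create(table, attrs)
    existing = @tables[table].find { |row| row == attrs }
    return existing if existing

    @tables[table] << attrs
    attrs
  end

  def count(table)
    @tables[table].size
  end
end

# Hypothetical seeds driven by ENV-style settings.
def load_seed(db, env)
  # Fixed data every environment needs.
  db.find_or_create(:tax_rates, { name: "Standard", percent: 20 })
  return if env["RAILS_ENV"] == "production"

  # Sample data for development and preview environments only.
  Integer(env.fetch("SEED_USERS", "3")).times do |i|
    db.find_or_create(:users, { email: "user#{i}@example.com" })
  end
end

# Seeds a fresh DB twice per scenario; returns the envs whose row counts
# were wrong after either run (the second run catches duplicates).
def failing_seed_scenarios(scenarios)
  failures = scenarios.reject do |env, expected|
    db = FakeDB.new
    load_seed(db, env)
    first_run = expected.keys.to_h { |table| [table, db.count(table)] }
    load_seed(db, env)
    second_run = expected.keys.to_h { |table| [table, db.count(table)] }
    first_run == expected && second_run == expected
  end
  failures.map(&:first)
end
```

Each scenario pairs an ENV-like hash with the row counts you expect, for example `[{ "RAILS_ENV" => "production" }, { tax_rates: 1, users: 0 }]`, so a new environment switch only needs one more entry.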
How do you know when you've made good seeds? Most developers won't tell you something is wrong; we're all frogs sitting in slowly boiling water sometimes.
Instead, look at measurable behaviour!
- Developers touching prod is bad.
- Making problems obvious quickly is a win.
- Are developers committing schema changes that aren't real?
- This one is more for Product Owners: are they going to preview environments and looking at stuff, or are they testing in prod?

So what should we be doing? You should be using seeds; they're a good foundation for any app.
Make a project better