In the article Rails 4 : Load Seed from TSV, I wrote about how to load TSV seed data into your database. This time, I write about how to load not only TSV but also CSV seed data.
Namely, I created a CSV class alongside the TSV class.
Environment
- Ubuntu 14.04 LTS
- Rails 4.1.8
- Ruby 2.2.2
Usage
The usage is the same as in the previous article.
Instruction
`rake seed_file:load FILE=aaa.tsv,bbb.csv,ccc/ddd.tsv` loads `aaa.tsv`, `bbb.csv` and `ccc/ddd.tsv` in this order. It requires the classes `Aaa`, `Bbb` and `Ccc::Ddd`, unless you write special code.
`rake seed_file:load` without `FILE` doesn't load all files under `db/seeds/xsv`; it loads nothing.
Attention
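- Put the TSV and CSV files in the directory `db/seeds/xsv`, and name each one [snake case of the corresponding model] + ".tsv" or ".csv".
- The first line of a TSV or CSV file should contain the column names, and data should start on the second line. If a column that does not exist in the table is listed, that column is not loaded into the database. You can therefore add a nonexistent column named `_memo` and use it for notes.
- Don't change an id value that has been loaded once, if the TSV or CSV contains id. Records are created or updated based on id, so changing an id produces unintended data.
- You can't delete data after loading: the task only creates and updates records, so removing a row from a file does not remove it from the database.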
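The column rule in the list above can be sketched in plain Ruby, without Rails. This is only an illustration: the sample data, the `known_attributes` list (standing in for the model's real columns) and the helper name `rows_for_import` are all assumptions made for the example, not part of the task.

```ruby
# Turn the header line and each data line into a hash of
# column name => value, then keep only the keys that are real columns.
# +known_attributes+ is a hypothetical stand-in for the model's columns.
def rows_for_import(text, known_attributes)
  lines = text.lines.map(&:chomp)
  columns = lines.first.split("\t")
  lines.drop(1).map do |line|
    row = columns.zip(line.split("\t")).to_h
    row.select { |key, _value| known_attributes.include?(key) }
  end
end

# Made-up TSV content: the first line holds column names, _memo is a note column.
sample_tsv = "id\tname\t_memo\n1\tmale\tfirst row\n2\tfemale\tsecond row\n"
rows = rows_for_import(sample_tsv, %w[id name])
rows.each { |row| puts row.inspect }
```

Each resulting hash contains only `id` and `name`; the `_memo` values are dropped because no such column is in the list.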
Program
First, create the task file.
The following program first creates a `TSV` or `CSV` object, and then reads the file line by line. This makes it possible to load big files, because it doesn't load all TSV or CSV lines into RAM at once. In Ruby, `require 'csv'` gives us a CSV class that can handle CSV and TSV, but I wanted a hash of column name and value for each line, and didn't want to write the TSV or CSV handling in the main procedure, so I created my own TSV and CSV classes.
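For comparison, here is a minimal sketch of how the standard library could produce the same kind of per-line hash. It is not what the task in this article uses; the helper name `each_row_as_hash` and the sample columns are assumptions made for the example.

```ruby
require 'csv'
require 'tempfile'

# Yield each data row of a delimited file as a Hash of column name => value.
# CSV.foreach reads the file row by row instead of loading it whole.
def each_row_as_hash(path, col_sep)
  CSV.foreach(path, headers: true, col_sep: col_sep) do |row|
    yield row.to_h
  end
end

# Demo with a throwaway file; the column names are made up for illustration.
Tempfile.create(['gender', '.tsv']) do |file|
  file.write("id\tname\n1\tmale\n2\tfemale\n")
  file.flush
  each_row_as_hash(file.path, "\t") { |values| puts values.inspect }
end
```

Unlike a plain `split`, the standard library also handles quoted fields that contain the separator. Note that its class is also named `CSV`, so it would collide with a `CSV` class of your own if both were loaded in the same process.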
Task Code
```ruby
namespace :seed_file do
  # Base class to handle CSV and TSV files
  class XSV
    XSV_ROOT = Rails.root.join('db', 'seeds', 'xsv')

    # Yield each line as hash.
    # ==== Example
    #   instance.each{|values|
    #     value = values['column_name']
    #   }
    # ==== Note
    # This method is useful when handling a very big xsv file,
    # because it doesn't load all the contents into memory;
    # it handles one line at a time.
    def each
      File.open(@file_path) do |xsv|
        columns = self.class.split_line(xsv.readline)
        until xsv.eof?
          values = self.class.split_line(xsv.readline)
          kv = {}
          columns.each_with_index{|column_name, index|
            kv[column_name] = values[index]
          }
          yield kv
        end
      end
    end

    # Initialize.
    # The first line of the xsv file should be the column names.
    # ==== Parameters
    # * +file_name+ - xsv file name including extension, relative to XSV_ROOT
    def initialize(file_name)
      @file_path = File.join(XSV_ROOT, file_name)
    end

    # Split a separated value string into array.
    # Subclasses must override this.
    # ==== Parameter
    # * +line+ - separated value string
    def self.split_line(line)
      raise NotImplementedError, 'Implement split_line function!'
    end
  end

  # Class to handle CSV file
  class CSV < XSV
    # Split comma separated value string into array
    # ==== Parameter
    # * +line+ - comma separated value string
    def self.split_line(line)
      line.chomp.split(',')
    end
  end

  # Class to handle TSV file
  class TSV < XSV
    # Split tab separated value string into array
    # ==== Parameter
    # * +line+ - tab separated value string
    def self.split_line(line)
      line.chomp.split("\t")
    end
  end

  # Create the handler that matches the file extension
  class XSVFactory
    def self.create_xsv(file_name)
      case file_name[-3, 3].downcase
      when 'csv'
        CSV.new(file_name)
      when 'tsv'
        TSV.new(file_name)
      else
        raise ArgumentError, 'File format is invalid.'
      end
    end
  end

  # Load xsv file without transaction.
  # ==== Parameter
  # * +file_name+ - file key name plus extension (`csv` or `tsv`);
  #   the file key name can be leveled_experience_table or table name.
  def load_xsv(file_name)
    xsv = XSVFactory.create_xsv(file_name)
    # Strip the 4-character extension (".csv" or ".tsv")
    file_key_name = file_name[0, file_name.length - 4]
    case file_name
    when 'japan/food_maker.csv', 'japan/railway_company.csv', 'america/paper_company.csv'
      load_company_csv(xsv, file_key_name)
    else
      load_table_xsv(xsv, file_key_name)
    end
  end

  # Load xsv file with transaction
  def load_xsv_aspect(file_name)
    Rails.logger.info("start to load #{file_name}")
    ActiveRecord::Base.transaction do
      load_xsv(file_name)
    end
    Rails.logger.info("end to load #{file_name}")
  end

  # Create or update one record per line, keyed by id.
  # Columns that the model does not have are skipped.
  def load_table_xsv(xsv, file_key_name)
    model_class = file_key_name.classify.constantize
    xsv.each{|values|
      model = model_class.find_or_initialize_by(id: values['id'])
      values.each{|key, value|
        if model.has_attribute?(key)
          model[key] = value
        end
      }
      model.save!
    }
  end

  # Same as load_table_xsv, but also sets order_number
  def load_company_csv(xsv, file_key_name)
    model_class = file_key_name.classify.constantize
    xsv.each{|values|
      model = model_class.find_or_initialize_by(id: values['id'])
      values.each{|key, value|
        if model.has_attribute?(key)
          model[key] = value
        end
      }
      model.order_number = model.id + model.sort
      model.save!
    }
  end

  desc "load seed file"
  task load: :environment do
    if ENV.has_key?('FILE')
      xsv_names = ENV['FILE'].split(',')
      xsv_names.each{|xsv_name|
        load_xsv_aspect(xsv_name)
      }
    end
  end
end
```
For the file `"dir/file.csv"`, the `Dir::File` class is used as the model.
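The mapping from file name to model name can be approximated in plain Ruby as below. This is a simplified stand-in for ActiveSupport's `classify`, written only for illustration: the helper name `model_name_for` is hypothetical, and unlike `classify` it does not singularize plural names.

```ruby
# Simplified stand-in for the classify step: drop the extension, camelize
# each path segment, and join the segments with "::".
# It does NOT singularize, which the real classify does.
def model_name_for(file_name)
  key = file_name.sub(/\.(csv|tsv)\z/i, '')
  key.split('/').map { |segment|
    segment.split('_').map(&:capitalize).join
  }.join('::')
end

puts model_name_for('japan/food_maker.csv')   # Japan::FoodMaker
puts model_name_for('general/gender.csv')     # General::Gender
```

In the task itself, the resulting name is then resolved to the actual class with `constantize`, so a model with that name has to exist.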
Then call the task from `seed.rb`.
seed.rb
```ruby
files = [
  'japan/prefecture.csv',
  'china/company.csv',
  'general/gender.csv',
  'feedback/feedback_status.csv'
]
ENV['FILE'] = files.join(',')
Rake::Task['seed_file:load'].invoke
```
Please read Rails 4 : Load Seed from TSV for more information.