Rails 4 : Load Seed from TSV


A few days ago, a client asked me to make it possible to load seed data from Excel in a Rails 4 project. I told him that data that needs to be edited should be managed in the database, not in Excel. He really wanted Excel, though, so we agreed to load the data from TSV. Excel is convenient for editing, so TSV is a suitable format.

If you select a range of cells in Excel, copy it and paste it into a text editor, the data is pasted as TSV, and vice versa.

Environment

Ubuntu 14.04 LTS
Rails 4.1.8
Ruby 2.2.2

Direction

Create a rake task that loads TSV data.
Run that task when rake db:seed is executed.

Usage

Instruction

rake seed_file:load TSV=aaa,bbb,ccc/ddd imports aaa.tsv, bbb.tsv and ccc/ddd.tsv, in that order. Unless you have written code for special handling, the classes Aaa, Bbb and Ccc::Ddd must exist.
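The mapping from the TSV argument to files and classes can be sketched in plain Ruby. This is only an approximation for simple snake_case names. The helpers class_name_for and tsv_path_for are hypothetical. ActiveSupport's classify, which the task uses, also singularizes names, and this sketch does not.

```ruby
# Hypothetical helper approximating ActiveSupport's "ccc/ddd".classify
# for simple, already-singular snake_case names.
# It does NOT singularize ("users" would stay "Users").
def class_name_for(file_key_name)
  file_key_name.split('/').map { |segment|
    segment.split('_').map(&:capitalize).join
  }.join('::')
end

# Hypothetical helper: TSV path for a key, relative to the Rails root.
def tsv_path_for(file_key_name, root = 'db/seeds/tsv')
  File.join(root, file_key_name + '.tsv')
end

'aaa,bbb,ccc/ddd'.split(',').each do |name|
  puts "#{tsv_path_for(name)} -> #{class_name_for(name)}"
end
# db/seeds/tsv/aaa.tsv -> Aaa
# db/seeds/tsv/bbb.tsv -> Bbb
# db/seeds/tsv/ccc/ddd.tsv -> Ccc::Ddd
```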

Running rake seed_file:load on its own does not import all the TSV files under db/seeds/tsv; without TSV it does nothing. The project I was working on didn't need that, so I didn't implement it.
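If you did want to load every file, listing them could look roughly like the sketch below. The helper all_tsv_keys is hypothetical and is not part of the task. Alphabetical order ignores any dependency between tables, whereas an explicit TSV list lets you control the order.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: every TSV under root as a key name such as
# "aaa" or "ccc/ddd", sorted alphabetically.
def all_tsv_keys(root)
  Dir.glob(File.join(root, '**', '*.tsv')).sort.map { |path|
    path.sub(%r{\A#{Regexp.escape(root.to_s)}/}, '').sub(/\.tsv\z/, '')
  }
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'ccc'))
  %w[aaa.tsv bbb.tsv ccc/ddd.tsv].each { |f| File.write(File.join(root, f), "id\n") }
  p all_tsv_keys(root)  # keys usable as ENV['TSV'].split(',') entries
end
```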

Attention

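TSV files are placed under the db/seeds/tsv directory, each named after its model in snake_case plus ".tsv".
The first line of a TSV holds the column names, and the second and following lines hold the values. Columns that don't exist in the table are not imported into the database, so you can add a nonexistent column such as _memo and use it for remarks.
Don't change an id once it has been imported. Records are created or updated based on id, so changing an id produces unintended data.
Imported data cannot be deleted by this task.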
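Both rules about ids follow from the create-or-update-by-id approach. The sketch below is a minimal in-memory version. A Hash stands in for the table, and the hypothetical upsert_rows helper mimics find_or_initialize_by(id: ...) followed by save!.

```ruby
# In-memory stand-in for a table: id => attributes Hash.
# upsert_rows is hypothetical; it mimics find_or_initialize_by(id: ...)
# followed by save!, and like the task it never deletes anything.
def upsert_rows(table, rows)
  rows.each do |row|
    record = (table[row['id']] ||= {})  # find by id, or initialize a new record
    record.merge!(row)                  # assign attributes and "save"
  end
  table
end

table = {}
upsert_rows(table, [{ 'id' => '1', 'name' => 'Tokyo' }])
upsert_rows(table, [{ 'id' => '1', 'name' => 'Tokyo-to' }])  # same id: updated in place
upsert_rows(table, [{ 'id' => '2', 'name' => 'Tokyo-to' }])  # id edited: a second row appears
p table.keys  # the row with id 1 is still there; nothing was deleted
```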

Program

The program below creates a TSV object and then reads the file line by line. It never loads the whole TSV into memory at once, so it can handle large files. In Ruby, require 'csv' provides the CSV class, which can also handle TSV. However, I wanted each line as a Hash of column name to value, and I didn't want TSV-handling code in the main procedure, so I wrote a TSV class.
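For comparison, here is a sketch of the standard CSV library reading tab-separated text. It uses col_sep: "\t" and headers: true, and row.to_h produces a similar column-to-value Hash. For a file on disk, CSV.foreach(path, col_sep: "\t", headers: true) reads it row by row in the same way. CSV still treats double quotes as quoting characters, which may matter if cells contain them.

```ruby
require 'csv'
require 'stringio'

# Tab-separated sample, as pasted from Excel (first line = column names).
tsv_text = "id\tname\t_memo\n1\tHokkaido\tnorth\n2\tAomori\t\n"

rows = []
# headers: true makes the first line the column names;
# row.to_h turns each CSV::Row into a column name => value Hash.
CSV.new(StringIO.new(tsv_text), col_sep: "\t", headers: true).each do |row|
  rows << row.to_h
end
p rows  # an empty trailing cell comes back as nil
```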

Code

namespace :seed_file do
  # Class to handle a TSV file
  class TSV
    TSV_ROOT = Rails.root.join('db', 'seeds', 'tsv')

    # Initialize.
    # The first line of the TSV file must hold the column names.
    # ==== Parameters
    # * +file_name+ - TSV file name (with extension), relative to TSV_ROOT
    def initialize(file_name)
      @file_path = File.join(TSV_ROOT, file_name)
    end

    # Yield each data line as a Hash of column name => value.
    # ==== Example
    #   instance.each{|values|
    #     value = values['column_name']
    #   }
    # ==== Note
    # This method is useful for very large TSV files, because it reads
    # one line at a time instead of loading the whole file into memory.
    def each
      File.open(@file_path) do |tsv|
        columns = self.class.split_line(tsv.readline)
        until tsv.eof?
          values = self.class.split_line(tsv.readline)
          kv = {}
          columns.each_with_index{|column_name, index|
            kv[column_name] = values[index]
          }
          yield kv
        end
      end
    end

    # Split a tab-separated string into an array.
    # Called as self.class.split_line from #each, so it stays public.
    # ==== Parameter
    # * +line+ - tab-separated string
    def self.split_line(line)
      line.chomp.split("\t")
    end
  end

  # Load a TSV file without a transaction.
  # ==== Parameter
  # * +file_key_name+ - file key name: 'category_table' or a table (model) name
  def load_tsv(file_key_name)
    tsv = TSV.new(file_key_name + '.tsv')
    case file_key_name
    when 'category_table'
      load_category_table_tsv(tsv)
    else
      load_table_tsv(tsv, file_key_name)
    end
  end

  # Load a TSV file inside a transaction.
  # ==== Parameter
  # * +file_key_name+
  def load_tsv_aspect(file_key_name)
    ActiveRecord::Base.transaction do
      load_tsv(file_key_name)
    end
  end

  # Load category_table.tsv into CategoryType, Category and CategoryPart.
  def load_category_table_tsv(tsv)
    category_type = nil
    tsv.each{|values|
      if category_type.nil? || category_type.id != values['type_id'].to_i
        # Update CategoryType (only when type_id differs from the previous row)
        category_type = CategoryType.find_or_initialize_by(id: values['type_id'])
        category_type.attributes = {
          name: values['type_name'],
        }
        category_type.save!
      end
      # Update Category
      category = Category.find_or_initialize_by(id: values['id'])
      category.attributes = {
        name:    values['name'],
        type_id: category_type.id,
      }
      category.save!
      # Update the three CategoryPart rows (part_1 .. part_3)
      1.upto(3) {|part|
        category_part = CategoryPart.find_or_initialize_by(
          id: values["part_#{part}_id"])
        category_part.attributes = {
          part_number: part,
          category_id: category.id,
          detail:      values["part_#{part}"],
        }
        category_part.save!
      }
    }
  end

  # Load a TSV into the table with the matching model name.
  # ==== Parameters
  # * +tsv+ - TSV object
  # * +file_key_name+ - e.g. 'prefecture' or 'ccc/ddd'
  def load_table_tsv(tsv, file_key_name)
    model_class = file_key_name.classify.constantize
    tsv.each{|values|
      model = model_class.find_or_initialize_by(id: values['id'])
      values.each{|key, value|
        # Skip columns the table does not have (e.g. _memo)
        if model.has_attribute?(key)
          model[key] = value
        end
      }
      model.save!
    }
  end

  desc "load seed file"
  task load: :environment do
    if ENV.has_key?('TSV')
      tsv_names = ENV['TSV'].split(',')
      tsv_names.each{|tsv_name|
        load_tsv_aspect(tsv_name)
      }
    end
  end
end
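To check the reading logic outside Rails, you can run the same loop against a temporary file. In this sketch, read_tsv_rows is a hypothetical, Rails-free copy of TSV#each that takes a full path instead of a name under TSV_ROOT. The sketch also shows why the separator passed to split must be "\t" (a tab) and not "t".

```ruby
require 'tempfile'

# Rails-free copy of the TSV#each loop: the first line gives column names,
# every following line is yielded as a column => value Hash.
def read_tsv_rows(path)
  File.open(path) do |tsv|
    columns = tsv.readline.chomp.split("\t")
    until tsv.eof?
      values = tsv.readline.chomp.split("\t")
      row = {}
      columns.each_with_index { |column_name, index| row[column_name] = values[index] }
      yield row
    end
  end
end

Tempfile.create(['prefecture', '.tsv']) do |f|
  f.write("id\tname\n31\tTottori\n36\tTokushima\n")
  f.flush
  read_tsv_rows(f.path) { |row| p row }
end

# Wrong separator: cuts at each letter "t" and leaves the tab in place.
p "31\tTottori".split("t")
```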


Explanation

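Running rake seed_file:load starts at the line containing task load:. The :environment dependency is required so that model classes can be used. The task splits the string passed in the TSV environment variable on commas and runs load_tsv_aspect for each name to import it into the database.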

load_tsv_aspect wraps load_tsv in a transaction, so each TSV file is imported in its own transaction.

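load_tsv branches on the name passed in the TSV environment variable. Normally it just loads one TSV into one table. Some data, however, is easier to manage in a single table-like layout that leads to fewer mistakes. In that case the data for three tables is kept in one TSV, and such special tables get a dedicated import method. In the code above, when category_table is passed in the TSV environment variable, the data in category_table.tsv is imported into three tables (CategoryType, Category and CategoryPart).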
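The sketch below shows how one category_table.tsv row fans out into three tables, with no database involved. The column names follow load_category_table_tsv above. The helper split_category_row and the sample values are hypothetical.

```ruby
# Hypothetical sketch: one category_table.tsv row becomes attribute hashes
# for CategoryType, Category and three CategoryPart rows.
def split_category_row(values)
  category_type = { id: values['type_id'], name: values['type_name'] }
  category = { id: values['id'], name: values['name'], type_id: values['type_id'] }
  parts = (1..3).map { |part|
    {
      id:          values["part_#{part}_id"],
      part_number: part,
      category_id: values['id'],
      detail:      values["part_#{part}"],
    }
  }
  [category_type, category, parts]
end

row = {
  'type_id' => '1', 'type_name' => 'Fruit', 'id' => '10', 'name' => 'Apple',
  'part_1_id' => '100', 'part_1' => 'red',
  'part_2_id' => '101', 'part_2' => 'sweet',
  'part_3_id' => '102', 'part_3' => 'crisp',
}
type, category, parts = split_category_row(row)
p type, category, parts
```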

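In the general case, load_table_tsv checks each column/value pair and assigns the value only if the column exists in the table. Columns in the TSV that the table does not have are skipped, so you can add a column such as _memo to hold management notes.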
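The column filter itself is small. In this sketch, known_columns is a hypothetical stand-in for model.has_attribute?(key), and assignable_attributes is an illustrative helper, not part of the task.

```ruby
# Sketch of the column filter in load_table_tsv: keep only keys that
# are real columns. known_columns stands in for model.has_attribute?.
def assignable_attributes(values, known_columns)
  values.select { |key, _value| known_columns.include?(key) }
end

known_columns = %w[id name]  # assumed columns of a prefecture table
row = { 'id' => '13', 'name' => 'Tokyo', '_memo' => 'check spelling' }
p assignable_attributes(row, known_columns)  # '_memo' is left out
```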

The task is now in place. Next, edit seeds.rb.

seeds.rb

To import TSV files when rake db:seed runs, add the following two lines to seeds.rb.

ENV['TSV'] = 'category_table,prefecture'
Rake::Task['seed_file:load'].invoke


ENV['TSV'] specifies which TSV files to import, and Rake::Task['seed_file:load'].invoke runs the task that imports them.
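The sketch below reproduces the same set-then-invoke pattern outside Rails. The task body is a hypothetical stand-in that only records names instead of touching a database. It also shows that Rake runs a task at most once per process: a second invoke does nothing unless you call Rake::Task['seed_file:load'].reenable first.

```ruby
require 'rake'

LOADED = []

# Stand-in task: records each requested TSV name instead of importing it.
namespace :seed_file do
  task :load do
    ENV.fetch('TSV', '').split(',').each { |name| LOADED << name }
  end
end

ENV['TSV'] = 'category_table,prefecture'
Rake::Task['seed_file:load'].invoke
Rake::Task['seed_file:load'].invoke  # no-op: the task already ran in this process
p LOADED
```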
