Building a High-Performance Company Name Standardizer with Rust, React, and the SEC API

Stop hardcoding mappings: A dynamic approach to financial data standardization.

Data Engineers know the pain: you receive a dataset where one row says "JPMC", another says "J.P. Morgan", and a third says "JPMorgan Chase & Co.". To perform any meaningful aggregation, you need to normalize these into a single "Golden Record."

While Python (scikit-learn / fuzzywuzzy) is the go-to for this, it can be slow at scale and requires heavy dependencies.

In this post, I’ll show you how to build a blazing fast, production-grade Standardization Engine using Rust for the backend and React for the UI. We will go beyond simple regex and implement:

  1. Real-time Master Data: Fetching live tickers from the SEC (US Securities and Exchange Commission).

  2. Fuzzy Logic: Using the Jaro-Winkler algorithm to handle typos ("Amazn" → "Amazon").

  3. Modern UI: A polished React frontend with Dark Mode and CSV Export.


🏗️ The Architecture

  • Backend: Rust (Actix-web)

    • Speed: Actix-web can serve thousands of requests per second; each fuzzy lookup scans the full company list, so real throughput depends on its size.

    • Logic: strsim crate for string distance calculations.

    • Data: Fetches company_tickers.json from SEC.gov on startup.

  • Frontend: React (Vite)

    • UX: Real-time lookup, Batch CSV processing, Dark/Light mode toggle.

🚀 Part 1: The Rust Backend

We use Actix-web for the server and strsim for the fuzzy matching logic.
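
For intuition about the scores strsim returns, here is a from-scratch sketch of Jaro-Winkler that follows the textbook definition (prefix bonus capped at 4 characters, scaling factor 0.1). It is not part of the project, the function names are hypothetical, and strsim's implementation may differ in small details.

```rust
/// Jaro similarity in [0, 1], based on matching characters and transpositions.
fn jaro_sketch(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Two characters match only if they are equal and at most `window` positions apart.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Walk the matched characters of both strings in order and count mismatches.
    let mut k = 0;
    let mut out_of_order = 0usize;
    for i in 0..a.len() {
        if !a_matched[i] {
            continue;
        }
        while !b_matched[k] {
            k += 1;
        }
        if a[i] != b[k] {
            out_of_order += 1;
        }
        k += 1;
    }
    let m = matches as f64;
    let t = (out_of_order / 2) as f64; // each transposition is counted twice above
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to 4 characters.
fn jaro_winkler_sketch(a: &str, b: &str) -> f64 {
    let j = jaro_sketch(a, b);
    let prefix = a
        .chars()
        .zip(b.chars())
        .take_while(|(x, y)| x == y)
        .take(4)
        .count();
    j + 0.1 * prefix as f64 * (1.0 - j)
}
```

With this sketch, `jaro_winkler_sketch("AMAZN", "AMAZON COM INC")` comes out at roughly 0.87, while `jaro_winkler_sketch("AMAZN", "TESLA INC")` is roughly 0.54. That gap is what a cutoff such as 0.75 relies on.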

1. Initialize the Project

mkdir company-standardizer
cd company-standardizer
cargo new backend
cd backend

2. Dependencies (Cargo.toml)

We need reqwest to fetch the SEC data and strsim for the math.

[package]
name = "backend"
version = "0.1.0"
edition = "2021"

[dependencies]
actix-web = "4"
actix-cors = "0.6"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
strsim = "0.11" # The magic fuzzy matching crate
reqwest = { version = "0.11", features = ["json"] }
tokio = { version = "1", features = ["full"] }

3. The Logic (src/main.rs)

This is the core engine. It employs a "Waterfall" matching strategy:

  1. Ticker Map: O(1) Lookup (e.g., "MSFT" -> "Microsoft").

  2. Jaro-Winkler: Scores the query against every company title on a 0 to 1 scale. If the best score exceeds 0.75, we accept it as a match.

use actix_cors::Cors;
use actix_web::{web, App, HttpResponse, HttpServer, Responder};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::sync::Mutex;
use strsim::jaro_winkler;

// --- Data Structures ---
#[derive(Clone, Serialize, Deserialize, Debug)]
struct CompanyReference {
    title: String,
    ticker: String,
}

#[derive(Deserialize, Debug)]
struct SecCompany {
    ticker: String,
    title: String,
}

#[derive(Serialize, Debug)]
struct MatchResult {
    input: String,
    standardized_name: Option<String>,
    ticker: Option<String>,
    score: f64,
    method: String,
}

#[derive(Deserialize)]
struct LookupRequest { name: String }

#[derive(Deserialize)]
struct BatchRequest { names: Vec<String> }

// --- The Engine ---
struct StandardizationEngine {
    master_list: Vec<CompanyReference>,
    ticker_map: HashMap<String, String>,
}

impl StandardizationEngine {
    fn new(companies: Vec<CompanyReference>) -> Self {
        let mut ticker_map = HashMap::new();
        for company in &companies {
            ticker_map.insert(company.ticker.clone(), company.title.clone());
        }
        StandardizationEngine { master_list: companies, ticker_map }
    }

    fn search(&self, query: &str) -> MatchResult {
        let clean_query = query.trim();
        let query_upper = clean_query.to_uppercase();

        // 1. Ticker Lookup (Exact & Fast)
        if let Some(name) = self.ticker_map.get(&query_upper) {
            return MatchResult {
                input: query.to_string(),
                standardized_name: Some(name.clone()),
                ticker: Some(query_upper),
                score: 1.0,
                method: "Ticker Map".to_string(),
            };
        }

        // 2. Fuzzy Search (Jaro-Winkler)
        let mut best_match: Option<CompanyReference> = None;
        let mut best_score = 0.0;

        for company in &self.master_list {
            let score = jaro_winkler(&query_upper, &company.title.to_uppercase());
            if score > best_score {
                best_score = score;
                best_match = Some(company.clone());
            }
        }

        let method = if best_score == 1.0 { "Exact Match" } else { "Jaro-Winkler Fuzzy" };

        // Threshold > 0.75 usually indicates a good match for company names
        if best_score > 0.75 {
            let m = best_match.unwrap();
            MatchResult {
                input: query.to_string(),
                standardized_name: Some(m.title),
                ticker: Some(m.ticker),
                score: best_score,
                method: method.to_string(),
            }
        } else {
            MatchResult {
                input: query.to_string(),
                standardized_name: None,
                ticker: None,
                score: best_score,
                method: "No Match".to_string(),
            }
        }
    }
}

struct AppState {
    engine: Mutex<StandardizationEngine>,
}

// --- API Handlers ---
async fn standardize_one(data: web::Data<AppState>, req: web::Json<LookupRequest>) -> impl Responder {
    let engine = data.engine.lock().unwrap();
    let result = engine.search(&req.name);
    HttpResponse::Ok().json(result)
}

async fn standardize_batch(data: web::Data<AppState>, req: web::Json<BatchRequest>) -> impl Responder {
    let engine = data.engine.lock().unwrap();
    let results: Vec<MatchResult> = req.names.iter().map(|name| engine.search(name)).collect();
    HttpResponse::Ok().json(results)
}

// --- Data Loader ---
async fn fetch_sec_data() -> Vec<CompanyReference> {
    println!("Fetching live data from SEC.gov...");
    let client = reqwest::Client::new();
    // SEC requires a descriptive User-Agent; replace this placeholder with your own contact details
    let resp = client.get("https://www.sec.gov/files/company_tickers.json")
        .header("User-Agent", "TestApp (test@example.com)")
        .send().await;

    if let Ok(response) = resp {
        if let Ok(map) = response.json::<HashMap<String, SecCompany>>().await {
            let mut companies = Vec::new();
            for (_, val) in map {
                companies.push(CompanyReference { title: val.title, ticker: val.ticker });
            }
            println!("Loaded {} companies.", companies.len());
            return companies;
        }
    }
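    // Fallback data if the request or JSON parsing fails
    eprintln!("Could not load SEC data; falling back to a single built-in record.");
    vec![CompanyReference { title: "Apple Inc.".into(), ticker: "AAPL".into() }]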
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let companies = fetch_sec_data().await;
    let engine = StandardizationEngine::new(companies);
    let app_state = web::Data::new(AppState { engine: Mutex::new(engine) });

    println!("Server running at http://127.0.0.1:8080");

    HttpServer::new(move || {
        App::new()
            .wrap(Cors::permissive())
            .app_data(app_state.clone())
            .route("/api/standardize", web::post().to(standardize_one))
            .route("/api/batch-standardize", web::post().to(standardize_batch))
    })
    .bind(("127.0.0.1", 8080))?
    .run()
    .await
}
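
In the listing above, the engine is built once at startup and only read afterwards, and `web::Data` already keeps its contents behind an `Arc`. Under that assumption the `Mutex` is not needed for safety, and it forces lookups to run one at a time. Here is a minimal std-only sketch of lock-free sharing, using a simplified, hypothetical `Engine` (ticker map only) in place of `StandardizationEngine`:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Simplified stand-in for the post's engine; hypothetical, ticker map only.
struct Engine {
    ticker_map: HashMap<String, String>,
}

impl Engine {
    fn new(pairs: &[(&str, &str)]) -> Self {
        let ticker_map = pairs
            .iter()
            .map(|(ticker, title)| (ticker.to_string(), title.to_string()))
            .collect();
        Engine { ticker_map }
    }

    /// Takes `&self`, so any number of threads can call it at the same time.
    fn lookup(&self, query: &str) -> Option<String> {
        self.ticker_map.get(&query.trim().to_uppercase()).cloned()
    }
}

/// Runs each query on its own thread; all threads share one engine through `Arc`.
fn lookup_concurrently(engine: Arc<Engine>, queries: Vec<String>) -> Vec<Option<String>> {
    let handles: Vec<_> = queries
        .into_iter()
        .map(|q| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || engine.lookup(&q))
        })
        .collect();
    // Join in spawn order so results line up with the input queries.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

In the Actix version, the equivalent change would be to store the engine directly in `AppState` and call `data.engine.search(...)` without `lock()`. A lock (or an `RwLock`) only becomes necessary if the company list is refreshed while the server is running.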

🎨 Part 2: The React Frontend

We use Vite for a fast setup. The frontend features a beautiful card-based layout, a Dark Mode toggle, and CSV export functionality.

1. Setup

cd .. # go to the root directory
npm create vite@latest frontend -- --template react
cd frontend
npm install axios

2. Styling (App.css)

This CSS uses variables for instant theme switching.

:root {
  /* Light Theme Variables */
  --bg-color: #f8f9fa;
  --card-bg: #ffffff;
  --text-main: #2c3e50;
  --text-secondary: #5f6368;
  --accent-color: #2563eb;
  --accent-hover: #1d4ed8;
  --border-color: #e2e8f0;
  --input-bg: #ffffff;
  --table-header: #f1f5f9;
  --table-hover: #f8fafc;
  --shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);
}

[data-theme='dark'] {
  /* Dark Theme Variables */
  --bg-color: #0f172a;
  --card-bg: #1e293b;
  --text-main: #f1f5f9;
  --text-secondary: #94a3b8;
  --accent-color: #3b82f6;
  --accent-hover: #60a5fa;
  --border-color: #334155;
  --input-bg: #0f172a;
  --table-header: #334155;
  --table-hover: #1e293b; /* Same as --card-bg here; pick a lighter shade if you want a visible hover effect */
  --shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.5);
}

body {
  margin: 0;
  background-color: var(--bg-color);
  color: var(--text-main);
  font-family: 'Inter', system-ui, -apple-system, sans-serif;
  transition: background-color 0.3s, color 0.3s;
  min-height: 100vh;
  display: flex;
  justify-content: center;
}

#root {
  width: 100%;
  max-width: 1000px;
  padding: 2rem;
  text-align: center;
}

/* Header & Toggle */
header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 2rem;
  padding-bottom: 1rem;
  border-bottom: 1px solid var(--border-color);
}

h1 {
  font-size: 2rem;
  margin: 0;
  background: linear-gradient(90deg, var(--accent-color), #8b5cf6);
  -webkit-background-clip: text;
  -webkit-text-fill-color: transparent;
}

.theme-toggle {
  background: none;
  border: 1px solid var(--border-color);
  color: var(--text-main);
  font-size: 1.2rem;
  padding: 0.5rem;
  border-radius: 50%;
  cursor: pointer;
  transition: all 0.2s;
  width: 40px;
  height: 40px;
  display: flex;
  align-items: center;
  justify-content: center;
}
.theme-toggle:hover {
  background-color: var(--border-color);
}

/* Cards */
.card {
  background-color: var(--card-bg);
  padding: 2rem;
  border-radius: 12px;
  box-shadow: var(--shadow);
  border: 1px solid var(--border-color);
  margin-bottom: 2rem;
  transition: background-color 0.3s;
}

h2 {
  margin-top: 0;
  color: var(--text-main);
  font-size: 1.25rem;
  margin-bottom: 1.5rem;
  text-align: left;
}

/* Inputs & Buttons */
.search-box {
  display: flex;
  gap: 12px;
}

input[type="text"] {
  flex: 1;
  padding: 12px 16px;
  border-radius: 8px;
  border: 1px solid var(--border-color);
  background-color: var(--input-bg);
  color: var(--text-main);
  font-size: 1rem;
  outline: none;
  transition: border-color 0.2s;
}

input[type="text"]:focus {
  border-color: var(--accent-color);
  box-shadow: 0 0 0 2px rgba(37, 99, 235, 0.2);
}

button.primary-btn {
  background-color: var(--accent-color);
  color: white;
  border: none;
  padding: 12px 24px;
  border-radius: 8px;
  font-weight: 600;
  cursor: pointer;
  transition: background-color 0.2s;
}

button.primary-btn:hover {
  background-color: var(--accent-hover);
}

button:disabled {
  opacity: 0.7;
  cursor: not-allowed;
}

/* File Upload */
.file-upload-label {
  display: inline-block;
  padding: 12px 24px;
  border: 2px dashed var(--border-color);
  border-radius: 8px;
  cursor: pointer;
  color: var(--text-secondary);
  font-weight: 500;
  width: 100%;
  box-sizing: border-box;
  transition: all 0.2s;
}

.file-upload-label:hover {
  border-color: var(--accent-color);
  color: var(--accent-color);
  background-color: rgba(37, 99, 235, 0.05);
}

/* Results Grid (Single) */
.result-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
  gap: 1.5rem;
  margin-top: 1.5rem;
  text-align: left;
  padding-top: 1.5rem;
  border-top: 1px solid var(--border-color);
}

.result-item label {
  display: block;
  font-size: 0.85rem;
  color: var(--text-secondary);
  margin-bottom: 0.25rem;
  text-transform: uppercase;
  letter-spacing: 0.05em;
}

.result-item .value {
  font-size: 1.1rem;
  font-weight: 600;
  color: var(--text-main);
}

.ticker-badge {
  background-color: var(--border-color);
  color: var(--text-main);
  padding: 2px 6px;
  border-radius: 4px;
  font-size: 0.8rem;
  margin-left: 8px;
  vertical-align: middle;
}

/* Table */
.table-container {
  overflow-x: auto;
  border-radius: 8px;
  border: 1px solid var(--border-color);
}

table {
  width: 100%;
  border-collapse: collapse;
}

th {
  background-color: var(--table-header);
  color: var(--text-secondary);
  font-weight: 600;
  padding: 12px 16px;
  text-align: left;
  font-size: 0.9rem;
}

td {
  padding: 12px 16px;
  border-top: 1px solid var(--border-color);
  color: var(--text-main);
  text-align: left;
}

tr:hover td {
  background-color: var(--table-hover); /* Row highlight on hover */
}

/* Utilities */
.status-badge {
  padding: 4px 8px;
  border-radius: 4px;
  font-size: 0.85rem;
  font-weight: 600;
}

3. The Logic (App.jsx)

Features:

  • Batch Processing: Upload a .txt or .csv file with one name per line, get a table of results.

  • CSV Download: Generates a report of the standardization results when you click Download CSV.

  • Theme Toggle: Persists your choice in localStorage.

import { useState, useEffect } from 'react'
import axios from 'axios'
import './App.css'

function App() {
    const [inputText, setInputText] = useState('')
    const [result, setResult] = useState(null)
    const [batchResults, setBatchResults] = useState([])
    const [loading, setLoading] = useState(false)
    const [theme, setTheme] = useState('light')

    // Theme Init
    useEffect(() => {
        const savedTheme = localStorage.getItem('theme')
        const prefersDark = window.matchMedia('(prefers-color-scheme: dark)').matches

        if (savedTheme) {
            setTheme(savedTheme)
            document.documentElement.setAttribute('data-theme', savedTheme)
        } else if (prefersDark) {
            setTheme('dark')
            document.documentElement.setAttribute('data-theme', 'dark')
        }
    }, [])

    const toggleTheme = () => {
        const newTheme = theme === 'light' ? 'dark' : 'light'
        setTheme(newTheme)
        document.documentElement.setAttribute('data-theme', newTheme)
        localStorage.setItem('theme', newTheme)
    }

    // --- API Handlers ---

    const handleCheck = async () => {
        if (!inputText) return
        setLoading(true)
        try {
            const res = await axios.post('http://127.0.0.1:8080/api/standardize', { name: inputText })
            setResult(res.data)
        } catch (err) {
            console.error(err)
            alert("Error connecting to backend")
        }
        setLoading(false)
    }

    const handleFileUpload = (e) => {
        const file = e.target.files[0]
        if (!file) return

        const reader = new FileReader()
        reader.onload = async (event) => {
            const text = event.target.result
            const names = text.split('\n').map(n => n.trim()).filter(n => n)

            setLoading(true)
            try {
                const res = await axios.post('http://127.0.0.1:8080/api/batch-standardize', { names })
                setBatchResults(res.data)
            } catch (err) {
                console.error(err)
                alert("Error processing batch")
            }
            setLoading(false)
        }
        reader.readAsText(file)
    }

    // --- CSV Download Logic ---

    const downloadCSV = () => {
        if (batchResults.length === 0) return

        // 1. Define Headers
        const headers = ["Input Name", "Standardized Name", "Ticker", "Method", "Confidence Score"]

        // 2. Format Data Rows
        const csvRows = batchResults.map(item => {
            // Escape quotes in data to prevent CSV breakage
            const safeInput = `"${item.input.replace(/"/g, '""')}"`
            const safeName = `"${(item.standardized_name || "").replace(/"/g, '""')}"`
            const safeTicker = `"${(item.ticker || "").replace(/"/g, '""')}"`
            const safeMethod = `"${item.method}"`
            const safeScore = `"${(item.score * 100).toFixed(1)}%"`

            return [safeInput, safeName, safeTicker, safeMethod, safeScore].join(",")
        })

        // 3. Combine and Trigger Download
        const csvContent = [headers.join(","), ...csvRows].join("\n")
        const blob = new Blob([csvContent], { type: 'text/csv;charset=utf-8;' })
        const url = URL.createObjectURL(blob)

        const link = document.createElement("a")
        link.href = url
        link.setAttribute("download", "standardized_companies_report.csv")
        document.body.appendChild(link)
        link.click()
        document.body.removeChild(link)
        // Release the temporary object URL once the download has been triggered
        URL.revokeObjectURL(url)
    }

    // --- Render Helpers ---

    const getScoreColor = (score) => {
        if (score === 1.0) return '#22c55e'
        if (score > 0.85) return '#3b82f6'
        if (score > 0.75) return '#f59e0b'
        return '#ef4444'
    }

    return (
        <div className="app-container">
            <header>
                <div>
                    <h1>Company Standardizer</h1>
                    <span style={{ color: 'var(--text-secondary)', fontSize: '0.9rem' }}>
                        Powered by Rust & SEC API
                    </span>
                </div>
                <button className="theme-toggle" onClick={toggleTheme} title="Toggle Theme">
                    {theme === 'light' ? '🌙' : '☀️'}
                </button>
            </header>

            {/* Single Lookup Card */}
            <section className="card">
                <h2>Single Entity Lookup</h2>
                <div className="search-box">
                    <input
                        type="text"
                        value={inputText}
                        onChange={(e) => setInputText(e.target.value)}
                        placeholder="Enter Company Name or Ticker (e.g., Apple, NVDA)"
                        onKeyDown={(e) => e.key === 'Enter' && handleCheck()}
                    />
                    <button className="primary-btn" onClick={handleCheck} disabled={loading}>
                        {loading ? 'Searching...' : 'Search'}
                    </button>
                </div>

                {result && (
                    <div className="result-grid">
                        <div className="result-item">
                            <label>Input</label>
                            <div className="value">{result.input}</div>
                        </div>

                        <div className="result-item">
                            <label>Standardized Name</label>
                            <div className="value" style={{ display: 'flex', flexDirection: 'column', alignItems: 'flex-start', gap: '5px' }}>
                                {result.standardized_name ? (
                                    <>
                                        <span>{result.standardized_name}</span>
                                        {result.ticker && (
                                            <span className="ticker-badge" title="Stock Ticker">
                                                {result.ticker}
                                            </span>
                                        )}
                                    </>
                                ) : (
                                    <span style={{color: 'var(--text-secondary)'}}>No Match</span>
                                )}
                            </div>
                        </div>

                        <div className="result-item">
                            <label>Method</label>
                            <div className="value" style={{ fontSize: '0.95rem' }}>{result.method}</div>
                        </div>

                        <div className="result-item">
                            <label>Confidence</label>
                            <div className="value" style={{ color: getScoreColor(result.score) }}>
                                {(result.score * 100).toFixed(1)}%
                            </div>
                        </div>
                    </div>
                )}
            </section>

            {/* Batch Processing Card */}
            <section className="card">
                <div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', marginBottom: '1.5rem' }}>
                    <h2 style={{ marginBottom: 0 }}>Batch Processing</h2>

                    {/* Download Button - Only shows when there are results */}
                    {batchResults.length > 0 && (
                        <button
                            onClick={downloadCSV}
                            style={{
                                backgroundColor: '#22c55e', // Green for Excel/CSV
                                color: 'white',
                                border: 'none',
                                padding: '8px 16px',
                                borderRadius: '6px',
                                cursor: 'pointer',
                                fontSize: '0.9rem',
                                fontWeight: '600',
                                boxShadow: '0 2px 4px rgba(0,0,0,0.1)'
                            }}
                        >
                            ⬇ Download CSV
                        </button>
                    )}
                </div>

                <div style={{ marginBottom: '1.5rem' }}>
                    <label className="file-upload-label">
                        <input
                            type="file"
                            onChange={handleFileUpload}
                            accept=".txt,.csv"
                            style={{ display: 'none' }}
                        />
                        {loading ? "Processing..." : "Click to Upload .txt or .csv List"}
                    </label>
                </div>

                {loading && batchResults.length === 0 && <p style={{color: 'var(--text-secondary)'}}>Processing batch file...</p>}

                {batchResults.length > 0 && (
                    <div className="table-container">
                        <table>
                            <thead>
                            <tr>
                                <th>Input</th>
                                <th>Standardized Name</th>
                                <th>Ticker</th>
                                <th>Method</th>
                                <th>Score</th>
                            </tr>
                            </thead>
                            <tbody>
                            {batchResults.map((item, idx) => (
                                <tr key={idx}>
                                    <td>{item.input}</td>
                                    <td>{item.standardized_name || "-"}</td>
                                    <td>
                                        {item.ticker ? <span className="ticker-badge">{item.ticker}</span> : <span style={{color:'var(--text-secondary)'}}>-</span>}
                                    </td>
                                    <td>{item.method}</td>
                                    <td>
                                        <span
                                            className="status-badge"
                                            style={{
                                                backgroundColor: `${getScoreColor(item.score)}20`,
                                                color: getScoreColor(item.score)
                                            }}
                                        >
                                            {(item.score * 100).toFixed(0)}%
                                        </span>
                                    </td>
                                </tr>
                            ))}
                            </tbody>
                        </table>
                    </div>
                )}
            </section>
        </div>
    )
}

export default App

⚡ How to Run

Follow these exact steps to run the application locally.

1. Start the Backend

Open a terminal in the backend/ directory:

# This compiles the Rust code and starts the server
cargo run

Wait until you see "Loaded X companies." followed by "Server running at http://127.0.0.1:8080".

2. Start the Frontend

Open a new terminal window in the frontend/ directory:

# Install dependencies (only needed the first time)
npm install

# Start the dev server
npm run dev

Click the URL displayed (usually http://localhost:5173).

3. Test It Out

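  • Exact Match: Type MSFT. The system detects the ticker and returns Microsoft's name as listed in the SEC file.

  • Fuzzy Match: Type a misspelling such as Amazn. No ticker matches, so the engine falls back to Jaro-Winkler and should return Amazon's SEC listing with a high score. (Amzn itself would not exercise this path: it is uppercased to AMZN and resolved by the ticker map.)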

  • Batch: Create a text file named test.txt with the content:

      Appl
      Teslaa
      JPMvc
    

    Upload it, view the table, and click Download CSV to get your report.
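
Inputs such as "J.P. Morgan" or "JPMorgan Chase & Co." from the introduction carry punctuation and legal suffixes that can drag a similarity score down. One possible extension, not part of the engine above, is to normalize both the query and each SEC title before scoring. The helper name `normalize_name` and the suffix list below are assumptions for illustration, not a complete set of legal forms.

```rust
/// Hypothetical pre-processing step: uppercase, strip punctuation,
/// collapse whitespace, and drop trailing legal suffixes.
fn normalize_name(raw: &str) -> String {
    // Illustrative list only; extend it to fit your data.
    const SUFFIXES: [&str; 8] = [
        "INC", "CORP", "CORPORATION", "CO", "COMPANY", "LTD", "LLC", "PLC",
    ];

    // Keep letters, digits, and whitespace; everything else is removed.
    let cleaned: String = raw
        .to_uppercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();

    let mut tokens: Vec<&str> = cleaned.split_whitespace().collect();

    // Remove trailing suffixes, but never the last remaining token.
    while tokens.len() > 1 && SUFFIXES.contains(tokens.last().unwrap()) {
        tokens.pop();
    }
    tokens.join(" ")
}
```

With this helper, "JPMorgan Chase & Co." becomes "JPMORGAN CHASE" and "J.P. Morgan" becomes "JP MORGAN". Abbreviations like "JPMC" would still need an alias table or a ticker, since no string-distance metric can expand them reliably.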


Conclusion

By combining the performance of Rust with the interactivity of React, we've built a tool that solves a real Data Engineering problem. This architecture is easily extensible: you could swap the SEC API for an internal SQL database or add machine learning models for even smarter matching.